
On the Meaning of Customer Satisfaction

A Study in the Context of Retail Banking

Maarten Terpstra

Printed by: Offsetdrukkerij Ridderprint B.V., Ridderkerk ISBN/EAN: 978-90-5335-171-0 Copyright: Maarten Terpstra


Doctoral thesis

to obtain the degree of doctor at the Universiteit van Tilburg, on the authority of the rector magnificus, prof. dr. F.A. van der Duyn Schouten, to be defended in public before a committee appointed by the doctorate board, in the auditorium of the University, on Friday 14 November 2008 at 14:15, by

Maarten Jan Terpstra, born on 24 August 1969 in Boxmeer

Supervisors (promotores):

Prof. dr. A.A.A. Kuijlen
Prof. dr. K. Sijtsma

Preface

There exists confusion about the meaning of psychological properties. This is because a psychological property is not a thing within a person, but an organisational principle with respect to the behaviour of persons. This may sound odd, but it means that a psychological property is a theoretical concept that we use to interpret and describe the behaviour of persons. In recent years I have studied the meaning of the psychological property satisfaction. The results of the study are reported in this thesis.

First of all, I want to express my gratitude to my promotores, Ton Kuijlen and Klaas Sijtsma. They taught me how to do scientific research, they helped and inspired me, and I have enjoyed our cooperation. I am also grateful to ING for facilitating my study, and to many colleagues from ING for their support and interest in my results. Furthermore, I thank Tom Breur for his support and his feedback on the many drafts he read. Most of all, I thank Monique for her confidence that I would finish my study and for her unconditional support throughout the years.

Contents

Chapter 1 Introduction
Chapter 2 Measurement of psychological constructs
Chapter 3 The theoretical meaning of customer satisfaction
Chapter 4 Deductive design for test development and construct validation
Chapter 5 Method of the first empirical study into customer satisfaction with BANK
Chapter 6 Results of the first empirical study into customer satisfaction with BANK
Chapter 7 Method of the second empirical study into customer satisfaction with BANK
Chapter 8 Results of the second empirical study into customer satisfaction with BANK
Chapter 9 General discussion
References
Samenvatting (Summary in Dutch)
Appendices

Chapter 1 Introduction

Introduction

Satisfaction is an important concept in societal contexts, business contexts, and academic contexts. This is evidenced by the large number of studies that have been conducted with respect to satisfaction in various contexts. Ironically, satisfaction seems to be a somewhat elusive phenomenon. As Oliver (1997, p. 13) noted: "Everyone knows what satisfaction is, until asked to give a definition. Then it seems, nobody knows." This warrants further research into the meaning of satisfaction. The subject of this thesis is the unravelling of the meaning of customer satisfaction in the context of retail banking. The phrase "meaning of customer satisfaction" has multiple connotations. In this thesis, it refers to (a) the linguistic use of the term customer satisfaction, (b) the theoretical framework of customer satisfaction, (c) the empirical indicators of customer satisfaction, and (d) the importance of customer satisfaction in the domain of retail banking. The thesis includes a theoretical study of customer satisfaction and an empirical study into customer satisfaction with a major Dutch retail bank.

A typology of satisfaction studies

Satisfaction has been studied in various settings and at various levels of aggregation (e.g., Oliver, 1997, pp. 15-17). This is reflected by the use of different terms, such as job satisfaction, life satisfaction, consumer satisfaction, customer satisfaction, transaction-specific satisfaction, attribute satisfaction, service satisfaction, summary satisfaction, and aggregated satisfaction. The types of satisfaction are mutually related by what Wittgenstein (1953) labeled "family resemblances", meaning that they are mutually related in diverse ways. For example, consumer satisfaction and customer satisfaction are closely related since both pertain to the satisfaction response to consumption-related experiences, and these two terms have been used more or less interchangeably in the marketing literature (e.g., Giese & Cote, 2000). However, because the term customer satisfaction is only appropriate for satisfaction in commercial contexts, whereas consumer satisfaction may also be used for satisfaction in other contexts, the domain of consumer satisfaction is larger than the domain of customer satisfaction.

There are also differences within each type of satisfaction with respect to the characteristics of the satisfaction response. For example, the consumer satisfaction response to dinner in a restaurant differs from the consumer satisfaction response to dental treatment. Whereas the former satisfaction response may encompass a feeling of pleasure, the latter satisfaction response may encompass a feeling of relief. Furthermore, a consumer satisfaction response may reflect anhedonic cognitions (Oliver, 1997, p. 318), meaning that it reflects cognitions that are not emotionally processed. An example is the consumer satisfaction response to using a pencil.

It is useful to examine the difference between two types of satisfaction studies, which are (a) studies that are conducted at the individual person level, and (b) studies that are conducted at higher levels of aggregation. The first type of satisfaction studies is characterised by analyses of person data. These are, for example, studies of satisfaction of persons with single encounters with a phenomenon (i.e., transaction-specific satisfaction; Oliver, 1997, p. 15), or studies of satisfaction of persons with the accumulation of encounters with a phenomenon (i.e., summary satisfaction; Oliver, 1997, p. 15). The second type of satisfaction studies is conducted at higher levels of aggregation, such as a firm, an industry, or a society (Oliver, 1997, p. 15). These studies are characterised by the analysis of satisfaction data that are aggregated at the level of firms, industries, or societies. For example, several theorists (e.g., Anderson, Fornell, & Lehmann, 1994; Anderson, Fornell, & Mazvancheryl, 2004; Anderson & Mittal, 2000; Gruca & Rego, 2005) used satisfaction data at the firm level to study the connections between satisfaction and economic performance of firms.

Thus, there are different types of satisfaction and different types of satisfaction studies. The present satisfaction study is conducted at the individual person level, and is limited to persons' summary satisfaction with a company of which they are customers. We refer to this kind of satisfaction as customer satisfaction (see also Chapter 3).

Satisfaction research in the marketing domain

Satisfaction is an important concept in marketing theory. Consequently, there is a large number of studies into satisfaction in the marketing literature (e.g., Giese & Cote, 2000; Oliver, 1997; Yi, 1990). Most studies dealt with satisfaction of consumers or customers with products or services or companies providing products or services. In these studies, satisfaction is often labeled consumer satisfaction or customer satisfaction, and is often measured by means of a psychological test that is administered in survey research (Section 4).

Marketing theorists generally agree that satisfaction is a response to consumption-related experiences (e.g., Anderson, Fornell, & Lehmann, 1994; Giese & Cote, 2000; Oliver, 1997; Tse & Wilton, 1988; Yi, 1990). Still, there exists a variety of definitions and measures of satisfaction in academic marketing research (e.g., Giese & Cote, 2000; Peterson & Wilson, 1992). Furthermore, the term satisfaction is sometimes applied to antecedents and sometimes to consequences of satisfaction (Oliver, 1997, p. 15). The measurements of these antecedents and consequences are sometimes used as proxies for satisfaction. Examples of concepts used as proxies for satisfaction are quality perceptions, recommendation intentions, loyalty, behaviour, and profits. Although these concepts may serve the purpose of specific studies, they do not coincide with satisfaction. The use of these concepts as proxies for satisfaction has contributed to the confusion about the meaning of satisfaction (Oliver, 1997, pp. 15-17).

On the basis of a review of the literature, Giese and Cote (2000) demonstrated a number of deficiencies in the definition and measurement of satisfaction in studies that were conducted in the last three decades. These deficiencies pertain to (a) the explication of the definition of satisfaction, (b) the justification of the definition of satisfaction, and (c) the justification of the measurement of satisfaction. The deficiencies hampered the development and validation of satisfaction theory (e.g., Giese & Cote, 2000; Yi, 1990). Giese and Cote (2000) argued that, as there exist multiple definitions of satisfaction, a researcher must explicitly define satisfaction and justify the definition selected.
Because the complexity and the context-specific nature of satisfaction make it impossible to develop a universal definition of satisfaction, they recommended the development of context-specific definitions of satisfaction. This stance implies that measures of satisfaction should also be context-specific, because the measure should match the definition of satisfaction. Giese and Cote (2000) proposed a framework to guide researchers in developing a context-specific definition and a corresponding measurement procedure for satisfaction.

The meaning of satisfaction thus is context-dependent. There are similarities and differences in the meaning of satisfaction in different domains. Satisfaction with a retail bank has both similarities and differences with satisfaction with dinner in a restaurant, satisfaction with consumption of non-durable consumer goods, and satisfaction with consumption of durable consumer goods. All pertain to the fulfilment response (Oliver, 1997, p. 13), but the characteristics of the satisfaction response and the nomological network (Cronbach & Meehl, 1955) of satisfaction differ between these domains. These differences warrant the development of context-specific definitions and corresponding measurement procedures for satisfaction, as proposed by Giese and Cote (2000). Therefore, the first objective of this study was to explore the theoretical meaning of customer satisfaction in the context of retail banking, and to develop a context-specific definition and measurement procedure for customer satisfaction.

Measurement of satisfaction

Satisfaction is a psychological property. Psychological properties are mostly conceived of as theoretical constructions, which are labeled psychological constructs (e.g., Lord & Novick, 1968, p. 352; Nunnally, 1978, p. 96) and which may be measured by means of psychological tests and psychological questionnaires (e.g., Molenaar, 1995; Oosterveld, 1996; Schouwstra, 2000). Psychological tests and psychological questionnaires are instruments (e.g., well-chosen sets of items that are administered in a survey) that are assumed to elicit behaviour (e.g., the responses of a person to the items administered in the survey) that is representative of the property of interest. The position of the person on the property is inferred from the response behaviour of the person (e.g., Molenaar, 1995). In the psychometric literature, the term "test" is often used when maximum performance is measured (e.g., as with educational testing and intelligence testing) and the term "questionnaire" when typical behaviour is measured (e.g., as with personality traits and attitudes). Because "test" has gained a wider use in psychological measurement (e.g., Cronbach, 1971, p. 443; Murphy & Davidshofer, 1991, p. 8; Schouwstra, 2000, pp. 56-77), we also use it in this thesis for measurement instruments for typical behaviour.

Validity of measurement is a key success factor in satisfaction research and in marketing research in general. This has been broadly acknowledged since the influential papers of Jacoby (1976), Churchill (1979), and Peter (1981). First, academic studies in this domain increasingly discuss the convergent, divergent, and nomological validity of measurements of the constructs of interest. This is in accordance with suggestions by Cronbach and Meehl (1955), Campbell and Fiske (1959), Churchill (1979), and Peter (1981). Second, measurements of psychological constructs in academic marketing research are generally based upon multiple-item instruments. This is in accordance with psychometric theory, which postulates that single items often yield inadequate measurements of constructs (e.g., Messick, 1989, pp. 14, 35).

The interest in validity of measurement by no means implies that the issues with regard to validity are resolved. A review of the marketing literature demonstrates a serious problem regarding the definition and measurement of psychological constructs such as satisfaction (e.g., Giese & Cote, 2000; Hausknecht, 1990; Peterson & Wilson, 1992; Yi, 1990). For example, Verhoef (2001, p. 129) noticed that attribute-based measures of satisfaction differ from affective measures of satisfaction, and that the latter measures of satisfaction have strong resemblance with measures of affective commitment. Thus, different studies use different labels for the same construct or use the same label for different constructs, and such conceptual ambiguities slow down scientific progress. The practice of validation of measurements of psychological constructs often is not consistent with theory of validity, and has been criticised by validity theorists. This criticism includes the practice of validation research in satisfaction studies. The assessment of convergent, divergent, and nomological validity (Campbell & Fiske, 1959; Churchill, 1979; Cronbach & Meehl, 1955) does not cover the major threats to construct validity, which are construct underrepresentation and irrelevant variance (e.g., Messick, 1989, 1995; Schouwstra, 2000). Cronbach (1989) characterised most applications of the multitrait-multimethod design (Campbell & Fiske, 1959) as mindless and mechanical, involving the collection of facts with little concern for their usefulness for construct validation. Borsboom, Mellenbergh, and Van Heerden (2004) criticised the practice of assessing nomological validity, and proposed to assess validity on the basis of the test of a causal theory regarding the relation between the property of interest and response behaviour. 
Validity theorists (e.g., Anastasi, 1988; Borsboom et al., 2004; Messick, 1989; Schouwstra, 2000) agree that construct validation has to start at the outset of test development. This implies that the methodology of validation research should incorporate a methodology of test development. The second objective of the present study is the selection of a methodology for the development of a test for customer satisfaction and the validation of test scores that is in line with validity theory.

Importance of satisfaction

Customer satisfaction is expected to influence customer behaviour, customer profitability, and company profitability (e.g., Anderson & Mittal, 2000; Fornell, 1992; Oliver, 1997). Therefore, customer satisfaction is considered of strategic importance for companies in many retail markets, including the Dutch market for retail banking (e.g., Goedee, Reijnders, & Van Thiel, 2008). During the present study, the Dutch market for retail banking was a mature and competitive market. Most of the market was divided between six large retail banks. They all offered a broad range of financial products, including current accounts, saving accounts, credit cards, loans, mortgages, mutual funds, and insurances. A number of these products were also offered by insurance companies and various niche players. Virtually every Dutch adult owned at least a current account and most owned a variety of financial products. Most of them had products from different financial companies.

Fornell (1992) argued that customer satisfaction is a key success factor for companies that operate in mature and competitive markets. In these markets, company growth is accomplished at the expense of competing firms, and retention of customers is of major importance (Fornell, 1992; Reichheld & Sasser, 1990). Customer satisfaction is considered a key success factor for these companies, because it is expected to affect retention of customers and to provide a defence against offensive strategies by competitors (Fornell & Wernerfelt, 1987, 1988).

Longitudinal studies (e.g., Anderson, Fornell, & Lehmann, 1994; Anderson & Mittal, 2000; Gruca & Rego, 2005) demonstrated a relation between customer satisfaction and future financial results of companies. The results of these studies strengthen the expectation that customer satisfaction influences customer profitability. If customer satisfaction influences customer profitability, there must be a relation between customer satisfaction at time t = 0 and customer profitability at time t > 0. However, longitudinal studies that are conducted at the person level and that explore the relation between customer satisfaction and future customer profitability are rare in the marketing literature. Therefore, the third objective of this study is to explore the latter relation on the basis of longitudinal data.
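
The person-level relation between satisfaction at t = 0 and profitability at t > 0 can be illustrated with a small computation. The following is a minimal sketch and not the analysis used in this thesis: it assumes that each customer has one satisfaction score at t = 0 and one profitability figure at a later time, and it summarises their relation with a Pearson correlation. The function name and all data values are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Note: raises ZeroDivisionError if either variable is constant.
    return sxy / sqrt(sxx * syy)


# Hypothetical data: one entry per customer (invented for illustration).
satisfaction_t0 = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0]          # score at t = 0
profitability_t1 = [120.0, 180.0, 90.0, 210.0, 100.0, 160.0]  # at t > 0

print(round(pearson_r(satisfaction_t0, profitability_t1), 3))
```

A nonzero correlation of this kind is what the argument above requires, but on its own it does not demonstrate that satisfaction influences profitability.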

Research goal

Deficiencies in the definition and measurement of satisfaction have hampered the development and validation of satisfaction theory (e.g., Giese & Cote, 2000; Peterson & Wilson, 1992; Yi, 1990). The usefulness of satisfaction research for the development of satisfaction theory may be increased by the resolution of these deficiencies. Because psychometrics is concerned with the measurement of psychological constructs such as satisfaction, psychometric methods may serve to overcome these deficiencies. This thesis aims at contributing to the improvement of the methodology of satisfaction research by the use of psychometric methods for the definition and measurement of customer satisfaction.

Furthermore, the thesis aims at contributing to the development and validation of satisfaction theory by means of a study into the meaning of customer satisfaction in the context of retail banking. In order to meet the research goal, the thesis addresses four research questions:

1. What is a suitable methodology for test development and construct validation in the domain of satisfaction research?
2. What is the theoretical meaning of customer satisfaction in the context of retail banking?
3. What is the empirical meaning of customer satisfaction in the context of retail banking?
4. What is the importance of customer satisfaction in the context of retail banking?

Contents of the thesis

This thesis encompasses three components, which are (a) a theoretical study into the measurement of psychological constructs and the validity of measurement, (b) a theoretical study into the meaning of customer satisfaction and customer dissatisfaction, and (c) two empirical studies into customer satisfaction with a major Dutch retail bank. The empirical studies were based on survey research that was conducted among customers of the bank.

Chapter 2 addresses the measurement of psychological constructs. The chapter starts with an introduction into the conception of psychological constructs, the different approaches to test development, and the measurement process. Subsequently, the theory of validity of measurement is discussed. The chapter ends with the choice of the appropriate methodology for test development and construct validation for this study.

Chapter 3 discusses the theoretical meaning of customer satisfaction. The chapter starts with an exploration of the theory on customer satisfaction and customer dissatisfaction, and the conceptions of satisfaction and dissatisfaction in these theories. Subsequently, the nomological network of customer satisfaction is explored. On the basis of these explorations, a definition of customer satisfaction in the domain of retail banking is provided.

Chapter 4 discusses the deductive design (Schouwstra, 2000). The chapter starts with an explication of the deductive design, which is a methodology for test development and construct validation for personality traits and attitude-like properties. Subsequently, the theory of violators (Oort, 1996), the purpose of the empirical study, the development of the test for customer satisfaction with a retail bank, the outline of the measurement model, and the hypotheses regarding the validity of measurement of customer satisfaction are addressed.

The purpose of the first empirical study was to measure customer satisfaction with a retail bank, to investigate the validity of the measurement of customer satisfaction, and to explore the relation between customer satisfaction and future customer profitability. Chapter 5 addresses the method of the first empirical study. The chapter includes a discussion of the measurement instruments that were applied in this study, the questionnaire, the pre-tests, the pilot study, and the main study. Chapter 6 presents the results of the first empirical study. The chapter starts with the discussion of the preliminary data analyses. Subsequently, the measurement analyses and the tests of the hypotheses are discussed. Next, the relation between customer satisfaction and future customer profitability is further explored. The chapter concludes with a discussion of the meaning of the results of the empirical study for the assessment of the validity of measurement of customer satisfaction.

The purpose of the second empirical study was to test hypotheses regarding the validity of measurement that were not addressed in the first empirical study. Chapter 7 addresses the method of the second empirical study. The chapter includes a discussion of the measurement instruments that were applied in this study, the questionnaire, the sample, and the data collection. Chapter 8 presents the results of the second empirical study. The chapter includes a discussion of the preliminary analyses, the measurement analyses, and the tests of the remaining hypotheses regarding the validity of the measurements of customer satisfaction. The chapter concludes with a discussion of the meaning of the results of the study for the assessment of the validity of the measurements of customer satisfaction.

Chapter 9 is the general discussion. It discusses the results from this study and their implications for customer satisfaction theory and marketing measurement.


Chapter 2 Measurement of psychological constructs

Introduction

A psychological construct such as satisfaction is a theoretical construction with both linguistic and empirical content. This means that a psychological construct is a term with (a) linguistic meaning, like any linguistic term, and (b) relations with empirical phenomena, that is, observable behaviours. Constructs are highly similar to concepts, and to some extent both terms may be used interchangeably. Hox (1997, p. 49) noted that both constructs and concepts are theoretical abstractions, meaning that they represent ideas that are formed by generalisations from similar phenomena, and that constructs refer to concepts that are more or less formally defined in scientific theories. Thus, the term concept refers to a somewhat broader group of theoretical abstractions than the term construct.

The major positions regarding the ontology of psychological constructs are realism and constructivism (e.g., Borsboom, Mellenbergh, & Van Heerden, 2003; Borsboom, 2005, pp. 6-9). These two positions are discussed in the next section. A third position regarding the ontology of constructs is operationalism (e.g., Borsboom, Mellenbergh, & Van Heerden, 2003; Borsboom, 2006). Operationalism equates theoretical constructs with their measurements. Because it is broadly acknowledged that operationalism is untenable (e.g., Borsboom et al., 2003; Heiser, 2006; Kane, 2006), this position is not discussed in this chapter.

In order to measure a construct one needs (a) to obtain a sample of instances within the corresponding behavioural domain, (b) to assess whether these instances provide a representative sample, and (c) to assess how to combine the observations into a measure of the construct of interest. For the latter purpose one needs to apply statistical models and to assess the quality of measurement. This is the major subject of psychometrics.

This chapter addresses the theory on the conception and the measurement of psychological constructs. The focus of the chapter is on theory regarding the conception and the measurement of attitude-constructs, such as satisfaction. Theory that is specific for the conception and the measurement of ability-constructs, such as the various types of specific intelligence, is not taken into account.
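
Step (c), combining observations into a measure and assessing the quality of measurement, can be illustrated with a small sketch. It assumes, purely for illustration, that item responses are combined into an unweighted sum score and that the quality of measurement is summarised by coefficient alpha; neither choice is taken from this thesis, and other measurement models lead to other procedures. The function names and the response matrix are hypothetical.

```python
from statistics import pvariance


def sum_scores(responses):
    """Unweighted sum score per person (one row of item scores per person)."""
    return [sum(person) for person in responses]


def cronbach_alpha(responses):
    """Coefficient alpha for a persons-by-items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum score)
    Raises ZeroDivisionError if all persons have the same sum score.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across persons.
    item_variances = [
        pvariance([person[i] for person in responses]) for i in range(k)
    ]
    # Variance of the sum score across persons.
    total_variance = pvariance(sum_scores(responses))
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five persons to four 5-point items
# (invented for illustration).
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]

print(sum_scores(responses))
print(round(cronbach_alpha(responses), 3))
```

The sketch uses population variances throughout; because alpha depends only on a ratio of variances, using sample variances consistently would give the same value.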


Conception of psychological constructs

Scientific concepts are the core of scientific theories (Sartori, 1984; Thomson, 1961). This implies that psychological constructs are the core of psychological theories. Torgerson (1958, p. 9) denoted psychological constructs such as satisfaction as property-constructs, and contrasted them to system-constructs, which are objects and things that possess sets of particular properties. He argued that, to be of use in scientific theory, a property-construct must possess both theoretical meaning and empirical meaning (Torgerson, 1958, p. 11). Whereas theoretical meaning refers to the definition of a construct in terms of theoretical concepts, empirical meaning refers to the definition of a construct in terms of observable data. The distinction between theoretical meaning and empirical meaning of constructs is founded in linguistic and logical positivistic philosophies (e.g., Carnap, 1956; Frege, 1892). A psychological construct such as intelligence has both theoretical and empirical meaning. The theoretical meaning of intelligence entails its definition in terms of (a) the group of attributes or phenomena to which it refers, and (b) its relation with other constructs in the nomological network. The empirical meaning of intelligence entails the empirical indicators of the construct and includes, for example, the score on a particular intelligence test. However, as Torgerson (1958, p. 7) noted, the operationally defined intelligence is not universally agreed to be the same thing as the theoretically defined intelligence. There is no identity relation between the theoretical meaning of a construct and its empirical referents. There is an ongoing debate regarding the ontological status of psychological properties (e.g., Borsboom, 2005, 2006; Borsboom, Mellenbergh, & Van Heerden, 2003, 2004; Sijtsma, 2006). The major positions regarding the ontological status of psychological properties are realism and constructivism. 
The realistic position is founded upon the assumption that psychological properties exist as unobservable but real entities (e.g., Borsboom, 2005, p. 6). This means that a property exists independent of its observations, and that the measurement of a particular psychological property is a reflection of the entity. Borsboom et al. (2003) argued that measurement of psychological properties requires a realistic position regarding the particular construct, as the sentences "Test X measures the attitude towards nuclear energy" and "Attitudes do not exist" cannot both be true. The realistic position regarding the existence of psychological properties raises the question "What is property X, which is supposed to exist?" Thus, it seems that the question regarding the meaning of a particular psychological property precedes the question regarding the ontological status of this property. The question regarding the meaning of a particular psychological property is ultimately a linguistic question (Wittgenstein, 1953, 1958). This means that the property is a term with a meaning that needs to be clarified on the basis of an examination of the use of the term in linguistic contexts, including psychological theories.

Constructivism does not assume the existence of psychological properties as entities in a realistic sense. According to constructivism, a psychological property may be conceived of as an organisational principle with respect to behaviour. Borsboom (2005, pp. 7-9) differentiated between three constructivist movements, which are logical positivism, instrumentalism, and social constructivism. These movements have many different characteristics and concerns, but what they have in common is (a) the differentiation between a theoretical concept and an empirical concept, and (b) the denial of knowledge about the existence of theoretical concepts as realistic entities, beyond their existence as organisational principles of behaviour.

Social constructivism deserves special attention because it advocates a linguistic conception of psychological constructs, meaning that it is the linguistic use of the term that grants theoretical meaning to the construct (Wittgenstein, 1953, section 43; 1958). This point of view implies that the justification of a particular construct is founded in the use of the construct within a particular language context, such as psychological theory. It makes sense to question what a particular construct refers to, whether it is appropriate in a particular context, and whether it is useful, but it makes no sense to question whether a particular construct exists in any physical or physiological sense. Empirical observations have to demonstrate the use of a construction in a particular language context. According to Wittgenstein (1953, 1958), the description of particular cases of a construction will reveal the meaning of the construction.
It is fruitless to search for a sharp definition of a construction like thinking, because cases of thinking are connected to each other by family resemblances. There is no combination of defining characteristics that separates all cases of thinking from everything else. A sharp definition will not converge with the actual use of a construct, because the actual use does not have distinct borders (Wittgenstein, 1958, p. 44).

The linguistic conception of psychological properties does not deny the existence of psychological properties in a realistic sense, but it does deny knowledge beyond the observable. This is best illustrated by the beetle argument (Wittgenstein, 1953, section 293): "Suppose everyone had a box with something in it: we call it a 'beetle'. ... Here it would be quite possible for everyone to have something different in the box. One might even imagine such a thing constantly changing. But suppose the word 'beetle' had a use in these people's language? If so it would not be used as the name of a thing. The thing in the box has no place in the language game at all; not even as a something: for the box might even be empty. No, one can 'divide through' by the thing in the box; it cancels out, whatever it is. That is to say: if we construe the grammar of the expression of sensation on the model of 'object and designation' the object drops out of consideration as irrelevant."

Sartori (1984) provided an important extension of the linguistic conception of social science concepts. He acknowledged that social science theories are generally expressed in natural language, which implies fuzzy reasoning, thinking, and operationalisation of concepts, and that language influences our reasoning and theorising. He argued that science needs a specialised language, which encompasses the unequivocal definition of its concepts. For this purpose, Sartori (1984, pp. 31-35) proposed concept analysis. Concept analysis aims at establishing the meaning of the concept by establishing the scientific definition of the concept, making sure that the concept is understood unequivocally, and determining the empirical referents of the concept. The core of concept analysis is the establishing of the scientific definition of the concept. Sartori (1984, pp. 32-33) proposed to define a concept in terms of a well-specified set of defining and accompanying characteristics. This is a verbal definition. Concepts that have different connotations in natural language have to be split, which results in unequivocally defined concepts. The empirical referents are loosely described as the real world counterpart of words, which are the objects, entities, or processes denoted by words. Sartori's (1984) concept analysis bears resemblance to the explication of constructs (Carnap, 1950, 1956).

The unequivocal definition of a construct is legitimate and desirable for the development of scientific theory (Sartori, 1984; Torgerson, 1958).
There is ample evidence that conceptual ambiguities regarding constructs have a negative effect on scientific progress. See Yi (1990) and Giese and Cote (2000) for a discussion of the importance of an unequivocal conceptualisation of satisfaction for the development of satisfaction theory. Concept analysis is a useful starting point for research into social science concepts and marketing concepts, because it may serve to overcome these conceptual ambiguities. However, unequivocal definitions cannot bridge the gap between theoretical meaning and empirical meaning, because the meaning of a term differs from the empirical referents (Frege, 1892; Wittgenstein, 1953, 1958). Theoretical constructs exist as linguistic constructions, and they have a surplus meaning over any empirical meaning.

The constructivist position regarding the ontology of psychological properties is in line with psychometrics, which is concerned with the modelling of data that reflects behaviour of persons. This means that the latent trait in a measurement model is estimated from the data, but it is not the attribute behind the data (e.g., Nunnally, 1978, p. 96, pp. 105-109; Sijtsma, 2006). Lord and Novick (1968, p. 352) explained that psychometrics does not assume the existence of a property in a physical or physiological sense: "nowhere in psychological theory is there any necessary implication that traits exist in any physical or physiological sense. It is sufficient that a person behave as if he/she were in possession of a certain amount of each of a number of relevant traits and that he/she behaves as if these amounts substantially determined his behaviour."

Theory about psychological constructs has to take three points into consideration, which originate from the conception of psychological constructs as linguistic constructions. First, psychological constructs are terms that are used in different language contexts, such as psychological theories. The linguistic use of the term is the first observable, and the analysis of the use of the term reveals the meaning of the term. Second, psychological constructs may have empirical referents, which are behaviours interpreted in terms of the construct. The behaviours are the second observable, and they are the raw material for measurement. Third, one cannot point to one particular kind of behaviour or one particular set of behaviours that totally covers a particular construct and nothing else. This means that a particular psychological construct is connected to a domain of behaviours that cannot be delineated sharply and cannot be listed exhaustively.

Test development

The development of scientific theory requires that its concepts can be measured adequately (Sartori, 1984; Torgerson, 1958). Psychological constructs can be measured by means of psychological tests (Chapter 1, Section 4). As a psychological construct is connected to a domain of behaviours, one can hardly depend on the observation of one instance within a domain in order to measure the construct. Moreover, Messick (1989) noted that single items yield moderate measurements of constructs because they almost certainly reflect a confounding of multiple determinants. Consequently, the measurement of a psychological
construct on the basis of a single item will be biased. This problem is solved with multiple-item scales, if the different items have different unique components that are mutually independent.

Scientific research has suggested different methods for the development of psychological tests. Oosterveld (1996, p. 25) categorised these methods in three approaches for test development, which are the deductive approach, the intuitive approach, and the inductive approach. The methods of the deductive approach are based upon explicit theory about the construct of interest. This theory is the basis of the formulation of a definition of the construct and eventually the content of the items and the composition of the test (Oosterveld, 1996, p. 25). The methods of the intuitive approach are based upon implicit knowledge and implicit hypotheses regarding the construct of interest. There is no theory regarding the construct of interest that grounds the formulation of a definition of the construct and eventually the content of the items and the composition of the test. The methods of the inductive approach are exploratory. A test is developed on the basis of observable relations between either the items or the items and some criterion. The methods may be characterised as data driven, which means that the analysis of the available data makes up the core of test development. On the basis of empirical research into the quality of different methods, Oosterveld (1996, p. 127) concluded that the deductive approach to test construction yields better tests than the intuitive and inductive approaches. This means that the methods of the deductive approach yielded tests that provided test scores having better validity and reliability than the methods of the other approaches. Oosterveld (1996) studied two methods of the deductive approach, which were the construct method (Jackson, 1971, 1973) and the facet design method (Guttman, 1954).
The methods can be described in terms of four components, which are (a) the conception of the construct, (b) scale development, (c) scale construction, and (d) evaluation of scale scores (Oosterveld, 1996, p. 24). The construct method (Oosterveld, 1996, pp. 16-20) is a theory-oriented method. The first step of the method is the definition of the construct on the basis of scientific theory regarding the construct. The definition of the construct in terms of phenomena and attributes that it refers to is called the explicit definition, and the definition of the construct in terms of its relation with other constructs in the nomological network is called the implicit definition (Schouwstra, 2000, p. 61). The second step of the method is elaboration or scale development.
This step includes item specification, item production, and item judgement. The items need to be content saturated. This means that each item should correlate relatively highly with the scale score that represents the concept the item is expected to measure, and relatively low with scale scores representing other concepts (Oosterveld, 1996, p. 19). Thus, each item must possess convergent and divergent validity. The third step of the method is scale construction, which refers to the application of a measurement model to the empirical data aimed at producing a scale on which persons can be measured with respect to the concept of interest. The fourth step of the method is the evaluation of the scale scores. This step includes, for example, the assessment of reliability and construct validity of scale scores. It may be noted that the construct method bears resemblance to Churchill's (1979) procedure for test development in marketing research.

Guttman (1954; see also Hox, 1997) introduced the facet design. The facet design defines a universe of observations by classifying them with a scheme of facets (i.e., variables) that contain different elements (i.e., values). Facet theory distinguishes three types of facets, which are (a) population facets, which classify the population, (b) content facets, which classify the concept, and (c) response facets, which classify the behaviours. Each of these facets has one or more distinct values that are called the elements of the facet. The product of all elements of all facets defines the universe of observations. The facet design method (Oosterveld, 1996, pp. 20-24; Stouthard, Mellenbergh & Hoogstraten, 1993) is a method for test development that is aimed at the optimisation of content validity by means of a systematic representation of the concept. The concept is represented on the basis of the combination of one or more content facets.
Each content facet has one or more elements, and a particular combination of elements of each content facet is called a structuple (Oosterveld, 1996, p. 22). The product of all elements of all content facets defines the set of structuples and delineates the concept (see, e.g., Section 4 from Chapter 4). The second step of the method is elaboration or scale development. This step includes item specification, item production, and item judgement. The items have to be derived from the facet structure. Each item must be specific for a single structuple of the facet structure. The third step of the method is scale construction. Scale construction refers to the analysis of the data by means of a measurement model, aimed at producing the measurement scales and the scale scores. The fourth step is the evaluation of the scale scores. This step includes, for example, the assessment of reliability and construct validity of scale scores. Both the construct method and the facet design method incorporate some kind of concept analysis that clarifies the meaning of the construct of interest and facilitates its
definition. In the case of the facet design method, this analysis should facilitate a definition of the construct in the format of a facet design, and in the case of the construct method this analysis should facilitate an explicit and an implicit definition of the construct. However, it is not immediately clear what kind of concept analysis reveals the meaning of the construct and facilitates its definition. Wittgenstein (1953, 1958, p. 44) argued that it is the examination of examples of the use of a term in language contexts that reveals the meaning of the term. Following this argumentation, it is appropriate to examine the use of the term in various language contexts, including scientific theories, in order to clarify the meaning of the construct and to develop a research definition of the construct. In practice, this requires an inventory of diverse studies into the construct, and the examination of the conception of the construct in these studies. See Giese and Cote (2000) for an example of this practice in consumer satisfaction research; that is, the examination of definitions of consumer satisfaction in scientific research, the analysis of similarities and differences between these definitions, and the introduction of a framework for the development of context-specific definitions of consumer satisfaction.
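
As a minimal sketch of how a facet structure delineates a concept, the set of structuples can be enumerated as the Cartesian product of the elements of the content facets. The facet names and elements below are hypothetical and serve only as an illustration; they are not the facets used in this study.

```python
from itertools import product

# Hypothetical content facets, for illustration only.
content_facets = {
    "aspect": ["cognitive", "affective"],
    "object": ["product", "service", "employee"],
}

def structuples(facets):
    """Return every combination of one element per content facet."""
    names = list(facets)
    element_lists = [facets[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*element_lists)]

all_structuples = structuples(content_facets)

# The number of structuples equals the product of the facet sizes (2 * 3).
print(len(all_structuples))
for structuple in all_structuples:
    print(structuple)
```

In a facet design, each item would then be written for exactly one of these structuples, so that the coverage of the concept by the test can be checked against the enumerated list.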

Measurement process

Coombs (1964, p. 4) represented the process of psychological measurement in a scheme (Figure 1). The observations Coombs (1964) referred to are observations of behaviour, and the data are psychological data. In phase one of the process, the researcher has to decide on the collection of observations. The universe of observations is theoretically unlimited, and it is up to the researcher to choose and to record particular observations from a particular research population. In phase two, the researcher transforms the observations into data. It always takes some decision or action on the part of the researcher to create the data on the basis of his/her observations. Therefore, Coombs (1964, pp. 3-6, 29) conceived of data as interpretations of observations by the researcher. In phase three, the researcher applies a measurement model to the data in order to construct one or more scales, and to classify the stimuli and/or the persons. A scale represents a property, and the classification of stimuli and/or persons on a scale constitutes the measurement of a property. Thus, it is properties of stimuli and/or persons that are measured, and it is stimuli and/or persons that are classified (Torgerson, 1958, p. 9).

Figure 1: The Measurement Process (Coombs, 1964, p. 4)
[Diagram: universe of potential observations -> (Phase 1) -> recorded observations -> (Phase 2) -> data -> (Phase 3) -> inferential classification of individuals and stimuli]

Figure 1 illustrates that the scaling analyses are not at the core but at the end of the measurement process. Coombs (1964, p. 5) argued that the phases preceding the scaling analyses are at least as important components of the measurement process. Furthermore, the scheme illustrates that each phase encompasses one or more decisions made by the researcher, which influence the output of the phase concerned and the measurements. For example, the researcher may code the answers to some closed question as nominal data, ordinal data, or numerical data, and use a suitable measurement model to analyse the data. The coding of the responses and the choice of the measurement model are based upon assumptions made by the researcher with respect to the observations that he or she made. For this reason, Coombs (1964, p. 5) noted that psychological data and measurements and scales are theory. Psychometrics suggests different measurement models that may be applied in the last phase of the measurement process. The major types of measurement models are the classical test theory (CTT) model (Lord & Novick, 1968), the item response theory (IRT) models (e.g., Embretson & Reise, 2000) and the factor analytic models (e.g., Bollen, 1989; Gorsuch, 1983). It is noteworthy that different measurement models may yield different scales of the property, which means that they may yield different classifications of persons. The choice of a researcher for a particular measurement model may be based on the hypothesised relationship between the data and the property, the desired level of measurement, and the intention to test hypotheses about the fit of the model. The quality of measurement is not self-evident but has to be demonstrated. The major criteria with respect to quality of measurement are the fit of the measurement model, the reliability of the scale scores, the generalisability of conclusions, and the validity of the interpretation of the scale scores (Molenaar, 1995). 
The first criterion is the fit of the measurement model. The measurement model is a formal representation of the expected data structure. The fit of the model refers to the extent
to which the theoretical assumptions of the model regarding the structure of the data match the empirical data. This is, for example, the extent to which the theoretical correlation matrix that is based upon the scale scores is in agreement with the empirical correlation matrix, or the extent to which a theoretical assumption such as unidimensionality is in agreement with the dimensionality of the empirical data. A major advantage of IRT models such as the Mokken model (Mokken, 1971) and the Rasch model (Rasch, 1960) is the availability of powerful tests of the fit of the model to the data (Molenaar, 1995). Since these models imply testable statements regarding the structure of the data, their fit can be falsified on the basis of empirical data.

The second criterion is reliability, which refers to the accuracy of scale scores. The reliability coefficient originated from CTT, and is defined as the ratio of the true score variance to the observed score variance in the population of interest. Neither the true scores, which are defined as the observed scores minus the measurement errors, nor the true score variance can be observed. Therefore, the reliability coefficient has to be estimated by other means, such as the internal consistency coefficient, which is known as coefficient alpha (Cronbach, 1951). The reliability coefficient is generally used to obtain the standard error of measurement in scale scores. The standard error of measurement is used to estimate a confidence interval for a person's true score, and can be used for testing hypotheses about the true score. For example, it can be tested whether two scale scores, which serve as estimates of the true scores, are different, or whether a scale score is significantly different from a cut score. In IRT, an item response function is defined for each item in the test.
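
As an illustration, one common form of item response function is that of the Rasch model for a dichotomously scored item. The sketch below assumes the usual logistic form of that model; the names theta (location of the person on the scale) and b (location of the item on the same scale) are notation introduced here and do not appear in the text. The item information function is included because, for this model, it equals P(1 - P) and so shows how the precision contributed by an item varies with the location of the person.

```python
import math

def rasch_probability(theta, b):
    """Probability of a score of 1 under the Rasch model.

    theta: person location on the latent scale (assumed notation)
    b:     item location on the same scale (assumed notation)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Item information under the Rasch model: P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

# Hypothetical person locations, evaluated for one item located at b = 0.
locations = [-2.0, -1.0, 0.0, 1.0, 2.0]
probabilities = [rasch_probability(theta, b=0.0) for theta in locations]
information = [item_information(theta, b=0.0) for theta in locations]

for theta, p, info in zip(locations, probabilities, information):
    print(f"theta={theta:+.1f}  P(score=1)={p:.3f}  information={info:.3f}")
```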
For a particular item, the item response function defines the probability of a particular score given the person's measurement value on the scale of interest. Thus, persons with different measurement values have different probabilities of providing a particular score. An example is an item response function that defines the probability of a correct answer to a particular arithmetic item as an increasing function of arithmetic ability. Persons having higher arithmetic ability levels have higher probabilities of giving the correct answer. The use of item response functions implies that the magnitude of the measurement error depends on the person's location on the scale. Thus, one person may be measured with greater accuracy using a particular item and a particular test than another person who has another scale location (Molenaar, 1995). The third criterion is generalisability, which refers to the extent to which conclusions from measurement analyses are generalisable over various conditions. To assess the
generalisability of conclusions, one has to study the sources of randomness in measurement (Molenaar, 1995). Major sources of randomness are (a) the sampling of persons, (b) the sampling of items, (c) the test conditions, and (d) the mode of administration of the test. For example, due to differences in test conditions (e.g., Messick, 1989, p. 81) a set of items may constitute a scale in one empirical study but not in another empirical study. This necessitates the assessment of the fit of the measurement model in different empirical studies in which the measurement instrument is used. Furthermore, the mode of administration may influence the responses to test items. For example, results obtained via telephone interviews cannot be compared with results obtained from on-line interviews without having investigated the comparability of these modes of data collection (e.g., Bronner & Kuijlen, 2007). It is recommended to reflect on the plausible sources of randomness in advance of a study and, if necessary, to test empirically whether particular generalisations are justified (Molenaar, 1995).

The fourth criterion is validity. Messick (1989, p. 13) defined validity as "an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment". This definition entails validity of measurement (i.e., the validity of test-score interpretations for describing a person; Cronbach, 1971, pp. 445-449) and validity for decision-making (i.e., the validity of test-score interpretations for making decisions about a person; Cronbach, 1971, pp. 445-449). Validity is extensively discussed in the next section.
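
Returning to the second criterion, the following sketch shows how coefficient alpha, the standard error of measurement, and a confidence interval around a scale score might be computed under CTT. The data are invented for illustration, the function names are hypothetical, and the multiplier 1.96 assumes approximately normally distributed measurement errors.

```python
import math
from statistics import stdev, variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a persons-by-items table of scores."""
    k = len(item_scores[0])                     # number of items
    items = list(zip(*item_scores))             # one tuple of scores per item
    totals = [sum(person) for person in item_scores]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

def standard_error_of_measurement(totals, reliability):
    """SEM = standard deviation of the scale scores * sqrt(1 - reliability)."""
    return stdev(totals) * math.sqrt(1 - reliability)

# Invented responses of six persons to four items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
    [3, 4, 3, 4],
]

totals = [sum(person) for person in scores]
alpha = cronbach_alpha(scores)
sem = standard_error_of_measurement(totals, alpha)

# Approximate 95% confidence interval around the first person's scale score.
observed = totals[0]
interval = (observed - 1.96 * sem, observed + 1.96 * sem)

print(f"alpha={alpha:.3f}  SEM={sem:.3f}  interval={interval}")
```

Two scale scores whose intervals overlap substantially would, under these assumptions, not be regarded as reliably different.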

Validity

The concept of validity has evolved throughout time (e.g., Anastasi, 1986; Angoff, 1988; Schouwstra, 2000). Initially, validity was conceived of as the degree to which a test measures what it purports to measure (Kelley, 1927). Validity was demonstrated on the basis of the correlation of test scores with some criterion, which is called criterion-related validity (e.g., Anastasi, 1988, p. 145; Cronbach & Meehl, 1955). However, it proved to be difficult to find objective criteria for different kinds of measurements, such as measurements of different psychological constructs. This problem gave rise to new methods for establishing validity and eventually to different conceptualisations of validity, such as (a) criterion-related validity, (b) content validity, and (c) construct validity (Cronbach & Meehl, 1955).

Content validity is established by showing that the behaviours sampled by the test are a representative sample of the domain of interest (e.g., Anastasi, 1988, p. 140; Cronbach, 1971, p. 451; Messick, 1989, pp. 39-42; Murphy & Davidshofer, 1991, pp. 107-109). As such, content validity pertains to evidence about the domain coverage and the degree to which the content of the test represents the domain. In order to establish content validity, one must start from an elaborated definition of the construct of interest. This definition should include a detailed description of what the construct refers to, and of what the construct does not refer to but may be related to (Schouwstra, 2000). Content validity is then established on the basis of the comparison of the structure of the test with the specified structure of the construct. Thus, content validity is a property of tests rather than of test-score interpretations (Messick, 1989, p. 17). Two additional remarks are in order. First, content validity has to be incorporated at the outset of test development. For example, Messick (1989, p. 39) noted that, on the basis of the construct definition, a researcher can develop a test which covers all aspects or facets of the construct of interest according to a specified rule such as equal coverage, which means that all aspects or structuples are equally represented in the test. This is the core of content validity. Second, content validity should not be confused with face validity. The latter pertains to whether the test looks valid to test users, and not to what the test scores actually reflect (Anastasi, 1988, p. 144). Therefore, validity theorists do not consider face validity as a conceptualisation of validity. Cronbach and Meehl (1955) conceived of construct validity as the appropriateness of test-score interpretations.
They discussed construct validation, and they concluded that construct validation may include many investigations, such as research into content validity, criterion-related validity, inter-item correlations, and inter-test correlations. Furthermore, they proposed defining a construct by means of a network of associations or propositions in which the construct of interest occupies a central position. This network is the nomological network. The study of relations between test scores and measurements of concepts in the nomological network provides evidence pro or contra construct validity. Construct validation requires the integration of all evidence into a judgement of construct validity. Because this judgement is qualitative by nature, it cannot be expressed as a single coefficient, such as the reliability of test scores (Cronbach & Meehl, 1955). One additional remark is in order. Cronbach and Meehl (1955) explained construct validation, and their explanation illustrates that they conceived of construct validity as the
appropriateness of test-score interpretations (see also Cronbach, 1971, p. 447). However, they did not provide an explicit definition of construct validity. The lack of an explicit definition may have contributed to confusion about the meaning of construct validity. For example, Churchill (1979) conceived of construct validity as a property of a test, which does not match the conception of construct validity as the appropriateness of test-score interpretations. Churchill (1979) and Peter (1981) introduced construct validity in the marketing literature. The work of these authors has guided validation research in academic marketing research up to the present day. Elaborating on the work of Cronbach and Meehl (1955) and Campbell and Fiske (1959), they split construct validity into (a) nomological validity, (b) divergent validity, and (c) convergent validity. Nomological validity refers to the relationships between the test scores and measures purported to assess different but related concepts. Discriminant or divergent validity refers to the extent to which test scores differ from measures of other concepts that are expected to be different from the concept of interest in theoretically interesting ways. Convergent validity refers to the extent to which test scores correlate with other measurements of the same construct. Churchill (1979) and Peter (1981) proposed multitrait-multimethod (MTMM) research (Campbell & Fiske, 1959) to investigate construct validity. MTMM research requires measurements of at least two traits by at least two methods, so that each trait is measured by each method. The MTMM matrix consists of the correlations between (a) the same trait measured by means of different methods, (b) different traits measured by means of the same method, and (c) different traits measured by means of different methods. 
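
As a minimal sketch of the structure of an MTMM matrix, the correlations between measures can be sorted into the three sets just described. The traits, methods, and correlation values below are hypothetical; the labels monotrait-heteromethod, heterotrait-monomethod, and heterotrait-heteromethod follow the terminology of Campbell and Fiske (1959) and are not used in the text.

```python
from itertools import combinations
from statistics import mean

# Hypothetical design: two traits, each measured by two methods.
measures = [
    ("satisfaction", "survey"),
    ("satisfaction", "interview"),
    ("loyalty", "survey"),
    ("loyalty", "interview"),
]

# Invented correlations between pairs of measures (indices into `measures`).
correlations = {
    (0, 1): 0.70, (2, 3): 0.65,   # same trait, different methods
    (0, 2): 0.40, (1, 3): 0.35,   # different traits, same method
    (0, 3): 0.20, (1, 2): 0.25,   # different traits, different methods
}

def classify(a, b):
    """Assign a pair of (trait, method) measures to one of the three sets."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and not same_method:
        return "monotrait-heteromethod"
    if same_method and not same_trait:
        return "heterotrait-monomethod"
    if not same_trait and not same_method:
        return "heterotrait-heteromethod"
    raise ValueError("identical trait and method: not part of the three sets")

groups = {}
for i, j in combinations(range(len(measures)), 2):
    label = classify(measures[i], measures[j])
    groups.setdefault(label, []).append(correlations[(i, j)])

# Average correlation per set, as a crude summary for inspection.
summary = {label: mean(values) for label, values in groups.items()}
for label, value in summary.items():
    print(f"{label}: {value:.3f}")
```

In this invented example the same-trait correlations are the highest of the three sets, which is the pattern one would hope to find; averaging is only a convenience here, since the inspection normally concerns the individual correlations.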
Convergent validity is assessed on the basis of inspection of the first set of correlations, divergent validity is assessed on the basis of inspection of the second set of correlations, and method bias is assessed on the basis of a comparison of the second and the third set of correlations. Belson (1986) explicitly addressed the subject of validity in survey research. The measurement of psychological constructs is typically based upon survey research. Thus, the quality of the survey data delineates the validity of measurements of psychological constructs. Belson (1986) noted that the accuracy of answers to survey questions cannot be taken for granted because misinterpretations of questions, memory decay of participants, and unwillingness to respond may contaminate the data. Ample evidence exists of the effects of questionnaire format, questionnaire length, and the wording of questions and response categories on the responses of participants to questions (e.g., Belson, 1981; Bradburn, 1983; Rugg, 1941; Schuman & Presser, 1981; Sudman & Bradburn, 1982; Saris et al., 1998; Scherpenzeel, 1995). For these reasons, Belson (1986) proposed assessing validity in survey
research on the basis of an investigation of the quality of the answers given to survey questions. This includes the investigation of the quality of opinion data. Belson (1986) proposed various techniques to assess the validity of survey research, such as (a) the evaluation of the data collection procedure in terms of the known principles of question formulation and questionnaire design, (b) the pre-testing of the questions, and (c) the execution of a pilot of the questionnaire.

Messick's (1989, p. 13) definition of validity is important for various reasons. First, the definition expresses unequivocally that the subject of validation is the interpretation and the use of test scores. This is in agreement with the practice of validation in psychological research, which is to investigate the meaning of test scores in a specific context and the usefulness of test scores for various decision-making purposes (e.g., Anastasi, 1988; Cronbach, 1971; Murphy & Davidshofer, 1991). Second, the definition expresses that different lines of evidence have to be considered when making a judgement of validity. This includes evidence of criterion-related validity, content validity, and the original conception of construct validity (Cronbach & Meehl, 1955). Third, the definition expresses that these different lines of evidence cannot be integrated into a single coefficient, but have to be integrated into a judgement regarding the test-score interpretation (e.g., Cronbach, 1971, p. 464; Cronbach & Meehl, 1955; Messick, 1989, 1995). This judgement has a gradual nature (Messick, 1989, p. 13), which implies that the test-score interpretations may have high validity, moderate validity, low validity, or no validity at all. Fourth, the definition expresses that validation is an unending process that includes the judgement of evidence gathered in the processes of test development and test use (Anastasi, 1986, 1988; Cronbach, 1971, p. 452; Messick, 1989, p. 13). Fifth, Messick (1989, pp.
20-21) differentiated between the assessment of construct validity and the assessment of the consequences of the use and the interpretation of test scores as the two bases of validity. In that context, Messick (1989, pp. 20-21, 34; 1995) argued that construct validity comprises the rationales and evidence supporting the trustworthiness of test-score interpretations in terms of the construct of interest, and that the validation of decision-making practices of test scores comprises the appraisal of social consequences of the use and interpretation of test scores. Messick (1989, pp. 34-35, 1995; see also Cook & Campbell, 1979) addressed two general threats to construct validity, which are construct underrepresentation and irrelevant variance. Construct underrepresentation refers to the risk of measuring only a part of the construct of interest, such as only the cognitive aspect of customer satisfaction instead of both the cognitive and affective aspects of the construct (e.g., Oliver, 1997, p. 343). Irrelevant
variance refers to the risk of measuring more than just the construct of interest, such as other traits, concepts related to specific group membership, or response tendencies. Both construct underrepresentation and irrelevant variance refute the interpretation of test scores in terms of a reflection of the construct of interest and nothing else. It may be noted that the common practice of collecting empirical evidence for a network of associations between measurements does not exclude the two threats to construct validity. When a relationship is found between the measure of the attribute and other attributes, the test score may still reflect only part of the attribute. Also, the test score may reflect something more than just the attribute of interest.

Messick's (1989, 1995) conception of construct validity as a property of test-score interpretations is today's dominant conception of construct validity in psychometrics. However, it does not provide a clear-cut methodology for investigating construct validity. For this purpose, Schouwstra (2000, pp. 58-59) proposed the deductive design, which is a methodology for construct validation and for the development of tests for typical behaviour, such as behaviour related to satisfaction. The deductive design is consistent with Messick's conception of construct validity. Schouwstra's methodology encompasses the collection of theoretical and empirical evidence regarding the interpretation of test scores in terms of the construct of interest, and nothing else. As such, it takes the two global threats to construct validity into account, which are construct underrepresentation and construct-irrelevant variance.

Borsboom et al. (2004) criticised Messick's (1989, p. 13) definition and conception of validity.
They subscribed to Kelley's (1927) view that a test is valid if it measures what it purports to measure, and they defined validity of tests accordingly: A test is valid for measuring an attribute if (a) the attribute exists in the real world, and (b) variations in the attribute causally produce variations in the outcomes of measurement procedures. Thus, Borsboom et al. (2003, 2004) defined validity as a property of tests, and they took a realistic stance regarding the nature of psychological constructs. They opposed Cronbach and Meehl (1955) and Messick (1989, 1995), who conceived of construct validity as a property of test-score interpretations, and conceived of psychological constructs as postulated attributes of people. Two additional remarks are in order. First, Borsboom et al. (2004) argued that the conception of validity as a property of tests has direct relevance for validation research. Evidence of validity should be based upon research into the response process, that is, the relation between the attribute and response behaviour. The research should test a hypothesis with respect to the processes that lead to measurement outcomes. This amounts to a test of a causal theory about the relation between attribute and response behaviour. Because a
nomological network is not a theory of the causal relation between attribute and test score, the authors considered the nomological network irrelevant for validation research. Thus, in their view validation research should not assess the relationship of the construct with other constructs in the nomological network, but test a causal theory about the processes that evoke behaviour. Second, Borsboom et al. (2004) argued that the conception of validity as a property of tests has direct relevance for test construction. A large part of test validity research has to be done at the stage of test construction. Test development should depart from a theory on the causal relation between the attribute and behaviour. This approach to test development has been applied successfully with respect to measurement of some specific ability constructs, such as transitive reasoning (Bouwmeester & Sijtsma, 2006) and cognitive development (Jansen & Van der Maas, 1997).

Discussion

There is no broad consensus on either the conception of validity or the methodology of validation research. This is due partly to different conceptions of validity being based upon different conceptions of psychological constructs, and partly to the fact that validity theory is still developing. We discerned three perspectives on validity and validation research that are important for current academic research: (a) the Churchill perspective, (b) the Messick perspective, and (c) the Borsboom perspective. These perspectives are presented in Table 1.

The Churchill perspective on construct validity is the leading perspective on construct validity in academic marketing research. It was introduced in Churchill's (1979) procedure for test development in marketing research. Peter (1981) and Fornell and Larcker (1981) further elaborated Churchill's perspective and the associated methods for validation research. Churchill's procedure for test development in marketing research has contributed markedly to the measurement of psychological constructs in the corresponding domain (e.g., Bearden, Netemeyer, & Mobley, 1993), but Churchill's perspective on construct validity is not in line with modern theories of construct validity. The criteria associated with Churchill's perspective do not address the two global threats to construct validity, which are construct underrepresentation and construct-irrelevant variance (Messick, 1989, 1995; Schouwstra, 2000). Consequently, the methods associated with this perspective do not suffice for the assessment of construct validity.

Table 1: Three Perspectives on Validity and Validation Research

 | Churchill perspective | Messick perspective | Borsboom perspective
Theoretical foundation | Constructivism | Constructivism | Realism
Conception | Construct validity is a property of tests | Construct validity is a property of test-score interpretations | Validity is a property of tests
Criteria | Convergent validity; divergent validity; nomological validity | Quality of construct representation; absence of irrelevant variance | Test of causal theory
Prototypical design | MTMM design; correlation with criterion | Deductive design | Experimental design
Outcome | Gradual judgement of validity | Gradual judgement of validity | Binary judgement of validity

First, content validity receives insufficient attention in Churchill's perspective on construct validity. Moreover, content validity was confused with face validity (e.g., Churchill, 1979; Bearden et al., 1993, p. 3). This may be considered the major flaw of the Churchill perspective, because face validity only provides intuition for a particular interpretation of what the test measures. Instead, empirical evidence is needed to support construct validity. Such evidence comes from the investigation of the fit of the measurement model, the plausible sources of measurement bias, and the nomological network of the construct. The investigation of a test's content validity adds to the process of construct validation in that it provides evidence on whether the item set used in the test is representative of the hypothetical domain of items used to operationalise the attribute (e.g., Messick, 1989, pp. 36-42).

Second, the practice of MTMM research does not generate strong evidence of construct validity. This is partly due to the fact that MTMM research is not concerned with content validity, and partly due to the lack of direction on how to choose appropriate traits and methods in MTMM studies. For obtaining strong evidence of construct validity, it is necessary that the traits chosen are clearly similar and that the methods chosen are clearly different. For example, Anastasi (1988, p. 158) argued that the agreement between two measures of the same trait that are obtained by maximally similar methods reflects reliability, and that the agreement between two measures of the same trait that are obtained by maximally different methods reflects validity. In general, the methods applied in MTMM studies are
quite similar (e.g., Byrne, 1989; Churchill & Surprenant, 1982; Fornell & Larcker, 1981; Saris et al., 1998; Scherpenzeel, 1995; Wirtz & Lee, 2003). As a consequence, the agreement between different measures of the same trait mostly reflects reliability rather than validity.

The Messick perspective is the leading perspective on construct validity in psychology. In this perspective, construct validity is conceived of as a property of test-score interpretations (i.e., the appropriateness of the interpretations of test scores in terms of the construct of interest; this is also labeled validity of measurement, validity of test-score interpretations, and construct validity of test scores). The best argument in favour of this conception of construct validity is that a test may yield valid measurements of the construct of interest in one context, and invalid measurements of the construct in another context. Moreover, a particular interpretation of a test score may be valid while another interpretation is invalid. Therefore it is the test-score interpretation that needs to be validated, and not the test. The Messick perspective matches the constructivist position regarding the ontology of psychological constructs. This is a major virtue of the Messick perspective. Another major virtue is that it can be put into action by the deductive design (Schouwstra, 2000). The deductive design provides a methodology for validation research that addresses the two global threats to construct validity, and that is in line with Messick's conception of construct validity. Also, the deductive design incorporates the rationales behind test development. This is in agreement with the notion that construct validation starts with the process of test development. For these reasons, we subscribe to Messick's perspective on construct validity and Schouwstra's methodology for validation research.

The Borsboom perspective is important for several reasons.
First, it advocates a theory-driven approach to construct validation. Borsboom et al. (2004) rightly argued that construct representation is at the core of validity, and that proof of construct representation is founded in theory regarding the construct of interest. Second, Borsboom et al. (2004) demonstrated the limited usefulness of investigating convergent, divergent, and nomological validity. They rightly argued that the investigation of these types of validity is subordinate to other evidence regarding construct representation, such as theory testing. Third, Borsboom et al. (2004) recommended that one explicate and test theories of response behaviour. This is a useful suggestion, because there is ample evidence of the disturbing influence of method characteristics on response behaviour (e.g., Belson, 1981; Belson, 1986; Bradburn, 1983; Rugg, 1941; Schuman & Presser, 1981; Sudman & Bradburn, 1982; Saris et al., 1998; Scherpenzeel, 1995). Fourth, Borsboom et al. (2004) criticised Messick's (1989, p. 13) definition of validity as a judgement instead of a property. We subscribe to that criticism
where it concerns construct validity, but not where it concerns validity for decision-making uses of test scores.

Borsboom et al. (2004) followed Kelley (1927) in holding that a test is valid if it measures what it purports to measure. Thus, the Borsboom perspective is characterised by the conception of validity as a property of tests. This conception of validity is problematic, because whether a test measures what it purports to measure does not depend exclusively on the content of the test. It also depends on, for example, the administration of the test, the population in which the test is used, and ultimately on the research goal. Thus, a particular test may measure what it purports to measure in one instance but not in another instance. Consequently, validity has to be assessed with each administration of a test, and this justifies a conception of validity as a property of test-score interpretations.

The major weakness of the Borsboom perspective is its foundation in a realist conception of properties, which causes three problems. The first problem pertains to the meaning of the statement "Property X exists". According to realism, the statement expresses that property X exists as an entity, independent of the observations (Borsboom, 2005, p. 6). We consider this interpretation inappropriate, because properties are organisational principles through which we perceive and interpret the world. Some of these organisational principles are useful because they have many empirical referents. An example is aggression. Other organisational principles are less useful because they have few if any empirical referents. An example is clairvoyance. Thus, we contend that the statement "property X exists" expresses that property X exists as an organisational principle. The second problem pertains to the statement that variations in the property cause variations in the outcomes of measurement procedures.
This statement cannot be tested because one cannot observe covariation between an unobservable entity and its measurement. Thus, one cannot know whether this statement is true. The third problem pertains to the definition of properties. Borsboom's perspective requires a well-specified theory on the relationship between the property and response behaviour. The theory should specify the set of responses for each level of the property, how responses vary if levels of the property vary, and which response patterns exist and which not. This amounts to a definition of the property in terms of response patterns, but that cannot be the meaning of the property. The Borsboom perspective may suit abilities, such as transitive reasoning, for which the meaning is close to its operationalisation. However, for psychological attributes such as satisfaction the Messick perspective is to be preferred, because it is founded on a constructivist position regarding the ontology of psychological properties.

Conclusions

1. A psychological construct is a theoretical concept with theoretical and empirical meaning. There is, however, no identity relation between the theoretical meaning and the empirical meaning. This means that a construct has a surplus meaning over its empirical indicators.

2. The theoretical meaning of a construct is linguistic by nature. It is the linguistic use of a construct that grants meaning to the construct, and it is the examination of the linguistic use that demonstrates the theoretical meaning of the construct. This means that the theoretical meaning of a construct should be studied by means of an examination of various examples of the linguistic use of that construct.

3. The theoretical meaning of a construct encompasses (a) the group of attributes and phenomena the construct refers to, and (b) the relation of the construct with other constructs in a nomological network. The former component is expressed in the explicit definition of the construct and the latter component in the implicit definition of the construct (Schouwstra, 2000, p. 61).

4. The empirical meaning of a construct embraces a domain of behaviours that cannot be delineated sharply and cannot be listed exhaustively. Nevertheless, the construct has to be measured on the basis of different observations from this behavioural domain. The sampling of these observations constitutes the first phase of the measurement process (Coombs, 1964, p. 4).

5. The development and validation of psychological theory requires measurements of constructs that are in line with their theoretical meaning. This supports a deductive approach to test development, which means that the development of the test is based upon a formal definition of the construct of interest.

6. The Messick perspective on construct validity corresponds best with the linguistic conception of psychological constructs. In this perspective, construct validity is the appropriateness of test-score interpretations in terms of the construct of interest.

7. The deductive design exemplifies how to validate measurements according to Messick's perspective. For this reason, we chose the methodology of the deductive design for test development and construct validation in the empirical study (Chapter 4 onwards).

Chapter 3 The theoretical meaning of customer satisfaction

Introduction

In Chapter 2, we concluded that the theoretical meaning of a construct is inherently linguistic, and that it is the linguistic use of the term that grants meaning to the construct (Wittgenstein, 1953). For this reason, the theoretical meaning of customer satisfaction has to be clarified by means of an examination of the linguistic use of the term, that is, an examination of examples of its use in scientific studies as well as in everyday language.

In the present chapter, the theoretical meaning of customer satisfaction is investigated. The investigation encompasses an examination of (a) conceptions of satisfaction, (b) conceptions of dissatisfaction, (c) theories of satisfaction, (d) concepts in the nomological network of satisfaction, and (e) measures of satisfaction in the marketing literature. Based on the results of the investigation, the term customer satisfaction is explained and defined. The explicit definition of customer satisfaction addresses the group of attributes and phenomena that customer satisfaction refers to, and the implicit definition of customer satisfaction addresses the connections of customer satisfaction with other concepts in a nomological network.

Conceptions of satisfaction

Reviews of the marketing literature by Yi (1990) and by Giese and Cote (2000) yielded a multitude of definitions of consumer satisfaction, customer satisfaction, summary satisfaction, and transaction-specific satisfaction. The different definitions of these terms reflect different conceptions of satisfaction. In order to clarify the theoretical meaning of satisfaction, we examined the major conceptions and the corresponding definitions of satisfaction in the marketing literature.

The marketing literature distinguishes two important conceptions of satisfaction. The first is satisfaction as a response to disconfirmation (Table 1, first column) and the second is satisfaction as a valenced response to consumption (Table 1, second column). Both conceptions can be applied to transaction-specific satisfaction (Oliver, 1997, p. 15), which
concerns satisfaction with single encounters with the focal object (Table 1, first row), and to summary satisfaction (Oliver, 1997, p. 15), which concerns satisfaction with the accumulation of encounters with the focal object (Table 1, second row). Each cell in Table 1 is associated with several definitions of satisfaction, as they can be found in the marketing literature (e.g., Giese & Cote, 2000; Yi, 1990). Because the subject of this thesis is summary satisfaction with a bank, we discuss both satisfaction as a response to disconfirmation and satisfaction as a valenced response to consumption, and also the prototypical definitions of summary satisfaction associated with each of the two conceptions of satisfaction.

Table 1: Conceptions of Satisfaction in the Marketing Literature

 | Response to disconfirmation | Valenced response to consumption
Based on a single encounter with focal object | Transaction-specific satisfaction | Transaction-specific satisfaction
Based on accumulation of encounters with focal object | Summary satisfaction | Summary satisfaction

Satisfaction as a response to disconfirmation

Disconfirmation refers to the perceived discrepancy between pre-consumption expectations and post-consumption perceptions. The conception of satisfaction as a response to disconfirmation originated from disconfirmation theory (e.g., Oliver, 1980, 1997). According to disconfirmation theory, the level of satisfaction (and also dissatisfaction) is a function of pre-consumption expectations and disconfirmation of expectations. Whereas positive disconfirmation of expectations contributes to satisfaction, negative disconfirmation of expectations contributes to dissatisfaction. In the augmented disconfirmation theory, the level of satisfaction is also a function of the perceptions of outcomes of consumption (Oliver, 1997, pp. 119-121). The augmented disconfirmation model is represented in Figure 1.

Disconfirmation theory is the dominant satisfaction theory, and was investigated in several studies (e.g., Churchill & Surprenant, 1982; De Ruyter, Bloemer, & Peeters, 1997; Oliver, 1980; Oliver, 1997; Oliver & Burke, 1999; Oliver & DeSarbo, 1988; Tse & Wilton, 1988; Van Montfort, Masurel, & Van Rijn, 2000). Although these studies are not unanimous with respect to the magnitude of the effects of expectations, perceptions, and disconfirmation
on satisfaction, there is evidence of the significance of each of these effects (e.g., Oliver, 1997; Oliver & Burke, 1999).
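The functional claims of disconfirmation theory can be written compactly. The notation below is introduced here only for illustration and does not come from the cited studies: S denotes the level of (dis)satisfaction, E the pre-consumption expectations, P the post-consumption perceptions, and D the disconfirmation of expectations. The linear specification in the last line is an assumption made for the sake of the sketch, because the studies cited above do not agree on the magnitude of the separate effects.

```latex
\begin{align}
  S &= f(E, D)
    && \text{(disconfirmation theory)} \\
  S &= g(E, D, P)
    && \text{(augmented disconfirmation theory)} \\
  S &= \beta_0 + \beta_1 E + \beta_2 D + \beta_3 P + \varepsilon,
    \quad \beta_2 > 0
    && \text{(illustrative linear form, assumed)}
\end{align}
```

In the assumed linear form, the sign restriction on the coefficient of D expresses that positive disconfirmation contributes to satisfaction and negative disconfirmation to dissatisfaction.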

[Figure: path diagram with the elements Expectations, Perceptions, Disconfirmation, and (Dis)satisfaction]

Figure 1: The augmented disconfirmation model of satisfaction

The disconfirmation model has met with three important problems. The first problem pertains to the use of pre-consumption expectations as the comparison standard for the consumer's post-consumption perceptions. Alternatives for this comparison standard are (a) the ideals held by the consumer, (b) the needs of the consumer, and (c) standards concerning fairness held by the consumer (Oliver, 1997, pp. 71-72, 133-134). Thus, there is no broad consensus about the conception of disconfirmation. The second problem pertains to the operationalisation of expectations. If one cannot get access to consumers before consumption took place, it is not possible to measure pre-consumption expectations, and instead one can only measure retrospective expectations at best. Because expectations may change during the process of consumption, retrospective expectations may differ from the pre-consumption expectations held by the consumer. The third and major problem pertains to the conception of satisfaction as a response to disconfirmation (e.g., Bloemer, 1993, p. 93; Oliver, 1980; Tse & Wilton, 1988). This conception disregards the content of the satisfaction response, which should be the core of the explicit definition of the concept (e.g., Oliver, 1997, p. 13; Sartori, 1984, pp. 32-33; Schouwstra, 2000, p. 61). The definitions of satisfaction associated with this theory define satisfaction in terms of a response to disconfirmation. For example, Tse & Wilton (1988; also see Table 2) defined
consumer satisfaction/dissatisfaction as the consumer's response to the evaluation of the perceived discrepancy between prior expectations (or some other norm of performance) and the actual performance of the product as perceived after its consumption. Bloemer (1993, p. 61; also see Table 2) defined satisfaction as the outcome of the subjective evaluation that the chosen alternative (the brand) meets or exceeds the expectations of the person. It may be noted that the subjective evaluation is the perceived discrepancy between prior expectations and actual performance of the brand, and that the subjective evaluation results from the processing of expectations and performance of the brand. Bloemer (1993, p. 93; also see Bloemer & Kasper, 1995; Bloemer & Poiesz, 1989) argued that the extent to which persons process expectations and performances depends on both the motivation and the ability of the person to do so. For this reason, she differentiated between latent satisfaction, which results from a low degree of processing of expectations and performances, and manifest satisfaction, which results from a high degree of processing of expectations and performances. Because this differentiation is an elaboration of the conception of satisfaction, it is an important extension of disconfirmation theory.

Satisfaction as a valenced response to consumption

The conception of satisfaction as a valenced response to consumption concerns the satisfaction response to consumption experiences, and is therefore typical of consumer satisfaction and customer satisfaction. Oliver (1997, p. 28) explained valence as polarity, the positivity or negativity of a state of nature. Thus, a valenced response can be placed on a dimension that ranges from negative to positive. A special case of the valenced response is the neutral response. A neutral response to consumption is given when a person is neither satisfied nor dissatisfied with his or her consumption experience. It may be noted that in the conception of satisfaction as a valenced response to consumption, the satisfaction response is distinguished from non-valenced responses (e.g., the propositions "it is dark" and "2+2=4"), and from valenced responses towards things that were not consumed (e.g., a person's judgement of a car that he or she never drove).

The prototypical definitions associated with this conception of satisfaction are the definitions provided by Howard and Sheth (1969), Fornell (1992), Oliver (1997), and Giese and Cote (2000). There are important differences between these definitions. Howard and Sheth (1969; also see Table 2) defined satisfaction as the buyer's cognitive state of being adequately or inadequately rewarded for the sacrifices he or she has undergone. This is the

Table 2: Definitions of Satisfaction in the Marketing Literature

Author | Conception of satisfaction | Definition of satisfaction | Explanation of the definition of satisfaction
Tse & Wilton (1988) | Response to disconfirmation | The consumer's response to the evaluation of the perceived discrepancy between prior expectations (or some other norm of performance) and the actual performance of the product as perceived after its consumption. | A prototypical definition of satisfaction in disconfirmation theory. Satisfaction is not equated with disconfirmation, but it is conceived as a response to disconfirmation.
Bloemer (1993) | Response to disconfirmation | The outcome of the subjective evaluation that the chosen alternative (the brand) meets or exceeds the expectations of the person. | A prototypical definition of satisfaction in disconfirmation theory. The authors discriminate between the processes that evoke manifest satisfaction and the processes that evoke latent satisfaction.
Howard & Sheth (1969) | Valenced response to consumption | The buyer's cognitive state of being adequately or inadequately rewarded for the sacrifices she or he has undergone. | The definition expresses that the satisfaction response is cognitive by nature.
Fornell (1992) | Valenced response to consumption | An overall post-purchase evaluation. | A prototypical definition of summary satisfaction, or cumulative satisfaction.
Oliver (1997) | Valenced response to consumption | The judgement that a product or a service feature, or the product or service itself, provided or is providing a pleasurable level of consumption-related fulfilment, including levels of under- or overfulfilment. | The definition expresses that the satisfaction response may have affective and cognitive content.
Giese & Cote (2000) | Valenced response to consumption | (a) an affective response of varying intensity, (b) directed towards focal aspects of the acquisition and/or consumption of products and services, (c) determined at the time of purchase or temporal points during consumption. | The definition expresses that the satisfaction response is affective by nature.

prototypical definition of satisfaction as a cognition. Fornell (1992; also see Anderson, Fornell, & Lehmann, 1994; also see Table 2) defined customer satisfaction as an overall post-purchase evaluation. This definition was only applied with respect to summary satisfaction, and it was the basis of several national customer satisfaction indices (Fornell, 1992; Johnson, Gustafsson, Andreassen, Lervik, & Cha, 2001).

Oliver (1997, p. 13) defined consumer satisfaction as the judgement that a product or a service feature, or the product or service itself, provided or is providing a pleasurable level of consumption-related fulfilment, including levels of under- or overfulfilment. This definition requires an explanation. First, the definition expresses that satisfaction is a response to fulfilment, which implies that it is evoked during or after consumption. Second, the term "judgement" in the definition expresses that the satisfaction response is a valenced response. Third, the term "fulfilment" in the definition expresses that a goal exists, that something needs to be fulfilled. Fourth, the term "pleasurable" in the definition expresses that satisfaction includes affects. This notion is in line with the results from recent studies into the nature of satisfaction responses (e.g., Friman, 2004; Giese & Cote, 2000; Van Dolen, Lemmink, Mattsson, & Rhoen, 2001; Wirtz & Lee, 2003).

Oliver (1997, pp. 318-319) noted that satisfaction responses may become manifest as an affect (a pleasant or an unpleasant feeling), a cognition (a positive or a negative judgement), or both. Whether the satisfaction response is manifested as an affect, a cognition, or both depends on the person, the focal object, and the context. For example, satisfaction with the postal services may become manifest in the form of cognitions, and satisfaction with dinner in a restaurant may become manifest in the form of affects. Consequently, Oliver (1997, pp.
318-319) distanced himself from the view of satisfaction as anhedonic cognition. He concluded that affects coexist alongside cognitive judgements in producing the satisfaction response. This means that satisfaction may be manifested in affects as well as in cognitions.

Oliver (1997) demonstrated that satisfaction may arise from different processes, such as performance evaluations, processing of expectations, disconfirmation of expectations, need fulfilment, equity evaluations, cognitive dissonance, and processing of affects. Therefore he concluded that satisfaction may become manifest in various responses. Oliver (1997, pp. 337-342) suggested differentiating between four prototypical satisfaction responses, which he labeled satisfaction-as-contentment, satisfaction-as-pleasure, satisfaction-as-delight and satisfaction-as-relief.

In some contexts, satisfaction may be manifested as the absence of dissatisfaction (Giese & Cote, 2000; Westbrook & Oliver, 1991). In survey research in the automotive industry, Westbrook and Oliver (1991) demonstrated that a large part of the
consumers were rather unemotional about their car. In general, these consumers responded positively to satisfaction items, and negatively to dissatisfaction items. The authors argued that in this consumer segment, satisfaction might be interpreted as the absence of dissatisfaction. This implies, for example, that consumers remain satisfied until problems occur that hamper consumption. According to Oliver (1997, p. 340), absence of dissatisfaction is a special case of satisfaction-as-contentment. Oliver (1997, p. 339) described the contentment satisfaction state as a passive response to consumption that results when satisfaction states are maintained or prolonged. Contentment satisfaction or latent satisfaction (Bloemer, 1993) appears to be a common meaning of satisfaction in contexts that are characterised by stable consumption outcomes, such as the consumption of postal services or of a long-lasting consumer durable. According to Oliver (1997, p. 340), if a survey focuses on satisfaction in an ongoing-use situation, most persons will be responding from a satisfaction-as-contentment state, and fewer persons will be responding from a satisfaction-as-delight, satisfaction-as-pleasure, or satisfaction-as-relief state.

Giese and Cote (2000) defined consumer satisfaction as (a) an affective response of varying intensity, (b) directed towards focal aspects of the acquisition and/or consumption of products or services, and (c) determined at the time of purchase or temporal points during consumption, and lasting for a finite but variable amount of time. This is the prototypical definition of satisfaction as an affect. Qualitative research in a sample of 158 persons (Giese & Cote, 2000) demonstrated that 60 to 70 percent of the participants explained the term satisfaction in terms of affect. This is an important result because it demonstrates the affective content of satisfaction.
Giese and Cote (2000) concluded that consumer satisfaction is an affective response of a consumer towards some phenomenon. They argued that cognitions may be at the basis of the formation of consumer satisfaction, but that these cognitions do not constitute consumer satisfaction. Giese and Cote also argued that the meaning of satisfaction is context-specific. There are many contextual variables that affect how satisfaction is perceived, and these variables differ across domains. For example, satisfaction with a retail bank differs from satisfaction with medical care or satisfaction with a sports car. Persons have different needs and different expectations in different contexts, and these differences influence the meaning of satisfaction in these contexts. Therefore, Giese and Cote (2000) concluded that the definition and the measurement of satisfaction are also context-specific. They proposed a
framework for developing context-specific definitions of consumer satisfaction. In line with their definition, the framework addresses three components of the definition of satisfaction. These components are (a) the type of affective response, (b) the timing of the response, and (c) the focus of the response. The framework should facilitate the development of context-specific definitions of satisfaction and corresponding measurement procedures.

Conceptions of dissatisfaction

A major issue in satisfaction research, including satisfaction research in the marketing domain, is the conception of dissatisfaction. The literature provides two stances regarding the conception of dissatisfaction (Giese & Cote, 2000). Dissatisfaction is either considered to be the opposite of satisfaction on a bipolar dimension (the one-factor theory; Figure 2) or satisfaction and dissatisfaction are viewed as two different dimensions (the two-factor theory; Figure 2). The latter stance postulates that an individual can be simultaneously satisfied and dissatisfied with a focal object (Yi, 1990). This means, for example, that one can be simultaneously satisfied and dissatisfied with one's car if the car is reliable but does not accelerate well.

According to the one-factor theory, dissatisfaction is the opposite of satisfaction on a bipolar dimension. This stance is reflected in, for example, Oliver's (1997, p. 28) definition of dissatisfaction as the negative satisfaction state, when the consumer's level of fulfilment is unpleasant. Thus, he considers dissatisfaction to be the opposite of satisfaction on a bipolar dimension. It is noteworthy that the conception of dissatisfaction as the opposite of satisfaction does not rule out the possibility that a consumer is satisfied with one aspect of consumption outcomes and dissatisfied with another aspect. However, it does rule out the possibility that a consumer is both satisfied and dissatisfied with one phenomenon at one point in time.

According to the two-factor theory (Herzberg, Mausner, & Snyderman, 1959), satisfaction and dissatisfaction have different antecedents, and should be conceived of as independent dimensions. The notion that satisfaction and dissatisfaction have different antecedents results from research into phenomena that caused satisfaction responses and phenomena that caused dissatisfaction responses (e.g., Herzberg et al., 1959; Johnston, 1995).
For example, Johnston (1995) reported that the phenomenon of helpfulness of a bank was a determinant of satisfaction with a bank, and that the phenomenon of integrity of a bank was a determinant of dissatisfaction with a bank. Similarly, Herzberg et al. (1959, pp. 72-74)
reported that the phenomenon of responsibility was a determinant of satisfaction with a job, and the phenomenon of salary was a determinant of dissatisfaction with a job. The phenomena that are expected to cause satisfaction responses are often labeled motivator factors or motivators, and the phenomena that are expected to cause dissatisfaction are often labeled hygiene factors or hygienes (e.g., Oliver, 1997, pp. 146-150; Wolf, 1970).

[Figure 2 (diagram not reproduced). One-factor theory: a single dimension running from Dissatisfied to Satisfied; dissatisfaction is the opposite of satisfaction on a bipolar dimension. Two-factor theory: one dimension running from Not satisfied to Satisfied and one dimension running from Not dissatisfied to Dissatisfied; satisfaction and dissatisfaction are unipolar constructs.]

Figure 2: Conceptions of satisfaction and dissatisfaction in the one-factor theory and the two-factor theory, respectively

The two-factor theory is disputable because empirical research demonstrated that a phenomenon (e.g., magnitude of responsibility) can be a source of both satisfaction and dissatisfaction (e.g., job satisfaction and job dissatisfaction; for an overview of empirical studies into the two-factor theory, see Wolf, 1970; see also Oliver, 1997, pp. 146-150). For example, Soliman (1970) studied satisfaction and dissatisfaction of persons with their jobs, and found that satisfaction and dissatisfaction were the opposite ends of a continuum. Furthermore, Soliman (1970) found that when needs of a person were provided for adequately, motivators were more important for satisfaction/dissatisfaction than hygienes, and when needs of a person were provided for moderately, motivators and hygienes were equally important for satisfaction/dissatisfaction. Eventually, Soliman (1970) concluded that the effects of motivators and hygienes on satisfaction/dissatisfaction were dependent upon the level of need fulfilment which was already accomplished. On the basis of a review of various research findings, Wolf (1970) reached a similar conclusion.

Generalising the results of Soliman (1970) and Wolf (1970) implies, for example, that a person's satisfaction/dissatisfaction with his or her car depends on the level of need fulfilment which was already accomplished. Assuming that the acceleration power of a car is a motivator factor and that the reliability of a car is a hygiene factor, acceleration power of one's car is more important for satisfaction/dissatisfaction when the needs of a person are provided for adequately, and reliability of one's car is more important for satisfaction/dissatisfaction when the needs of a person are provided for badly.

Russell and Carroll (1999a) investigated whether positive affect at some point in time is the opposite of negative affect at that same point in time, or whether positive affect is independent of negative affect. They defined a bipolar model of momentary affect, deduced the theoretical correlations between positive affect measures and negative affect measures, and compared these theoretical correlations with the empirical correlations observed in various empirical studies (for an overview, see Russell & Carroll, 1999a). The authors concluded that when controlling for the major factors that influence the correlation between positive affect and negative affect, which are measurement error, item selection, and response format, there was no basis for rejection of the bipolarity hypothesis. The more sources of bias against bipolarity were removed, the closer the data matched the bipolar model. Consequently, Russell and Carroll (1999a, 1999b) concluded that the empirical evidence supports the bipolarity hypothesis of momentary affect. It is plausible that this conclusion can be generalised to satisfaction, and that dissatisfaction should be conceived of as the opposite of satisfaction on a bipolar dimension. This is consistent with the dominant causal theory of satisfaction, which is disconfirmation theory (e.g., Oliver, 1997; Tse & Wilton, 1988).
Generalising the results of Russell and Carroll (1999a, 1999b) to satisfaction and dissatisfaction, a person's simultaneous satisfaction with the reliability of his or her car and dissatisfaction with its acceleration power does not imply that satisfaction and dissatisfaction have to be considered two different dimensions. It implies that satisfaction/dissatisfaction is assessed with respect to different attributes of the car and that, with respect to each attribute, satisfaction is the opposite of dissatisfaction on a bipolar dimension. Thus, satisfaction with a focal object can be conceived of as the opposite of dissatisfaction with the same focal object (Oliver, 1997, p. 28).

The dual-process model of satisfaction and dissatisfaction

Oliver (1997) proposed a model that describes how both a satisfaction response and a dissatisfaction response may result from different psychological processes. This model is denoted as the dual-process model (Oliver, 1997, p. 317), because it addresses two kinds of processes, appraisal and non-appraisal of affects and cognitions, which may evoke a satisfaction response. The satisfaction response may be manifested in the form of (a) unappraised affects, (b) appraised affects, (c) unappraised cognitions, and (d) appraised cognitions. Oliver conceived of unappraised affects and unappraised cognitions as the immediate affects and the immediate cognitions that follow upon the experience of the focal object. Appraised affects and appraised cognitions refer to affects and cognitions that have been elaborated more intensively.

Satisfaction responses as unappraised affects refer to the immediate pleasure or the immediate displeasure caused by consumption experiences. For example, an unappraised affect is the immediate pleasure caused by smoking a cigarette. Satisfaction responses as appraised affects result from the elaborations upon these affects. These elaborations include the attribution of affects to a particular cause, and the evaluation of the value of the affect for the individual. For example, the immediate reaction to smoking a cigarette may be the experience of satisfaction and feelings of comfort, but the cognitive elaboration upon smoking may yield feelings of doubt and eventually dissatisfaction.

Unappraised cognitions are factual cognitions regarding consumption outcomes, which are not further processed and do not raise affects. The processes evoking unappraised cognitions account for the manifestation of satisfaction as anhedonic cognitions; for example, noticing that one's car functions well without experiencing any feelings whatsoever (e.g., Oliver, 1997, p. 318; Westbrook & Oliver, 1991).
Satisfaction responses as appraised cognitions result from elaborations of cognitions resulting from consumption experiences, such as the satisfaction responses that result from disconfirmation of expectations. For example, contrary to expectation one's car may not function well. The disconfirmation may evoke feelings of displeasure and eventually dissatisfaction.

The dual-process model is represented in Figure 3. It may be noted that affects, cognitions, and satisfaction are psychological properties, and that consumption and appraisal are activities. The dual-process model accounts for different manifestations of satisfaction. First, the process evoking unappraised affects accounts for the manifestation of satisfaction as an affective response to consumption experiences. The conception of satisfaction as unappraised affect is a special case of the manifestation of the satisfaction response according to the definition of satisfaction by Giese and Cote (2000), which also includes affective appraisals of cognitions. Second, the process evoking appraised affects accounts for the manifestation of satisfaction as an overall evaluation. This manifestation of the satisfaction response may be interpreted as a special case of the definition of satisfaction by Fornell (1992), which seems to be focused primarily on the cognitive evaluation of consumption experiences, without explicitly distinguishing immediate cognitions and elaborations of cognitions, and far less on affects. Third, the process evoking unappraised cognitions accounts for the manifestation of satisfaction as anhedonic cognitions (e.g., Oliver, 1997, p. 318; Westbrook & Oliver, 1991). Fourth, the process evoking appraised cognitions accounts for the manifestation of satisfaction as a response to cognitions, such as the affective response to disconfirmation. This manifestation of the satisfaction response is consistent with the definition of satisfaction given by Giese and Cote (2000).

[Figure 3 (diagram not reproduced). Elements shown: Consumption, Affects, Cognitions, Appraisal, (Dis)satisfaction.]

Figure 3: Dual-process model of satisfaction and dissatisfaction

The dual-process model is in agreement with the conception of satisfaction as a valenced response to consumption experiences, and with Oliver's (1997, p. 13) definition of satisfaction. Therefore, the dual-process model constitutes an important contribution to satisfaction theory. However, two remarks are in order. First, according to the dual-process model appraisal is either present or absent. This may be a simplification of reality, because appraisal may be represented by a continuum ranging from absence of appraisal to presence
of appraisal. Second, the dual-process model does not express the conditions under which appraisal is present or absent. Therefore, further research is needed to elaborate the model.

Concepts in the nomological network of customer satisfaction with a retail bank

This section addresses the nomological network of customer satisfaction in the context of retail banking (Figure 4). The nomological network of a concept is the network of associations of a concept with other concepts. The nomological network with respect to satisfaction that is relevant in this study includes the concepts of trust, quality, loyalty, and profitability. The four concepts are (a) considered important in the financial services industry, and (b) expected to be related to customer satisfaction in this industry.

According to many theorists (e.g., Hennig-Thurau, Gwinner, & Gremler, 2002; Luo & Homburg, 2007; Oliver, 1997; Verhoef, 2001; Yi, 1990), customer satisfaction is also related to concepts such as word-of-mouth, image, commitment, marketing communication, retention, and cross-sell. Each of these concepts may be further split up into part concepts. For example, image may be split up into corporate associations, corporate image, and corporate reputation (e.g., Berens, 2004), and commitment may be split up into affective commitment and calculative commitment (e.g., Verhoef, 2001). These additional concepts were ignored in this study, because (a) trust, quality, loyalty, and profitability were considered of primary importance to satisfaction research in the context of retail banking, (b) inclusion of all concepts would introduce redundancy, such as the inclusion of both loyalty (primary importance) and commitment (alternative concept), and (c) it was anticipated that the measurement of all concepts in a survey would produce a questionnaire that would be too long and demand too much time and effort from the participants of this study. Even though one might argue that the alternative concepts also have a place in the nomological network of satisfaction, we decided to leave them out to maintain a simple model tailored to the practice of this study (Chapter 4 onwards).
First, the relationship between trust and customer satisfaction is discussed. Trust is considered to be of major importance in retail banking, and has been shown to be related to customer satisfaction (e.g., Hennig-Thurau et al., 2002; Singh & Sirdeshmukh, 2000; Verhoef, 2001). Trust is often seen as an antecedent of satisfaction (but for an exception, see Singh & Sirdeshmukh, 2000); thus, in Figure 4 an arrow runs from trust to satisfaction.

[Figure 4 (diagram not reproduced). Concepts shown: Quality, Trust, Satisfaction, Loyalty, Profitability.]

Figure 4: Nomological network of satisfaction in the context of retail banking

Second, the relationship between quality and customer satisfaction is addressed. Quality of products and services is considered to be of major importance in retail banking, and has been shown to be related to customer satisfaction (e.g., Anderson et al., 1994; Cronin & Taylor, 1992; Zeithaml & Bitner, 1996). Like trust, quality is often conceived of as an antecedent of satisfaction, but there seems to be more agreement among theorists with respect to quality; thus, in Figure 4 the arrow runs from quality to satisfaction.

Third, the relationship between customer satisfaction and customer loyalty is addressed. The relationship between these constructs has been demonstrated in various studies (e.g., Caruana, 2002; Oliver, 1999), and customer satisfaction is often conceived of as a necessary although not a sufficient condition for customer loyalty (e.g., Gremler & Brown, 1996; Oliver, 1999). Therefore, in Figure 4 the arrow runs from satisfaction to loyalty.

Fourth, the relationship between customer satisfaction and customer profitability is discussed. Longitudinal studies by Anderson et al. (1994), Anderson and Mittal (2004), and Gruca and Rego (2005) have investigated the relationship between customer satisfaction and future financial performance of companies. The results of these studies strengthen the expectation that customer satisfaction influences customer profitability. In Figure 4, the arrow pointing toward customer profitability shows the influence of customer satisfaction on customer profitability.

Conceptions of trust

A review of the marketing literature yields two important conceptions of trust. The expectations-conception of trust focuses on a person's expectations with respect to an exchange partner, while the behavioural-conception focuses on a person's behavioural intentions with respect to an exchange partner (Singh & Sirdeshmukh, 2000). An example of an expectation is that a customer expects to be treated fairly by the bank, and an example of a behavioural intention is the customer's intention to continue the relationship with the bank or even expand the relationship, for example, by buying new products such as insurance or a mortgage in addition to a bank account. The major difference between these conceptions is that the expectations-conception of trust does not include behavioural intentions in the domain of trust, while the behavioural-conception of trust does.

Morgan and Hunt (1994) conceived of trust as existing when one party has confidence in an exchange partner's reliability and integrity. This is an expectations-conception of trust, which is based upon Rotter (1967), who defined trust as a generalised expectancy held by an individual that the word of another individual or a group can be relied upon. Following Morgan and Hunt (1994), we defined trust as a person's confidence in the reliability and integrity of the company. This is a common definition of trust in the marketing literature (e.g., Verhoef, 2001, p. 18; also see Chapter 5). Singh and Sirdeshmukh (2000) conceived of trust as a continuum that is bounded on one side by a high level of trust and on the other side by a high level of distrust. The trust state and the distrust state differ with respect to the valence of the expectations held by the person.
It may be noted that some authors suggested distinguishing between different dimensions of trust, such as competence-trust and benevolence-trust (e.g., Singh & Sirdeshmukh, 2000), or benevolence-trust and honesty-trust (e.g., Medlin & Quester, 2002). This stance implies that each dimension of trust is bounded by a high level of trust on the one side and by a high level of distrust on the other side. However, the dimensionality of trust is an empirical question, and studies establishing the dimensionality of trust are rare (Singh & Sirdeshmukh, 2000), so that definitive conclusions cannot be drawn.

It may also be noted that empirical research demonstrated a relation between expectations and customer satisfaction. This relation is reflected in disconfirmation theory, in which expectations are conceived of as antecedents of customer satisfaction (e.g., Oliver, 1997; Tse & Wilton, 1988). Because trust concerns a person's expectations regarding an exchange partner (Morgan & Hunt, 1994), trust may also be conceived of as an antecedent of customer satisfaction (Singh & Sirdeshmukh, 2000).

In the financial services industry, trust is often conceived of as confidence in the reliability and integrity of a company. This is in agreement with the expectations-conception of trust, which is the common conception of trust in the marketing literature. Because persons are expected to prefer a company they trust to companies they do not trust, trust is considered an important success factor for companies in the financial services industry (e.g., Goedee, Reijnders, & Van Thiel, 2008).

Conceptions of quality

There are two important conceptions of quality, which are objective quality and perceived quality (Oliver, 1997, pp. 162-166). Objective quality pertains to the extent that a product, a service, or a process meets its technical specifications. It may be operationalised as the number of failures of a product, a service, or a process (e.g., Garvin, 1983; Kackar, 1989, p. 6; Woodall, 2001; because the number of failures is counter-indicative of quality, small numbers of failures reflect high quality and large numbers of failures reflect low quality). Perceived quality pertains to a person's judgements of quality of products or services. It may be operationalised on the basis of a questionnaire (e.g., Parasuraman, Berry, & Zeithaml, 1988; Cronin & Taylor, 1992). Perceived quality is similar to perceived performance of products or services, which is broadly conceived of as an antecedent of customer satisfaction (e.g., Oliver, 1997; Tse & Wilton, 1988; Yi, 1990).

The meaning of quality is context-specific. This implies that the definition and the operationalisation of quality have to be adapted to the context and the purpose of a study. In the present study, we defined quality as a person's perceptions of the quality of attributes of products and services provided by the company (also, see Chapter 5). Thus, in this study quality was conceived of as perceived quality, which is in agreement with the conception of quality in many studies (e.g., Grönroos, 1990; Zeithaml, Parasuraman, & Berry, 1990). Furthermore, quality is established with respect to distinct attributes of products and services, which corresponds with the suggestion of theorists (e.g., Anderson & Mittal, 2000; Zeithaml et al., 1990; Zeithaml & Bitner, 1996) to distinguish different dimensions of quality. For example, Zeithaml and Bitner (1996, p. 85) distinguished service quality, product quality, and price quality as drivers of customer satisfaction.
The combination of a customer's positions on these dimensions was expected to drive customer satisfaction.

Service quality has been studied extensively (e.g., Cronin & Taylor, 1992, 1994; Grönroos, 1984, 1990; Parasuraman, Zeithaml, & Berry, 1985, 1988, 1994; Zeithaml &
Bitner, 1996; Zeithaml, Parasuraman, & Berry, 1990). These studies yielded several measurement instruments for service quality, for example SERVQUAL (Parasuraman et al., 1988) and SERVPERF (Cronin & Taylor, 1992). One remark is in order concerning these instruments. SERVQUAL and SERVPERF were developed for the measurement of quality across industries, but they were not customised for the measurement of quality in particular industries, such as retail banking (e.g., Buttle, 1996; Coulthard, 2004; Newman, 2001; Oliver, 1997, p. 49). Therefore, the instruments may not cover all aspects of quality that are relevant within a particular industry, and for that reason business researchers are required either to customise these instruments to their research domain or to develop new measurement instruments. In the financial services industry, quality is broadly conceived of as a driver of customer satisfaction (e.g., Goedee et al., 2008; Terpstra & Van Gastel, 2004). This is in accordance with academic studies and theories (e.g., Caruana, 2002; Oliver, 1997; Van Montfort, Masurel, & Van Rijn, 2000; Tse & Wilton, 1988; Yi, 1991; Zeithaml & Bitner, 1996). A major part of in-company research in this industry is aimed at the assessment of distinct dimensions of quality, and their relations with satisfaction. For this purpose, quality is mostly operationalised on the basis of quality judgements by customers, regarding distinct attributes of products and services.

Conceptions of customer loyalty

In present marketing theories, customer loyalty is conceived of as a psychological construct. Gremler and Brown (1996, 1999) have defined loyalty to a service provider as the degree to which a customer exhibits repeat purchasing behaviour from a service provider, possesses a positive attitudinal disposition towards the provider, and considers only this provider when a need for this service arises. This definition encompasses three different aspects of loyalty, which are (a) behavioural loyalty, (b) attitudinal loyalty, and (c) cognitive loyalty. Gremler and Brown (1996) described the ultimately loyal customer as one who regularly uses a service provider, really likes the organisation and thinks very highly of it, and does not ever consider using another service provider for this service. This description of the loyal customer includes an implicit comparison of the service provider with other providers (also, see Dick & Basu, 1994). On the other end of this continuum is the ultimately non-loyal customer, who may be described as one who does not regularly use a service provider, does not really like the organisation, does not think highly of it, and considers using another service provider for this service (Gremler & Brown, 1996). Gremler and Brown's (1996, 1999) conception of loyalty to a service provider is similar to Oliver's (1997, 1999) conception of customer loyalty in general.

Most theorists agreed that customer loyalty encompasses psychological aspects as well as behavioural aspects (e.g., Dick & Basu, 1994; Gremler & Brown, 1996, 1999; Oliver, 1997, 1999). Therefore, the construct has to be measured on the basis of a set of items that reflect both aspects. Empirical research using measurement instruments of customer loyalty that are composed of items reflecting psychological aspects and behavioural aspects of customer loyalty (e.g., Caruana, 2002; Gremler & Brown, 1999) yielded unidimensional measurements of customer loyalty.

Customer loyalty has also been operationalised as an intention to recommend the company to family, friends, or colleagues (e.g., Reichheld, 2006). For three reasons, it is doubtful whether this was a proper operationalisation. First, the operationalisation did not agree with the definitions of customer loyalty provided by Oliver (1997, 1999) and Gremler and Brown (1996, 1999). Reichheld's (2006) operationalisation appears more consistent with conceptions of word-of-mouth, which is a concept that was not investigated in this study. Second, the operationalisation ignored the general principle that psychological constructs are best measured on the basis of multiple-item scales (e.g., Messick, 1989). Third, Terpstra (2006a) found indications that customers who said they would recommend a particular company to friends and family often said they would also recommend competing companies. This seems to be inconsistent with customer loyalty.

In the financial services industry, customer loyalty is considered important for commercial success of companies (e.g., Goedee et al., 2008). Customer loyalty is expected to affect the behaviour of customers and ultimately their profitability.
Furthermore, business researchers in this domain broadly conceive of customer loyalty as a consequence of customer satisfaction. This agrees with results from academic research (e.g., Caruana, 2002; Gremler & Brown, 1996; Hennig-Thurau et al., 2002; Oliver, 1997, 1999).

Conceptions of customer profitability

Customer profitability is of major importance for all commercial companies in service industries, including the financial services industry. Theorists suggested using customer profitability for marketing decision-making and accounting (e.g., Cooper & Kaplan, 1991; Mulhern, 1999; Niraj, Gupta, & Narasimhan, 2001). There are two important conceptions of customer profitability, which are gross customer profitability and net customer profitability.

Gross customer profitability refers to the gross financial contribution of a customer to the company in some period of time (e.g., Cooper & Kaplan, 1991, p. 469; Niraj et al., 2001). In the context of retail banking, the gross financial contribution consists of interest profits and provision profits (to be discussed in Chapter 5). Net customer profitability refers to the net financial contribution of a customer to a company in some period of time. The net financial contribution consists of the customer's gross customer profitability in that period of time minus the company's costs allocated to the corresponding customer in the same period of time (e.g., Campbell & Frei, 2004; Cooper & Kaplan, 1991, p. 469; Mulhern, 1999; Niraj et al., 2001; Pfeifer, Haskins, & Conroy, 2005).

Customer profitability is the result of customer behaviour, such as the acquisition and use of products and services from the focal company. Because customers differ with respect to their behaviour, they also differ with respect to customer profitability. Furthermore, because a customer's behaviour changes over time, customer profitability also changes over time. For example, a customer who increases his or her business with the company will become more profitable to the company than he or she was before.

In the financial services industry, customer profitability is the result of financial behaviour. Because a customer's financial behaviour is related to his or her financial means, a customer's profitability is also related to his or her financial means. Obviously, a customer with large financial means may achieve higher customer profitability than a customer with smaller financial means. The absence of data with respect to customers' means, which in this kind of research is more the rule than the exception, may complicate research into the connection between customer satisfaction and customer profitability in the financial services industry.
The operationalisation of customer profitability is context-dependent. For example, the period of time may be a day, a month, a quarter of a year, or a year (e.g., Campbell & Frei, 2004). Because of the high purchase frequency, a two-week period may be sufficient to reliably record customers' purchase behaviour in a supermarket (a two-week period is expected to cancel out highs and lows), but because of the much lower purchase frequency, at least a one-year period may be required to reliably record customers' purchase behaviour with a retail bank. Therefore, a two-week period may suffice for the operationalisation of customer profitability for supermarkets, while a one-year period is required for the operationalisation of customer profitability in retail banking.

We expected that customer satisfaction positively influenced a customer's gross financial contribution, but we held no expectation about the influence of customer satisfaction
on the costs associated with a customer. Therefore, we chose the gross customer profitability conception of customer profitability for the present study. In agreement with this conception of customer profitability, we defined customer profitability as the gross financial contribution of a customer to the company in some period of time.
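
The difference between the two conceptions can be made concrete with a minimal sketch in Python. The class name, field names, and amounts below are hypothetical and serve only as illustration; they are not taken from the data of this study, and the actual composition of interest profits and provision profits is discussed in Chapter 5.

```python
from dataclasses import dataclass


@dataclass
class CustomerPeriod:
    """Financial figures of one customer over one period (hypothetical fields)."""
    interest_profit: float    # interest profits earned on the customer
    provision_profit: float   # provision profits earned on the customer
    allocated_costs: float    # company costs allocated to the customer


def gross_profitability(c: CustomerPeriod) -> float:
    # Gross conception: the gross financial contribution in the period.
    return c.interest_profit + c.provision_profit


def net_profitability(c: CustomerPeriod) -> float:
    # Net conception: gross contribution minus the allocated costs.
    return gross_profitability(c) - c.allocated_costs


# Illustrative amounts for a one-year period (assumed, not empirical).
customer = CustomerPeriod(interest_profit=320.0,
                          provision_profit=45.0,
                          allocated_costs=110.0)
print(gross_profitability(customer), net_profitability(customer))
```

Under the gross conception chosen for this study, only the first function would be needed; the allocated costs do not enter the definition.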

The influence of customer satisfaction on customer profitability

Customer satisfaction is broadly expected to influence customer profitability and company profitability (e.g., Anderson et al., 1994; Anderson et al., 2004; Anderson & Mittal, 2000; Fornell, 1992; Gustafsson et al., 2005; Homburg et al., 2005; Mittal & Kamakura, 2001; Oliver, 1997; Rust & Zahorik, 1993). This is an important reason for the interest in customer satisfaction in various industries, including the financial services industry. If customer satisfaction (denoted by CS) influences customer profitability (denoted by CP), there must be a relation between customer satisfaction at time t = 0 and customer profitability at time t > 0 (e.g., Ittner & Larcker, 1998). Then a model for the relation between customer satisfaction (denoted $CS_{t=0}$), other independent variables (denoted $X_i$), and future CP (denoted $CP_{t>0}$) is:

$$CP_{t>0} = \alpha + \beta \, CS_{t=0} + \dots + \gamma_i X_i + \varepsilon .$$

The model was based on Ittner and Larcker (1998). The exact specification of the model is context-dependent (see Chapter 6). Henceforth, customer profitability at time t > 0 is labeled future customer profitability, because it is the customer profitability at a point in time after the measurement of customer satisfaction. Current customer profitability and customer satisfaction are both measured at time t = 0.

It is plausible that the effect size of customer satisfaction on future customer profitability depends on characteristics of customers and markets, such as involvement of customers and the availability of alternatives in the market. Fornell (1992) hypothesised that customer satisfaction affects the commercial success of companies that operate in mature and competitive markets. Therefore, we expect that in retail banking industries in mature markets, customer satisfaction has a significant positive effect on future customer profitability.

Various studies (Anderson et al., 1994; Anderson & Mittal, 2000; Gruca & Rego, 2005; Ittner & Larcker, 1998) demonstrated a relationship between customer satisfaction and company profitability after one year. Ittner and Larcker (1998) also demonstrated a relationship between customer satisfaction and customer profitability after one year. In-company research (Terpstra, 2005, 2008) demonstrated a relationship between customer satisfaction and customer profitability after 15 months. Therefore, we expect that in retail banking the influence of customer satisfaction on customer profitability is manifest after one year.

Previous studies in the financial services industry (e.g., Campbell & Frei, 2004; Terpstra, 2005, 2006b) demonstrated that current customer profitability is the major determinant of future customer profitability. This relationship may be due to, for example, inertia of customers, and the relationship between current customer profitability and the financial means of customers. For these reasons, we consider current customer profitability an indispensable variable in the model of the relation between customer satisfaction and future customer profitability in retail banking.
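
As an illustration of how such a model could be estimated, the following Python sketch fits a linear model of future customer profitability on satisfaction at t = 0 and current customer profitability. It rests on assumptions that are not part of the text above: ordinary least squares is assumed as the estimation method, the data are invented, and the function names and coefficient values are hypothetical. The specification actually used in this study is given in Chapter 6.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so that the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Least-squares fit via the normal equations (X'X) b = X'y.

    Returns [intercept, coefficient_1, ...]; assumes the predictors
    are not collinear.
    """
    rows = [[1.0] + list(p) for p in predictors]  # prepend the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented data: (satisfaction at t = 0 on a 1-7 scale, current profitability).
customers = [(2, 100.0), (4, 250.0), (5, 80.0),
             (6, 400.0), (7, 150.0), (3, 300.0)]

# Future profitability generated from assumed coefficients, without noise,
# so that the fitted values can be compared with the assumed ones.
true_alpha, true_beta_cs, true_beta_cp = 10.0, 5.0, 0.8
cp_future = [true_alpha + true_beta_cs * cs + true_beta_cp * cp
             for cs, cp in customers]

alpha, beta_cs, beta_cp = fit_ols(customers, cp_future)
print(round(alpha, 3), round(beta_cs, 3), round(beta_cp, 3))
```

Including current customer profitability as a predictor mirrors the argument above that it is the major determinant of future customer profitability; omitting it would leave the estimate for satisfaction open to confounding by customers' financial means.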

Measures of satisfaction

Many measures of satisfaction have been reported in the marketing literature (e.g., Hausknecht, 1990; Peterson & Wilson, 1992; Westbrook & Oliver, 1981; Wirtz & Lee, 2003). Hausknecht (1990) listed 34 measures (i.e., operationalisations) of satisfaction, which were used in satisfaction research. The list included behavioural measures (i.e., registrations of behaviour, such as number of complaints about the focal product) and self-report measures (i.e., survey items, such as rating scales). The self-report measures differed with respect to the number of items included (varying from one to six items), the format of items (verbal items, graphical items, and items reflecting observations of behaviours such as the number of complaints), the wording of the items (some items were phrased in the form of a question and others were phrased in the form of a statement) and the format of response categories (varying from two to thirteen response categories). Hausknecht (1990) noted that the validity of measurements of satisfaction was rarely assessed; also see Giese and Cote (2000) and Peterson and Wilson (1992).

It is remarkable that different measures of satisfaction, which were used in different studies, yielded similar distributions of satisfaction ratings. Peterson and Wilson (1992) noted that "Virtually all self-reports of customer satisfaction possess a distribution in which a majority of the responses indicate that customers are satisfied and the distribution itself is negatively skewed." They also demonstrated that method-related factors, such as question format, question context, questionnaire administration, and measurement timing, affected the
average satisfaction ratings and the skewness of distributions of satisfaction ratings. They concluded that it is not clear what customer satisfaction ratings reflect, that average satisfaction ratings are not very informative without valid norms for average customer satisfaction, and that more effort is needed to improve the measurement of customer satisfaction.

In this section, different measures of satisfaction are discussed in association with the corresponding definitions of satisfaction as discussed in Section 2 (Table 2). The definitions of satisfaction and the corresponding measures are listed in Table 3.

Table 3: Measures of Customer Satisfaction and/or Consumer Satisfaction

Tse & Wilton (1988)
Definition of satisfaction: The consumer's response to the evaluation of the perceived discrepancy between prior expectations (or some other norm of performance) and the actual performance of the product as perceived after its consumption.
Measure of satisfaction: Satisfaction [with a focal object (f.o.)] was measured on the basis of one 5-point bipolar item, with response categories ranging from very dissatisfied to very satisfied.

Bloemer (1993)
Definition of satisfaction: The outcome of the subjective evaluation that the chosen alternative (the brand) meets or exceeds the expectations of the person.
Measure of satisfaction: Satisfaction [with a f.o.] was measured on the basis of a 2-step approach: Are you satisfied or dissatisfied with the brand? How much are you satisfied (dissatisfied) in terms of a percentage?

Howard & Sheth (1969)
Definition of satisfaction: The buyer's cognitive state of being adequately or inadequately rewarded for the sacrifices she or he has undergone.
Measure of satisfaction: No measure was proposed.

Fornell (1992)
Definition of satisfaction: An overall post-purchase evaluation.
Measure of satisfaction: Satisfaction [with a f.o.] was measured on the basis of 3 questions: one 10-point bipolar item on global satisfaction; one 10-point bipolar item on disconfirmation of expectations; one 10-point bipolar item on distance to the ideal.

Oliver (1997)
Definition of satisfaction: The judgement that a product or a service feature, or the product or service itself, provided or is providing a pleasurable level of consumption-related fulfilment, including levels of under- or overfulfilment.
Measure of satisfaction: A multiple-item measure of satisfaction [with a f.o.] was proposed: seven 5-point Likert items that are indicative of satisfaction; five 5-point Likert items that are counter-indicative of satisfaction.

Giese & Cote (2000)
Definition of satisfaction: (a) an affective response of varying intensity, (b) directed towards focal aspects of the acquisition and/or consumption of products and services, (c) determined at the time of purchase or temporal points during consumption.
Measure of satisfaction: A framework was proposed for the development of context-specific definitions of satisfaction. As a consequence, no measure was proposed that is generally applicable.

Tse and Wilton (1988) used a single-item measure of satisfaction, which was a 5-point bipolar item with response categories ranging from very dissatisfied to very satisfied. The item reads: "Considering everything, how satisfied are you with the [product]?" This bipolar item is a rather common measure of satisfaction, which was also used by others who, however, used a 7-point rating scale instead of a 5-point rating scale (e.g., Westbrook & Oliver, 1991; Wirtz & Lee, 2003). Furthermore, the item was used in various multiple-item measures of satisfaction (e.g., Wirtz & Lee, 2003). Tse and Wilton (1988) demonstrated that their single-item measure correlated with disconfirmation and perceived performance. Nevertheless, the measure has three drawbacks. First, the definition of satisfaction by Tse and Wilton (1988) is too abstract to lead unambiguously to this specific item. Second, it is a single-item measure of satisfaction, whereas most theorists suggested the use of multiple-item measures for the measurement of psychological constructs such as satisfaction because multiple-item measures better capture the meaning of the construct (e.g., Churchill, 1979; Jacoby, 1976; Messick, 1989; Yi, 1990). Third, Westbrook and Oliver (1991) demonstrated that their 7-point version of the item performed worse than other measures of satisfaction that were used in the same study. These three drawbacks call into question the validity of the measurement using a single item.

Bloemer (1993) proposed a two-step approach to measure satisfaction and dissatisfaction. First, a person was asked whether he or she was satisfied or dissatisfied with the focal object. Second, the person was asked how satisfied (or how dissatisfied) he or she was in terms of, for example, a percentage ranging from 0 to 100. Bloemer's (1993) measure correlated with commitment and repeat-purchasing behaviour. However, three comments are in order. First, the measure lacks a thorough explanation. Bloemer (1993, pp. 79, 128) conceived of satisfaction and dissatisfaction as two different dimensions, but this does not explain the use of a two-step approach to measure satisfaction and dissatisfaction. One may argue that if satisfaction and dissatisfaction are conceived of as different dimensions, it is appropriate to separately measure the level of satisfaction as well as the level of dissatisfaction of each customer. Second, the assessment of the level of satisfaction is based upon only one item (Bloemer, 1993, p. 145), but most theorists advocate multiple-item scales for the measurement of psychological constructs such as satisfaction (e.g., Churchill, 1979; Jacoby, 1976; Messick, 1989; Yi, 1990). Third, a study by Westbrook and Oliver (1991), who used one 11-point item on satisfaction and one 11-point item on dissatisfaction, indicated that dissatisfaction and satisfaction are opposites on a bipolar dimension. This is in contrast with Bloemer's (1993) stance.

Howard and Sheth (1969) did not discuss the measurement of satisfaction, and did not propose a measure of satisfaction. Measures of satisfaction that are associated with the definition of satisfaction as a cognition (e.g., Howard & Sheth, 1969) are summated performance ratings (Oliver, 1997, p. 318). An example is the measurement of customer satisfaction by means of the sum of a customer's ratings of features of products and services. We subscribe to Oliver's (1997, pp. 33-34, 318) criticism that (a) it is unclear which features of products and services may be used for the measurement of customer satisfaction and how these features may be weighted, (b) these measurements do not match the theoretical meaning of satisfaction, which incorporates the affective content of satisfaction, and (c) these measurements are useless for research in which the influence of features of products and services on satisfaction is investigated.

Fornell (1992) proposed a measure of summary satisfaction (or cumulative satisfaction) that was composed of three 10-point bipolar items.
The items concerned (a) global satisfaction of the customer with the product, service, or company, (b) disconfirmation of expectations of the customer regarding the product, service, or company, and (c) the distance from the customer's hypothetical ideal product, service, or company. The measure was incorporated in the Swedish Customer Satisfaction Index, the Norwegian Customer Satisfaction Index, and the American Customer Satisfaction Index (e.g., Fornell, 1992; Fornell, Johnson, Anderson, Cha, & Bryant, 1996; Johnson, Gustafsson, Andreassen, Lervik, & Cha, 2001), and it was used in various empirical studies (e.g., Anderson, Fornell, & Lehmann, 1994; Anderson, Fornell, & Mazvancheryl, 2004; Anderson & Mittal, 2000; Gruca & Rego, 2005; Ittner & Larcker, 1998). This strengthened the confidence in the quality of the measure. However, the measure lacks correspondence with Fornell's (1992) definition of satisfaction, meaning that it is not obvious why the abstract definition of satisfaction resulted in this particular measure of satisfaction and not in another measure. For example, Verhoef (2001, pp. 18, 57) developed a measure of satisfaction on the basis of the definition by Fornell (1992; also Anderson, Fornell, & Lehmann, 1994) that was much different from Fornell's (1992) measure. Verhoef's (2001) measure of satisfaction was the total score (or the factor score derived from confirmatory factor analysis) on seven items regarding satisfaction with the company. An example of such an item was "How satisfied are you with the personal attention of XYZ?", which had five response categories, ranging from very dissatisfied to very satisfied (the seven items were in the same format). Because Fornell's (1992) definition does not provide many clues for constructing measures, it is difficult to judge whether Fornell's (1992) and Verhoef's (2001) measures correspond with this definition.

Oliver (1997, p. 343) proposed a measure of summary satisfaction that incorporates different phenomena together defining the meaning of satisfaction. First, Oliver noted that satisfaction is best measured using a multiple-item scale. Second, he noted that the measure should contain an anchor item, which is an item formulated in terms of general satisfaction with the product or the service provided. Third, Oliver listed several aspects or antecedents of satisfaction that may be incorporated in a measure of satisfaction, such as performance evaluations, expectations, disconfirmation, need fulfilment, dissonance, and affects. Fourth, he included several items that are counter-indicative of satisfaction and, consequently, indicative of dissatisfaction. This is consistent with the conception of dissatisfaction as the opposite of satisfaction on a bipolar dimension, and with general psychometric principles regarding the measurement of psychological constructs (e.g., Oort, 1996). The inclusion of items on various phenomena in Oliver's (1997) measure of summary satisfaction does not imply that the author conceived of summary satisfaction as a multidimensional construct. The dimensionality of a construct is ultimately an empirical question, and empirical research (e.g., Mano & Oliver, 1993; Oliver & Swan, 1989; Oliver, 1993; Wirtz & Lee, 2003) has supported the conception of summary satisfaction as a unidimensional construct.

Oliver's (1997) measure of summary satisfaction was composed of twelve 5-point Likert items. Seven items were indicative of satisfaction and five items were counter-indicative of satisfaction. The measure was accommodated to the measurement of satisfaction with one's car. An earlier version of the measure was composed of six 5-point Likert items (Oliver, 1980), and was accommodated to the measurement of satisfaction with a flu vaccination program.
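
A scale that mixes indicative and counter-indicative items is commonly scored by reversing the counter-indicative items before summation. The sketch below illustrates this for twelve 5-point items. It assumes an unweighted sum, and the positions of the five counter-indicative items are hypothetical; neither detail is taken from Oliver (1997).

```python
def summary_score(responses, reversed_items, n_categories=5):
    """Sum Likert responses after reverse-scoring the counter-indicative items.

    responses      : list of integer ratings, each from 1 to n_categories
    reversed_items : set of zero-based positions of counter-indicative items
    """
    total = 0
    for position, rating in enumerate(responses):
        if not 1 <= rating <= n_categories:
            raise ValueError(f"rating {rating} outside 1..{n_categories}")
        # Reverse-scoring maps 1 -> 5, 2 -> 4, ..., 5 -> 1 on a 5-point item.
        if position in reversed_items:
            rating = n_categories + 1 - rating
        total += rating
    return total


# Hypothetical layout: items 0-6 are indicative, items 7-11 counter-indicative.
COUNTER_INDICATIVE = {7, 8, 9, 10, 11}
answers = [5, 4, 5, 4, 5, 4, 5, 1, 2, 1, 2, 1]
score = summary_score(answers, COUNTER_INDICATIVE)
print(score)  # 55 on a scale that runs from 12 to 60
```

After reversal, a high score on every item points in the direction of satisfaction, which matches the conception of dissatisfaction as the opposite pole of a single bipolar dimension.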

Oliver (1997) argued that the optimal composition of a measure depends on (a) the research topic and (b) the research purpose. For example, if a particular phenomenon such as disconfirmation has to be related to satisfaction, it should not be incorporated in the satisfaction measure (Oliver, 1997, p. 343). This is in accordance with the psychometric principle regarding divergent validity (Campbell & Fiske, 1959).

Giese and Cote (2000) argued that a measure of satisfaction should be context-specific and, as a result, they did not propose a measure of satisfaction that is generally applicable. The absence of a general measure is consistent with the view that satisfaction may have different meanings in different contexts, and contrasts with Fornell's (1992) position that resulted in a measure that was applicable across a variety of industries.

Three remarks are in order with respect to the measurement instruments of satisfaction listed in Table 3. First, the correspondence between a particular definition of satisfaction on the one hand and a particular measurement instrument of satisfaction on the other hand is often ambiguous. Thus, it is not obvious why a particular definition of the construct resulted in a particular measurement instrument for satisfaction, and not in another one. This lack of clarity may be due to the generality of most definitions of satisfaction, which did not provide sufficient clues for the development of a measurement instrument of satisfaction. For example, the definition of satisfaction by Fornell (1992; see also Anderson, Fornell, & Lehmann, 1994) was used as a justification for two very different measurement instruments of customer satisfaction. Second, construct validity has received insufficient attention. Satisfaction studies yielded evidence of convergent, divergent, and nomological validity of measurements of satisfaction (e.g., Oliver, 1980; Oliver & Burke, 1999; Tse & Wilton, 1988; Verhoef, 2001; Westbrook & Oliver, 1991; Wirtz & Lee, 2003), but failed to address the main threats to construct validity, which are construct underrepresentation and construct-irrelevant variance (Messick, 1989). For example, except for Oliver's (1997) measure, it was insufficiently investigated whether the measures adequately represented the construct, and none of the other studies investigated contamination of measurements with method-related irrelevant variance. Third, the usefulness of satisfaction research for the further development of satisfaction theory may be enhanced by the further improvement of measurement instruments of satisfaction. Because the meaning of satisfaction is context-specific, such measurement instruments may be developed on the basis of context-specific definitions of satisfaction. This implies the development of different measurement instruments for satisfaction for different research domains (also, see Giese & Cote, 2000).

Discussion

Satisfaction may be considered a response to disconfirmation; thus, the process that evokes the satisfaction response is at the centre of attention. The definitions associated with this conception are process definitions. They describe the process that evokes the satisfaction response, but fail to explain what the satisfaction response is (Oliver, 1997, pp. 12-13). Alternatively, satisfaction may be considered a valenced response to consumption. Here, the content of the satisfaction response is central. Because the meaning of satisfaction concerns the content of the satisfaction response (for a more general discussion, see Sartori, 1984; Schouwstra, 2000), we consider the latter conception more useful for defining satisfaction than the former conception.

The prototypical definitions associated with the conception of satisfaction as a valenced response to consumption differ with respect to the specification of the properties of the satisfaction response and the level of detail of the explanation of the satisfaction response. First, Howard and Sheth (1969) defined satisfaction as a cognitive response to consumption, whereas Giese and Cote (2000) defined satisfaction as an affective response to consumption. Second, Fornell (1992) provided a generic definition of satisfaction, whereas Oliver (1997) provided a detailed definition of satisfaction. As was noted in Section 6, Fornell's (1992) definition of satisfaction was too generic for the development of a measurement instrument of satisfaction. Following Giese and Cote (2000), we think that a sufficiently detailed definition of satisfaction requires the specification and the explanation of (a) the type of satisfaction response, (b) the focal object of the satisfaction response, and (c) the timing of the satisfaction response.

There is no consensus definition of satisfaction, which probably is due to the context-specific nature of satisfaction (Giese & Cote, 2000). Therefore, we subscribe to Giese and Cote's (2000) recommendation to develop context-specific definitions and corresponding measurement instruments of satisfaction. Because the meaning of satisfaction is context-dependent, we do not agree with Giese and Cote (2000) that satisfaction is limited to affective responses to consumption experiences. Oliver (1997) demonstrated that satisfaction can have cognitive content and affective content, because it can manifest in performance evaluations, expectations, disconfirmation, regret, and emotions. Whether the cognitive content or the affective content prevails depends on the research domain and on characteristics of the person (Oliver, 1997, pp. 316-318).

Four additional remarks are in order to explain satisfaction in the context of retail banking. First, satisfaction pertains to the satisfaction of the customers of the bank. For this reason, we consider customer satisfaction the best term for satisfaction in the context of retail banking. Second, consumption of products and services from a retail bank is an ongoing process. Persons remain customers of a bank for a long period of time, in which they make use of products and services from the company, and maintain some contact with the company. In this context, customer satisfaction results from the accumulation of encounters with the company. Third, because customer satisfaction may result from unappraised affects, appraised affects, unappraised cognitions, and appraised cognitions, the construct includes both manifest customer satisfaction and latent customer satisfaction. Fourth, because a customer's satisfaction with a bank may range from very satisfied to very dissatisfied, customer satisfaction is the opposite of customer dissatisfaction on a bipolar dimension. In this study, each of these four remarks was taken into account.

Explicit definition of customer satisfaction with a retail bank

Giese and Cote (2000) rightly argued that the meaning of satisfaction is context-specific, and that the definition and measurement of satisfaction also need to be context-specific. It is not possible to develop an explicit definition of satisfaction that grasps the meaning of satisfaction in all contexts. It is more fruitful to analyse the meaning of customer satisfaction within a particular context, and then develop a context-specific definition. This study pertains to customer satisfaction with a retail bank, and it is limited to summary satisfaction. In this context, customer satisfaction

(a) is limited to the satisfaction of customers of the company;
(b) pertains to the company as a whole, and not to single products or services;
(c) results from the accumulation of encounters of customers with the company;
(d) results from the psychological processing of consumption outcomes;
(e) covers customers' affects and cognitions reflecting a value judgement;
(f) may result from appraised affects, appraised cognitions, unappraised affects, and unappraised cognitions;
(g) becomes manifest in customers' performance evaluations, expectations, disconfirmation, emotions, and regret; and
(h) is the opposite of customer dissatisfaction on a bipolar dimension.

These eight characteristics explain the content of customer satisfaction with a retail bank. We summarise them as follows: customer satisfaction with a retail bank is the valenced response of the customer, directed towards the retail bank, and evoked by the customer's experiences with the retail bank over time. This is the explicit definition of customer satisfaction with the retail bank. It may be noted that the definition covers the three components that Giese and Cote (2000) required of a definition of satisfaction. First, satisfaction is conceived of as the customer's valenced response. Second, the focus of the customer's response is the retail bank. Third, the timing of the response is during or after the customer's experiences with the retail bank. Because evaluations range from positive to negative, dissatisfaction is simultaneously defined as the opposite of satisfaction on a bipolar dimension.

Implicit definition of customer satisfaction with a retail bank

Whereas the explicit definition addresses the construct, the implicit definition addresses the construct's relations to other constructs and variables (Schouwstra, 2000, p. 61). Therefore, the implicit definition of customer satisfaction is founded on the nomological network of the construct, which was discussed in Section 5 of this chapter. Customer satisfaction with a retail bank is implicitly defined in terms of its relations with trust, quality, and customer loyalty, and its influence on customer profitability. As a consequence, it is expected that overall satisfaction with a retail bank is positively related to (a) trust in the company, (b) quality perceptions regarding the products and services provided by the company, (c) loyalty to the company, and (d) future customer profitability.

Conclusions

1. The meaning of customer satisfaction differs between and within contexts. For this reason (a) it cannot be sharply defined but it needs to be explained by means of examples, and (b) the examples are context-dependent.

2. Dissatisfaction may be conceived of as the opposite of satisfaction on a bipolar dimension. This means that satisfaction/dissatisfaction is expected to constitute a unidimensional construct, and that customers are not both satisfied and dissatisfied with the same phenomenon at one point in time.

3. Customer satisfaction with a retail bank is explicitly defined as the valenced response of the customer that is directed towards the bank and that is evoked by the whole of consumption experiences with the bank. This definition encloses various cases that are mutually related by family resemblances.

4. Customer satisfaction with a retail bank is implicitly defined on the basis of its connections with other psychological constructs and with behaviour. In the domain of retail banking, the relations of satisfaction with (a) trust, (b) quality, (c) customer loyalty, and (d) future customer profitability are considered most important.

5. Many measures of satisfaction have been reported in the marketing literature, and different measures of satisfaction are associated with different definitions of the construct. However, evidence of construct validity of most measures of satisfaction is absent. This limits the usefulness of satisfaction research for the development of satisfaction theory.

6. The usefulness of satisfaction research for the development of satisfaction theory may be enhanced by further improvement of the measures of satisfaction. The improvement of measures of satisfaction entails (a) explication of the context-specific meaning of satisfaction, (b) explication of correspondence between the definition and the measure of satisfaction, and (c) assessment of validity of measurements of satisfaction. In the next chapters, we will develop a context-specific measurement instrument of satisfaction and validate the measurements obtained with this instrument.

Chapter 4 Deductive design for test development and construct validation

Introduction

Psychological properties can be measured by means of psychological tests (Chapter 1, Section 4). A psychological test is an instrument which elicits behaviour that is representative of the property of interest and which can be used to measure the extent to which a person possesses the property. A test may consist of a well-chosen set of items that are administered in a survey. On the basis of the responses a person provides to these items, his or her position on the scale for the property is inferred. This chapter addresses the design of the empirical study. The purpose of the study was to develop a measurement instrument for customer satisfaction with retail banks, and to test the relations of customer satisfaction with constructs and variables in the corresponding nomological network. For this purpose, we applied the deductive design (Schouwstra, 2000) for test development and construct validation.

The deductive design

The deductive design (Schouwstra, 2000) is a methodology for test development and construct validation for typical-behaviour properties such as customer satisfaction. The methodology starts from a theoretical analysis of the construct of interest. In this respect, it is consistent with the deductive approach to test development (Oosterveld, 1996).

Following Messick (1989, pp. 13, 34), Schouwstra (2000, p. 57) defined construct validity as an evaluative judgement of the trustworthiness of a test-score interpretation in terms of a construct. Messick (1989, pp. 34-35, 1995) addressed two general threats to construct validity, which are (a) construct underrepresentation, and (b) measurement of irrelevant variance. Construct underrepresentation occurs when only a part of the construct is measured. For example, a test measures only a part of the construct of customer satisfaction with a focal object when it only includes items that reflect cognitions about the object but no items that reflect affects. Measurement of irrelevant variance occurs when not only the construct is measured, but also other psychological properties, attributes related to group
membership, or response tendencies. For example, a test for customer satisfaction measures more than just the intended construct when it also includes items that require a high level of verbal intelligence to be comprehended. Then, the test scores also depend on verbal intelligence, and the variation in test scores that is caused by variation in verbal intelligence is conceived of as irrelevant variance. Also, a test for customer satisfaction may be administered to one part of the sample by telephone and to another part via the Internet, and as a result of these different administration modes different response categories may be used. Now, test scores partly depend on administration mode, and the variation in test scores that is caused by differences in the administration procedure is conceived of as irrelevant variance. Both construct underrepresentation and irrelevant variance undermine the interpretation of test scores in terms of a reflection of the construct and nothing else (Messick, 1989; Schouwstra, 2000). Hence, construct validation concerns the assessment of construct representation and absence of irrelevant variance.

Following Anastasi (1986), Schouwstra (2000) argued that construct validation should start at the outset of test development. This stance is reflected in the deductive design, which demands two lines of evidence for construct validation (Table 1; from Schouwstra, 2000, p. 60). The first line of evidence should consist of rationales underlying the test-score interpretations, and the second line of evidence should consist of empirical evidence that the test score reflects the complete construct and nothing else. Each line of evidence should address construct representation and absence of irrelevant variance in test scores.

Table 1: Outline of Construct Validation Within the Deductive Design (Schouwstra, 2000, p. 60)

Scientific arguments     Construct representation                       Irrelevant variance
Rationales
  a. Formulation         Of what construct of interest is               And what not
  b. Translation         Of construct of interest into test content     And nothing else
  c. Modelling           How test score reflects construct              And nothing else
Empirical evidence       That test score reflects whole of construct    And nothing else

The rationales consist of an explanation of how the test-score interpretations are derived from the theory about the construct (Schouwstra, 2000). First, this explanation requires formulating what the construct of interest is and what it is not, and to which other constructs it is related. The construct has to be defined explicitly by means of the specification of the
aspects and attributes to which it refers, and implicitly by the specification of related concepts that constitute the nomological network. Second, the way in which the construct definition is translated into test content needs to be specified. This specification involves the formulation of (a) guidelines concerning the formulation of items that reflect the construct and nothing else, (b) guidelines for acts that control for possible response tendencies, and (c) the items, which constitute the operationalisation of the construct. Third, the measurement model that is expected to fit the empirical data needs to be specified. This specification includes the explanation of the relationship between the items and the test score. The empirical evidence consists of results from empirical research into the test-score interpretations. Following Cronbach (1988, 1989), Schouwstra (2000, pp. 1-3) noted that a strong version of construct validity research involves the testing of hypotheses about what a test score measures and what it does not measure. These hypotheses refer to (a) the explicit construct representation, (b) the implicit construct representation, (c) concept-related irrelevant variance, and (d) method-related irrelevant variance (Schouwstra, 2000, pp. 68-71). The explicit construct representation of test scores encompasses content validity, convergent validity, and divergent validity, and is assessed on the basis of tests of corresponding hypotheses. The implicit construct representation pertains to the nomological validity of test scores, and is assessed on the basis of tests of hypotheses regarding the relationship of test scores with measures of other concepts in the nomological network. Method-related irrelevant variance pertains to variance caused by phenomena that are not related to the construct of interest, such as response tendencies and characteristics of the research method. 
Concept-related irrelevant variance pertains to variance caused by phenomena that are related to the construct of interest, such as the concepts in the nomological network and properties related to group membership. Both method-related irrelevant variance and concept-related irrelevant variance are investigated on the basis of tests of hypotheses regarding the contamination of test scores by other properties and variables. The methodology to test hypotheses regarding the contamination of test scores is addressed in the next section.

Both lines of evidence need to be integrated into an evaluative judgement of the validity of the test-score interpretations (Schouwstra, 2000, p. 71). This judgement expresses whether and to what extent the evidence supports the interpretation of test scores in terms of the construct of interest, and nothing else. The more comprehensive the argumentation for the test-score interpretation, the more convincing the support for construct
validity. However, the support is never conclusive. First, construct validation is an unending process that includes the judgement of evidence gathered in the processes of test development and test use (Anastasi, 1986, 1988; Cronbach, 1971, p. 452; Messick, 1989, p. 13). Second, construct representation is to some extent arbitrary, because constructs do not have sharp boundaries (e.g., Wittgenstein, 1953, 1958). Third, it is not possible to exclude all irrelevant variance in the context of psychological measurement. For example, most psychological tests require linguistic skills of participants, and the extent to which a participant possesses these skills can influence his or her response behaviour (e.g., Schouwstra, 2000, p. 63). When a test is used in a sample containing participants with different levels of education, test scores are readily biased with respect to varying linguistic skills of participants. This example implies that construct validity is almost always imperfect. Summarising, the deductive design is a methodology for test development and test-score validation. First, it is directed towards the development of tests that encompass all the important aspects of the construct of interest. Second, the deductive design is directed towards the minimisation of test-score variance that is irrelevant to the construct of interest (Schouwstra, 2000, pp. 81-83). The methodology encompasses the explication of rationales and the collection of empirical evidence with respect to the interpretation of test scores.

The theory of violators

The theory of violators (Oort, 1996) provides a methodology for testing hypotheses with respect to the contamination of test scores by other variables, such as other traits, attributes related to group membership, or response styles. In the theory of violators, irrelevant variance is conceived of as variance caused by phenomena that violate the unidimensionality of the scale (Oort, 1996). The theory of violators is based upon the following definitions of unidimensionality and item bias (Oort, 1996, p. 7):

A scale consisting of a set of items is unidimensional if and only if each of the items is unbiased with respect to every potential violator that might be relevant in whatever context the test might be used.

An item I is unbiased with respect to a potential violator V and given trait T if and only if, for all values i and v and t: P(I=i | T=t, V=v) = P(I=i | T=t).


The theory of violators requires local independence between item and violator, meaning that the probability of endorsement of item I, given trait-score t, is independent of violator-score v. Marginal independence between item and violator is not required, meaning that the probability of endorsement of item I need not be independent of violator-score v. Let rest-score R be the total score of a person on the set of items measuring trait T minus the score on an item I. Then, item bias may be investigated by means of the partial correlation of an item I and violator V while controlling for the rest-score R. Oort (1996) suggested restricted factor analysis (to be discussed in Chapter 6) for testing the hypothesis that test scores are not contaminated by a violator V.

The theory of violators provides a useful methodology for empirical research into the contaminating effects of violators on test scores. Nevertheless, three comments are in order. First, research into the unidimensionality of a scale cannot exclude all irrelevant variance that may threaten the interpretation of test scores in terms of the construct of interest. For example, characteristics of the measurement instrument (e.g., the method of administration and the question format; Bradburn, 1983) may affect the magnitude of test scores without affecting the unidimensionality of the scale. Second, multidimensionality does not necessarily imply that the measurement is invalid. For example, a particular construct may encompass different attributes (e.g., intelligence may encompass verbal intelligence and spatial intelligence; Gardner, 1993), and the measurement of the construct may turn out multidimensional instead of unidimensional. Third, perfect unidimensionality seems impossible because it is unlikely that the items of a scale would be unbiased for all possible violators (e.g., Oort, 1996, pp. 18-19).
This is in agreement with the notion that it is impossible to eliminate all irrelevant variance in psychological measurement, and it endorses the notion that a judgement of construct validity has to be qualitative and gradual by nature (e.g., Cronbach & Meehl, 1955).
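The partial-correlation check described above (item I and violator V, controlling for rest-score R) can be sketched as follows. This is a minimal illustration under stated assumptions, not the restricted factor analysis suggested by Oort (1996): the function names `partial_corr` and `item_bias_screen` are hypothetical, item scores are treated as numeric, and only the linear effect of the rest-score is removed.

```python
import numpy as np

def partial_corr(item, violator, rest):
    """Correlation of item and violator after removing the linear effect of rest."""
    item, violator, rest = (np.asarray(a, dtype=float) for a in (item, violator, rest))
    design = np.column_stack([np.ones_like(rest), rest])
    # Residuals of item and violator after regressing each on the rest-score
    res_item = item - design @ np.linalg.lstsq(design, item, rcond=None)[0]
    res_viol = violator - design @ np.linalg.lstsq(design, violator, rcond=None)[0]
    return float(np.corrcoef(res_item, res_viol)[0, 1])

def item_bias_screen(scores, violator):
    """Partial correlation of every item with the violator, given that item's rest-score.

    scores: (n_persons, n_items) array of item scores (assumed layout).
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return [partial_corr(scores[:, i], violator, total - scores[:, i])
            for i in range(scores.shape[1])]
```

A partial correlation close to zero is consistent with an unbiased item for that violator; a clearly non-zero value flags the item for closer inspection.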

Purpose of the study and conditions for test development

The purpose of this study was to develop a measurement instrument for customer satisfaction with retail banks, and to validate theory regarding the meaning of customer satisfaction in the domain of retail banking. Given the context of this study, the measurement instrument had to be accommodated to the meaning of customer satisfaction with a retail bank, and it had to be used in empirical research in the corresponding domain.


The population of interest in this study consisted of the mature customers of a Dutch retail bank, in this study denoted as BANK. The measurement instrument had to be applied in survey research to a sample from this population, and therefore it was administered in Dutch. Furthermore, the instrument had to comply with requirements regarding the composition of questions and questionnaires used in surveys (e.g., Belson, 1986; Dillman, Tortora, & Bowker, 1998; Sheatsley, 1983; Sudman & Bradburn, 1982).

Test development

Test development is the development of the measurement instrument. In Chapter 3 (Section 7), customer satisfaction with a retail bank was explained on the basis of eight characteristics, which were summarised in the explicit definition: customer satisfaction with a retail bank is the valenced response of the customer, directed towards the retail bank, and evoked by the customer's experiences with the retail bank throughout time. Furthermore, customer satisfaction was defined implicitly by its connections with trust, quality, customer loyalty, and customer profitability. The latter concepts are part of the nomological network of customer satisfaction in the domain of retail banking, and they delineate the construct to a large extent.

The explicit definition of customer satisfaction with a retail bank covers the three components Giese and Cote (2000) required of a definition of satisfaction, which are response type, timing of the response, and focus of the response (also, see Chapter 3). The explicit definition was used here to formulate a facet design (Table 2) with three facets representing the three components (the response focus facet was not reflected in Table 2, because it had one element). The facet response type had two elements (i.e., cognitive response and affective response), the facet time frame had two elements (i.e., present and past), and the facet response focus had one element (i.e., the bank). Thus, the facet design had four structuples.

Table 2: The Facet Design for Customer Satisfaction with a Retail Bank
Time frame \ Response type   Cognitive      Affective
Present                      Structuple 1   Structuple 3
Past                         Structuple 2   Structuple 4
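The four structuples of the facet design are the Cartesian product of the facet elements. The sketch below only illustrates that enumeration; the dictionary keys and variable names are hypothetical labels, and the numbering follows Table 2 (cognitive before affective, present before past).

```python
from itertools import product

# Facets and their elements as described in the text (labels are illustrative)
facets = {
    "response_type": ["cognitive", "affective"],
    "time_frame": ["present", "past"],
    "response_focus": ["the bank"],
}

# Each structuple is one combination of facet elements, numbered as in Table 2
structuples = {
    number: dict(zip(facets, combination))
    for number, combination in enumerate(product(*facets.values()), start=1)
}

for number, structuple in structuples.items():
    print(number, structuple)
```

Because the response focus facet has a single element, it does not increase the number of structuples, which is why it is not shown in Table 2.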

The purpose of the design was to facilitate the formulation of an item set that yields a complete construct representation. Following Oliver (1997, p. 343), we chose to formulate a comprehensive set of items of the Likert type (Likert, 1932). This type of items allows for the construction of items that are (a) expected to be monotonically related to the construct of interest, and (b) either indicative or counter-indicative of the construct of interest (e.g., Oort, 1996). The following specifications guided the formulation of the items:

1. Each structuple is represented by two items. One item should be indicative and the other counter-indicative of the construct (e.g., Fabrigar, Krosnick, & MacDougall, 2005; Likert, 1932), in order to represent both poles of the satisfaction/dissatisfaction continuum. In order to prevent the questionnaire from becoming too long and asking too much of the participants, the number of items for each structuple was limited to two.
2. Each item should be monotonically related to customer satisfaction. In the context of this study, this means that the probability of choosing a particular answer category or a higher answer category in response to a positively worded item should be a monotonically nondecreasing function of customer satisfaction, that is, a function that decreases nowhere along the scale: it either increases monotonically, remains constant, or increases across some intervals of the scale and remains constant across other intervals (henceforth, to keep things simple, we call this monotonicity; see Sijtsma & Molenaar, 2002, pp. 20, 119). Negatively worded items are re-coded prior to data analysis, and monotonicity should hold for them as well.
3. Each item should reflect general satisfaction with the company. For this reason, the subject in each item should be the bank, and not a particular transaction, product, or product feature.
4. The wording of the items should be kept simple and unambiguous (e.g., Belson, 1986). This means, for example, that the items should be kept short and easy to understand, and that negations should be avoided.
5. The item set should contain one anchor item (Oliver, 1997, p. 343), which is an item that is formulated in terms of satisfaction with the company (i.e., an item such as "I am satisfied with BANK").
6. None of the items should be phrased in terms of related constructs such as trust, quality, and customer loyalty (Chapter 3). This means that items should not be phrased in terms of (a) preference for the company over other companies, (b) expectations regarding competence and integrity of the company, and (c) attributes of products and services provided by the company.


On the basis of these specifications, a set of nine items was formulated (Table 3). The set contained one anchor item and eight items representing the four structuples (Table 2). All items were of the Likert type with five ordered response categories that ranged from "totally agree" to "totally disagree". We chose five response categories in order to also include a neutral option. Because satisfaction/dissatisfaction was conceived of as a continuum (Chapter 3), it was expected that a unidimensional measurement model would fit the empirical data and that all items were monotonically related to the satisfaction/dissatisfaction dimension.

Table 3: Items of Customer Satisfaction with BANK


Item                                                Structuple  Aspect
I am satisfied with BANK                            None        General satisfaction
BANK meets all my requirements for a bank           1           Need fulfilment
There are good reasons to leave BANK (*)            1           Cognition
BANK has met my expectations                        2           Disconfirmation of expectations
Last year I had some problems with BANK (*)         2           Cognition
At BANK I feel at home                              3           Affect
I have mixed feelings about BANK (*)                3           Affect
Last year I had a pleasant relationship with BANK   4           Affect
I have regretted my choice for BANK (*)             4           Regret

(*) = item is counter-indicative of customer satisfaction with BANK

The measurement model

A measurement model is a statistical representation of the responses of the participants of a survey to the measurement instrument. If the measurement model represents the data well, a scale for measurement and measurement values for the participants follow from the model. It was hypothesised that a unidimensional measurement model fits the data, and that all items are monotonically related to the underlying construct of satisfaction with BANK (see Section 5). The Mokken model of monotone homogeneity (MH model; Mokken, 1971) was used for this investigation (Chapter 5 through Chapter 8).

The MH model is an item response theory (IRT) model. IRT is a psychometric theory about the relation between a trait and the probability of a particular response to an item reflecting the trait. The relationship is typically represented by an item response function (IRF). For dichotomously scored items (i.e., two scores, often 0/1 for disagree/agree), the IRF reflects the probability of endorsement (i.e., score 1) of an item given a particular position on the trait (e.g., see the MH model for dichotomous items; Sijtsma & Molenaar, 2002, p. 11). For polytomously scored items (i.e., three or more ordered scores, reflecting degrees of endorsement, e.g., 0, 1, 2, 3, 4), the item step response function (ISRF) reflects the probability of choosing a particular answer category or a higher category (i.e., at least a score of x; e.g., x = 0, 1, 2, 3, 4) of an item given a particular position on the trait (e.g., the MH model for polytomous items; Sijtsma & Molenaar, 2002, p. 119).

The MH model is based upon three assumptions (Sijtsma & Molenaar, 2002, pp. 18-21). The first assumption is unidimensionality, which means that all items reflect the same trait, for example, customer satisfaction. The second assumption is local independence, which means that, given a fixed value of the latent trait, the probability of obtaining at least a score of x is unrelated to the scores obtained on the other items in the test. This means that items reflecting customer satisfaction are unrelated in a group of persons who have the same level of customer satisfaction. This may sound odd, but local independence is a mathematical way of saying that only customer satisfaction explains relationships among items measuring aspects of this trait, and if the trait is held constant all remaining variation in the item scores is due to error. The third assumption is monotonicity, which means that the probability of obtaining at least a score of x is a non-decreasing function of the latent trait. Thus, the higher one's level of customer satisfaction, the higher the probability of obtaining a high score on items measuring the trait.

A consequence of the assumptions of unidimensionality, local independence, and monotonicity is that the MH model yields ordinal measurements of the trait.
This is different from the numerical measurements obtained by more-demanding parametric IRT models, such as the Rasch model (Rasch, 1960), but ordinal measurements suffice for many measurement purposes. In particular, let the score of person p on item i be denoted $X_{pi}$, and let the sum score or the total score of participant p on the items (indexed i) in the test be defined as the sum of the item scores, $X_{+p} = \sum_i X_{pi}$; then under the MH model the ordering of the participants by means of their $X_+$ values reflects their ordering on the scale of the latent trait, except for measurement error (Sijtsma & Molenaar, 2002, p. 121; also, see Van der Ark, 2005). Empirical research has demonstrated that the total-score $X_+$ has a strong linear correlation with the estimated latent trait value from parametric IRT models in several measurement applications (e.g., Sijtsma, Emons, Bouwmeester, Nyklicek, & Roorda, 2008). Sijtsma et al. (2008) suggested the use of total-score $X_+$ for diagnostic purposes, such as the assessment of the position of a person on the latent trait, and for statistical analyses, such as the comparison of groups or the measurement of change. A general condition for this use of $X_+$ is that the MH model fits the data.
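For reference, the quantities discussed above can be written compactly. This is a summary in standard IRT notation rather than a quotation from the cited sources: $\theta$ denotes the latent trait, $k$ (a symbol introduced here) the number of items, and the product form is the usual formal statement of local independence.

```latex
\begin{align*}
\text{ISRF of item } i:\quad
  & P(X_i \geq x \mid \theta), \qquad x = 1, \ldots, 4 \\
\text{Local independence:}\quad
  & P(X_1 = x_1, \ldots, X_k = x_k \mid \theta)
    = \prod_{i=1}^{k} P(X_i = x_i \mid \theta) \\
\text{Monotonicity:}\quad
  & \theta_a \leq \theta_b \;\Rightarrow\;
    P(X_i \geq x \mid \theta_a) \leq P(X_i \geq x \mid \theta_b) \\
\text{Total score of person } p:\quad
  & X_{+p} = \sum_{i=1}^{k} X_{pi}
\end{align*}
```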

The extent to which the theoretical data structure predicted by the MH model is different from the observed data is expressed by means of total-scale scalability coefficient $H$ (Loevinger, 1948; Mokken, 1971) for the whole set of items, and item scalability coefficient $H_i$ for individual items. Coefficient $H$ ranges from a negative value, depending on several characteristics of the item scores, to the maximum of 1. For a given distribution of total-score $X_+$ and a particular set of monotone increasing ISRFs, as the slopes of the ISRFs become steeper, item scalability coefficients $H_i$ and total-scale scalability coefficient $H$ have higher positive values, gradually approaching 1 as the slopes become nearly vertical. Thus, high positive values (usually, $H_i \geq 0.3$ and $H \geq 0.3$; Sijtsma & Molenaar, 2002, p. 60) of item scalability coefficients $H_i$ and total-scale scalability coefficient $H$ in a data set are taken as evidence of steeply monotone ISRFs, and this in turn means that person ordering by means of $X_+$ is more reliable.
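As an illustration of how such coefficients can be obtained, the sketch below uses the covariance-ratio form of the scalability coefficients: observed inter-item covariances divided by the largest covariances attainable given the items' marginal score distributions. It is a simplified sketch, not the MSPwin5.0 implementation; the function name `scalability` and the data layout (persons in rows, items in columns, every item with non-zero variance) are assumptions.

```python
import numpy as np

def _cov(x, y):
    """Population covariance of two score vectors."""
    return float(np.mean((x - x.mean()) * (y - y.mean())))

def scalability(scores):
    """Return (H_i for each item, total-scale H) for an (n_persons, n_items) array."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    cov = np.zeros((n_items, n_items))
    cov_max = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(n_items):
            if i != j:
                cov[i, j] = _cov(scores[:, i], scores[:, j])
                # Maximum covariance given both marginals: pair the sorted
                # scores of item i with the sorted scores of item j.
                cov_max[i, j] = _cov(np.sort(scores[:, i]), np.sort(scores[:, j]))
    h_items = cov.sum(axis=1) / cov_max.sum(axis=1)   # item coefficients H_i
    h_total = cov.sum() / cov_max.sum()               # total-scale coefficient H
    return h_items, h_total
```

For a perfect Guttman pattern the observed and maximum covariances coincide, so all coefficients equal 1; for uncorrelated items they equal 0.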

A virtue of an analysis by means of the MH model is the availability of the MSPwin5.0 software (software for Mokken Scale analysis for Polytomous items; Molenaar & Sijtsma, 2000). MSPwin5.0 facilitates the statistical investigation of whether the MH model fits the data. In particular, it facilitates (a) the investigation of the dimensionality of an item set using a confirmatory strategy, or (b) the investigation of the dimensionality of an item set using an exploratory strategy, and (c) the test of the assumption of monotonicity. Furthermore, MSPwin5.0 provides the test-score distribution and summary statistics, such as the mean, the standard deviation, and the skewness of this distribution (Molenaar & Sijtsma, 2000, pp. 60-61).

The confirmatory strategy to investigate the dimensionality entails investigating whether a set of items, which is defined a priori to form a scale, is indeed found to be a scale based on values of the item scalability coefficients $H_i$ and total-scale scalability coefficient $H$ in the sample data set from the population of interest. To have a Mokken scale, all inter-item correlations must be positive and the values of $H_i$ and $H$ must be at least 0.3 (Sijtsma & Molenaar, 2002, Chapter 5). A Mokken scale is unidimensional and allows sufficiently reliable person measurement by means of total-score $X_+$. MSPwin5.0 facilitates this strategy by means of the item selection method "Test" (Molenaar & Sijtsma, 2000, p. 48).


The exploratory strategy to investigate the dimensionality entails the clustering of items from a larger set into smaller clusters (one cluster is also allowed), each of which is characterised by positive inter-item correlations and item scalability coefficients $H_i$ and total-scale scalability coefficient $H$ that are at least 0.3. Thus, each cluster represents a Mokken scale. MSPwin5.0 facilitates this search strategy by means of the item selection methods "Search normal" (forms item clusters from a set of items) and "Search extended" (takes the second, third, and so on, Mokken scale found by means of "Search normal" as point of departure for clustering while leaving the other items free for selection), and the option to choose different lower bounds than the default value 0.3 for item scalability coefficients $H_i$ and total-scale scalability coefficient $H$ (Molenaar & Sijtsma, 2000, p. 40).

The assumption of monotonicity can be investigated for every ISRF of every item, by estimating the ISRFs from the data. An item which has five different item scores has five different ISRFs, which are conditional probabilities $P(X_i \geq x \mid \theta)$, in which $x = 0, \ldots, 4$ and $\theta$ stands for the latent trait. Because every participant has one of the five possible scores, the probability of obtaining at least a score of 0 equals 1 (a participant always has one of the scores). Thus, only the four ISRFs for $x = 1, \ldots, 4$ are of interest. In data analysis, when the ISRFs of item i are estimated, the latent trait $\theta$ is replaced by the rest-score R. Rest-score R is the total-score $X_+$ minus the item-score $X_i$. The use of the total-score $X_+$ would lead to heavily biased estimates of the ISRFs of item i, and this is prevented by using rest-score R. A rest-score group contains all participants having equal rest scores. The assumption of monotonicity is violated in the sample if the probability of obtaining a score on item i of at least x is higher for a lower rest-score group than for a higher rest-score group.

MSPwin5.0 provides an option called "Minsize" for the manipulation of the minimum size of the rest-score groups (adjacent rest-score groups may be merged to obtain sufficiently large groups; this is convenient for small and large scores which are often underrepresented in samples), and an option "Minvi" which defines the minimum value of observed violations of monotonicity in sample ISRFs that are subjected to statistical testing (small violations may be uninteresting irrespective of whether they are significant or not; Molenaar & Sijtsma, 2000, pp. 67-73). In MSPwin5.0, the default value for Minsize is 10 percent of the sample size, and the default for Minvi is 0.03 on a probability scale that runs from 0 to 1. The option "Alpha = p" manipulates the significance level for tests of significance of violations of monotonicity. Default in MSPwin5.0 is Alpha = 0.05.
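The rest-score procedure described above can be illustrated with a deliberately simplified sketch. It estimates each ISRF within rest-score groups and lists decreases larger than a threshold playing the role of Minvi. It does not reproduce MSPwin5.0: rest-score groups are not merged (no Minsize handling), no significance test is performed, and the function name and arguments are hypothetical.

```python
import numpy as np

def monotonicity_violations(scores, item, max_score=4, minvi=0.03):
    """List sample violations of monotonicity for one item.

    scores: (n_persons, n_items) array of item scores 0..max_score.
    Returns tuples (x, lower_rest_score, higher_rest_score, size_of_decrease).
    Simplification: rest-score groups are used as they occur in the data.
    """
    scores = np.asarray(scores)
    x_i = scores[:, item]
    rest = scores.sum(axis=1) - x_i        # rest-score R: total score minus item score
    groups = np.unique(rest)               # distinct rest-scores, ascending
    violations = []
    for x in range(1, max_score + 1):      # only the ISRFs for x >= 1 are informative
        # Estimated P(X_i >= x | R = r) in every rest-score group
        p = [float(np.mean(x_i[rest == r] >= x)) for r in groups]
        for lo in range(len(groups)):
            for hi in range(lo + 1, len(groups)):
                decrease = p[lo] - p[hi]
                if decrease > minvi:       # lower group scores higher than higher group
                    violations.append((x, int(groups[lo]), int(groups[hi]), decrease))
    return violations
```

An empty list means that no decrease exceeding the threshold was observed for that item; with small rest-score groups the estimates are unstable, which is the reason group merging is offered in practice.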


It was hypothesised that the nine items of satisfaction with BANK constitute a scale according to the MH model. This hypothesis was tested in sample data from the population of interest. If the MH model fits the data, a scale according to the MH model can be constructed, and the scale scores can be computed.

Hypotheses

This section addresses the formulation of hypotheses regarding characteristics of the satisfaction scores (i.e., the satisfaction with BANK scale-scores). The hypotheses concerned (a) the explicit construct representation, (b) the implicit construct representation, (c) concept-related irrelevant variance, and (d) method-related irrelevant variance, and they were tested in empirical studies with respect to customer satisfaction (Chapter 5 through Chapter 8). The purpose of the tests of the hypotheses was to gather empirical evidence whether the scale scores can be interpreted in terms of satisfaction with BANK, and nothing else.

Explicit construct representation

First, it was expected that persons attached different connotations to the term "satisfaction" when asked to explain what satisfaction with the company meant to them. This expectation was in line with the theory of Oliver (1997) that satisfaction may result from different processes, and the notion by Wittgenstein (1953, 1958) that the linguistic meaning of a term cannot be delineated sharply. Second, it was expected that the nine items (Table 3) constituted a scale according to the MH model (Section 6). Third, it was expected that the satisfaction with BANK scale-scores were positively related to other satisfaction with BANK scores. This was in agreement with the requirement of convergent validity (Campbell & Fiske, 1959).

Implicit construct representation

Customer satisfaction was expected to be positively related to (a) trust, (b) quality, (c) customer loyalty, and (d) future customer profitability. The associations between these concepts were postulated in the nomological network of customer satisfaction (Chapter 3).

Concept-related irrelevant variance

Concept-related irrelevant variance refers to variance due to variables that are presumably related to the construct of interest. Variables that are presumably related to customer satisfaction are the variables in the nomological network of the construct (Chapter 3). In terms of the theory of violators (Oort, 1996), such variables are possible violators of the unidimensionality of the scale of the construct of interest. The measurement instrument for customer satisfaction was constructed with the purpose to minimise contamination of scale scores by these variables (Section 5). Therefore, it was expected that trust, quality, customer loyalty, and current customer profitability did not contaminate satisfaction scores obtained by the satisfaction with BANK measurement instrument.

Method-related irrelevant variance

Method-related irrelevant variance refers to variance caused by variables that are presumably unrelated to the construct of interest, such as characteristics of the method of the study and response styles of persons. Characteristics of the method that may affect response behaviour are, for example, the mode of administration, the format of items, the item order, and the wording of items (e.g., Bradburn, 1983). There is ample evidence of the effect of these phenomena on persons' responses to items (e.g., Belson, 1981, 1986; Bradburn, 1983; Bronner & Kuijlen, 2007; Krosnick, 1999; Schuman & Presser, 1981; Sheatsley, 1983). The classical example was provided by Rugg (1941), who demonstrated that 46% of the participants in a survey supported free speech when asked "Do you think the United States should forbid public speeches against democracy?", while only 25% of the participants supported free speech when asked "Do you think the United States should allow public speeches against democracy?". Thus, the question phrased in terms of "to allow" yielded different results than the question phrased in terms of "to forbid". Schuman and Presser (1981, pp. 276-278) replicated this result.

Paulhus (1991, p. 17) explained a response style of a person as a consistent tendency of a person to respond to questionnaire items on some basis other than the specific item content (i.e., what the items were designed to measure). Examples of response styles are acquiescence, disacquiescence, midpoint responding, extreme responding, noncontingent responding, and socially desirable responding (e.g., Baumgartner & Steenkamp, 2001, 2006; Paulhus, 1991; Van Herk, 2000). The acquiescence response style is defined as a general preference for the agreement response categories of item scales, and the disacquiescence response style is defined as a general preference for the disagreement response categories of item scales.
These two response styles may be investigated by means of control scales. Theorists (e.g., Baumgartner & Steenkamp, 2001, 2006; Knowles & Nathan, 1997; Paulhus,


1991; Van Herk, 2000) suggested limiting the influence of these two response styles on the measurement of a trait by simultaneously using items that are indicative of that trait and items that are counter-indicative of that trait. Both kinds of items were included in the measurement instrument of customer satisfaction (see Section 5). The extreme response style is defined as a general preference for extreme response categories (i.e., the endpoints) of item scales, and the midpoint response style is defined as a general preference for the middle response category of item scales. These two response styles also may be investigated by means of control scales (e.g., Baumgartner & Steenkamp, 2001, 2006; Bronner & Kuijlen, 2007; Greenleaf 1992a, 1992b). For example, control scales may be used to measure general midpoint responding and general extreme responding, and the corresponding scores may be correlated with measurements of the trait of interest in order to assess the influence of stylistic responding on the measurement of the trait. Noncontingent responding refers to the tendency to respond randomly to items. This response style may be investigated by means of multivariate outlier analyses (e.g., Tabachnick & Fidell, 1997, pp. 74-75). Socially desirable responding refers to the tendency of persons to make themselves look good by providing socially desirable responses to the items. This response style may be investigated by means of control scales (e.g., Paulhus, 1991). Stylistic responding is a threat to validity of measurement. Messick (1991; also, see Jackson & Messick, 1958) argued that stylistic responding is inversely related to the extent that responses of persons to items are content-driven. This is an important stance. First, this stance implies that stylistic responding is inhibited by optimising. Optimising (Krosnick, 1991, 1999) is response behaviour that is characterised by giving much consideration to the accuracy of the responses. 
For example, when a person puts effort into understanding an item and into providing the optimal response to the item, he or she is said to optimise (Krosnick, 1999, pp. 546-547). Second, this stance implies that stylistic responding is enhanced by satisficing. Satisficing (Krosnick, 1991, 1999) is response behaviour that is characterised by giving little consideration to the accuracy of the responses. For example, when a person does not spend effort to generate the most accurate answer to a question but settles for a merely satisfactory one, he or she is said to satisfice (Krosnick, 1999, p. 548). Third, Messick's (1991) stance implies that the conditions that enhance satisficing also enhance stylistic responding. These conditions are (a) task difficulty, (b) persons' abilities, and (c) persons' motivation to optimise (Krosnick, 1999, p. 548).

It is beyond the scope of this study to assess the contamination of scale scores by all method-related phenomena. For this reason, it was decided to start the study into effects of these phenomena by addressing four issues that were important for further applications of the instrument, and for satisfaction research in general. First, it was investigated whether the location of satisfaction items in the questionnaire influenced satisfaction scores. Second, it was investigated whether the presentation mode of response alternatives of satisfaction items influenced satisfaction scores. Third, it was investigated whether persons' positions on the midpoint response style influenced satisfaction scores. Fourth, it was investigated whether persons' positions on the extreme response style influenced satisfaction scores.

The hypotheses

The expectations and questions with respect to construct representation and irrelevant variance were formalised in a set of hypotheses. The hypotheses are listed in Table 4.

Table 4: List of Hypotheses


Explicit construct representation
H1   Customer satisfaction is manifested in various expressions that are mutually related but not sharply delineated
H2   The satisfaction items constitute a scale according to the MH model
H3   The satisfaction scores are positively related to other satisfaction scores

Implicit construct representation
H4   Satisfaction scores are positively related to trust scores
H5   Satisfaction scores are positively related to quality scores
H6   Satisfaction scores are positively related to loyalty scores
H7   Satisfaction scores are positively related to future customer profitability

Concept-related irrelevant variance
H8   The satisfaction scores are not contaminated by trust
H9   The satisfaction scores are not contaminated by quality
H10  The satisfaction scores are not contaminated by loyalty
H11  The satisfaction scores are not contaminated by current customer profitability

Method-related irrelevant variance
H12  The satisfaction scores are not affected by the location of items in the questionnaire
H13  The satisfaction scores are not affected by the presentation of the response categories of satisfaction items
H14  The satisfaction scores are not affected by the midpoint response style
H15  The satisfaction scores are not affected by the extreme response style
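Hypotheses H14 and H15 concern the midpoint and extreme response styles, which the text proposes to measure with control scales and to correlate with the trait scores. The sketch below shows one simple way such indices could be computed; it is an assumption for illustration, not the scoring used in this study. The index definitions (proportion of control items answered with the middle category or with an endpoint), the function names, and the 0-4 coding of the control items are all hypothetical.

```python
import numpy as np

def response_style_indices(control, n_categories=5):
    """Per-person midpoint and extreme responding on a set of control items.

    control: (n_persons, n_control_items) array with scores 0..n_categories-1.
    Assumes an odd number of categories, so that a middle category exists.
    """
    control = np.asarray(control)
    low, high = 0, n_categories - 1
    middle = (n_categories - 1) // 2
    midpoint = np.mean(control == middle, axis=1)
    extreme = np.mean((control == low) | (control == high), axis=1)
    return midpoint, extreme

def style_correlations(satisfaction, control, n_categories=5):
    """Correlate satisfaction scale scores with both response-style indices."""
    midpoint, extreme = response_style_indices(control, n_categories)
    r_mid = float(np.corrcoef(satisfaction, midpoint)[0, 1])
    r_ext = float(np.corrcoef(satisfaction, extreme)[0, 1])
    return r_mid, r_ext
```

Correlations near zero would be in line with H14 and H15; the control items should be heterogeneous in content so that the indices reflect style rather than a substantive trait.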


Chapter 5 Method of the first empirical study into customer satisfaction with BANK

Introduction

This chapter addresses the method of the first empirical study into customer satisfaction with BANK. The chapter provides an outline of the operationalisations of the constructs, and the construction of the questionnaire, the pre-tests, the pilot study, and the main study.

Operationalisations

Customer satisfaction

Customer satisfaction was operationalised by means of nine Likert items (Table 3, Chapter 4) with five ordered response categories each, ranging from "totally agree" (which was scored 4) to "totally disagree" (which was scored 0) (Table 1). The nine items were expected to constitute a unidimensional scale after re-scoring the counter-indicative items (Chapter 4).

Table 1: Items Reflecting Customer Satisfaction with BANK
Code   Item                                                Aspect                Score range
Q3a    At BANK I feel at home                              Affect                0-4
Q3b    I am satisfied with BANK                            General satisfaction  0-4
Q3d*   There are good reasons to leave BANK                Cognition             0-4
Q3e*   I have mixed feelings about BANK                    Affect                0-4
Q3g    BANK meets all my requirements for a bank           Need fulfilment       0-4
Q4a    Last year I had a pleasant relationship with BANK   Affect                0-4
Q4b    BANK has met my expectations                        Disconfirmation       0-4
Q4c*   I have regretted my choice for BANK                 Regret                0-4
Q4d*   Last year I had some problems with BANK             Cognition             0-4

* = item is counter-indicative of customer satisfaction with BANK
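The re-scoring of counter-indicative items and the computation of a total score can be sketched as follows. The item codes and the 0-4 scoring come from Table 1; the column order, the function name `satisfaction_total`, and the reversal rule (4 minus the raw score) are assumptions made for illustration.

```python
import numpy as np

# Item codes in the assumed column order of the response matrix (see Table 1)
ITEM_CODES = ["Q3a", "Q3b", "Q3d", "Q3e", "Q3g", "Q4a", "Q4b", "Q4c", "Q4d"]
COUNTER_INDICATIVE = {"Q3d", "Q3e", "Q4c", "Q4d"}   # items marked * in Table 1
MAX_SCORE = 4                                        # "totally agree" is scored 4

def satisfaction_total(responses):
    """Re-score counter-indicative items and return each person's total score.

    responses: (n_persons, 9) array of raw scores 0..4, columns as in ITEM_CODES.
    """
    scores = np.array(responses, dtype=int)          # copy; raw data stay untouched
    reverse = np.array([code in COUNTER_INDICATIVE for code in ITEM_CODES])
    scores[:, reverse] = MAX_SCORE - scores[:, reverse]
    return scores.sum(axis=1)
```

After re-scoring, a higher score on every item is indicative of higher satisfaction, so the total ranges from 0 to 36.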

American Customer Satisfaction Index

Customer satisfaction was also operationalised by means of a measurement instrument adopted from the American Customer Satisfaction Index (ACSI; e.g., Fornell et al., 1996). This instrument (Table 2) consisted of three items with ten ordered response categories each, ranging from very negative (e.g., "very dissatisfied", which was scored 0) to very positive (e.g., "very satisfied", which was scored 9). The three items were expected to constitute a unidimensional scale (see Chapter 3). The instrument is further denoted as the ACSI.

Table 2: American Customer Satisfaction Index
Code   Item                                                  Score range
Q20b   How satisfied are you with BANK?                      0-9
Q20c   To what extent does BANK meet your ideal of a bank?   0-9
Q20d   To what extent has BANK met your expectations?        0-9

Trust

Following Morgan and Hunt (1994), trust was defined as a person's confidence in the reliability and integrity of the company. On the basis of the definition of trust, a set of seven Likert items was formulated. Each item had five ordered response categories that ranged from "totally agree" (which was scored 4) to "totally disagree" (which was scored 0). Two items were counter-indicative of trust, and covered distrust. The seven items are listed in Table 3. In the context of retail banking, confidence in integrity and confidence in reliability are intertwined (see also Chapter 3). Many expectations, such as the expectation that the company will keep its promises and the expectation that the company will handle the banking matters of a person properly, encompass both confidence in the reliability of the company and confidence in the integrity of the company. Consequently, we expected the seven items to constitute a unidimensional scale.

Table 3: Items Reflecting Trust
Code   Item                                                          Aspect       Score range
Q5a    I can depend on BANK to treat me fairly                       Integrity    0-4
Q5b    I can depend on BANK to handle my banking affairs correctly   Both         0-4
Q5c    I can depend on BANK to keep its promises                     Both         0-4
Q5d*   I sometimes doubt the competence of BANK                      Reliability  0-4
Q5e*   I sometimes doubt the good will of BANK                       Integrity    0-4
Q5f    I can trust BANK                                              Both         0-4
Q5g    I can depend on BANK to serve me well                         Both         0-4

*= item is counter-indicative of trust
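
As described in Chapter 6, counter-indicative items were recoded in the opposite direction before analysis. The sketch below illustrates what such a recoding could look like for the two starred trust items. It is an illustration only: the thesis performed this step in SAS, and the function names, the dictionary layout, and the respondent shown here are hypothetical.

```python
# Illustrative sketch only; the thesis performed this recoding in SAS.
COUNTER_INDICATIVE = {"Q5d", "Q5e"}  # starred items in Table 3


def reverse_code(score, min_score=0, max_score=4):
    """Mirror a score within its range; missing scores (None) stay missing."""
    if score is None:
        return None
    return min_score + max_score - score


def recode_trust_items(responses):
    """Return a copy of one respondent's answers with starred items reversed."""
    return {
        code: reverse_code(score) if code in COUNTER_INDICATIVE else score
        for code, score in responses.items()
    }


# Hypothetical respondent: scores 0-4, None marks a missing value.
respondent = {"Q5a": 4, "Q5b": 3, "Q5c": 4, "Q5d": 1, "Q5e": None, "Q5f": 3, "Q5g": 4}
recoded = recode_trust_items(respondent)
```

After recoding, a high score on every item points in the direction of more trust, so the seven scores can be examined as a single scale.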

Quality In Chapter 3, quality was defined as a person's perception of the quality of attributes of products and services provided by the company. This definition is in agreement with the conception of quality as perceived quality, which implies that quality had to be measured by means of a psychological measurement instrument. Because quality pertains to distinct attributes of products and services provided by the company, we expected the instrument to yield a multidimensional measurement of quality. Furthermore, we expected the combination of a customer's positions on these dimensions to drive customer satisfaction (Chapter 3, Section 5).

Wirtz and Bateson (1995; also Wirtz, 2000) demonstrated that halo effects influenced several measurements of quality, meaning that responses to items about quality of attributes of products or services provided by the company were influenced by general satisfaction with the company. The occurrence of halo effects may have been enhanced by the operationalisations of quality. To control for halo effects, we decided to operationalise quality in two different ways, both concrete and detailed, which we hoped would stimulate the respondent to reflect on the quality of distinct attributes of products and services rather than provide an overall and perhaps too impressionistic global evaluation.

First, quality was operationalised by means of a set of items regarding the experience of problems with BANK in the preceding twelve months. A list of problems was compiled on the basis of an inventory of customer complaints with the company, and previous research into drivers of customer satisfaction (e.g., Terpstra & Van Gastel, 2004). A total of 16 problems, thus defining 16 items, was included in the questionnaire (Table 4). Persons were asked whether or not these problems had occurred to them in the preceding twelve months. The response yes was scored 1, and the response no was scored 0. It was expected that the 16 items were uncorrelated or only weakly correlated.

Second, quality was operationalised by means of a set of 24 items measuring judgements about attributes of the products and services provided by the company (Table 5). Each item had four ordered response categories that ranged from excellent (which was scored 3) to bad (which was scored 0). The set of attributes was compiled on the basis of previous satisfaction research of the company (Terpstra & Van Gastel, 2004), and covered a broad range of topics. Because it covered a broad range of topics, it was expected that the items constituted multiple scales.

Table 4: Items Reflecting Quality. All Items are Counter-Indicative of Quality.

Code | Problem | Score range
Q6a | Errors in the execution of your banking affairs | 0-1
Q6b | Errors in the execution of your orders | 0-1
Q6c | Insufficient information on your banking affairs | 0-1
Q6d | Ambiguous information on your banking affairs | 0-1
Q6e | Unfair costs of banking services | 0-1
Q6f | Slow service | 0-1
Q6g | Slow money transfers | 0-1
Q6h | Not keeping an appointment | 0-1
Q6i | Insufficient accessibility by telephone | 0-1
Q6j | Insufficient accessibility by Internet | 0-1
Q6k | Insufficient accessibility of offices | 0-1
Q6l | Insufficient response to questions | 0-1
Q6m | Problems with debit cards | 0-1
Q6n | Problems with cash withdrawals | 0-1
Q6o | Problems with Internet banking | 0-1
Q6p | Another problem | 0-1

Table 5: Items Reflecting Quality

Code | Item | Score range
Q7a | Correct execution of orders | 0-3
Q7b | Speed of money transfers | 0-3
Q7c | Speed of service delivery | 0-3
Q7d | Adherence to promises | 0-3
Q7e | Correct execution of banking matters | 0-3
Q7f | Distribution of bank statements | 0-3
Q8a | Costs of accounts of the company | 0-3
Q8b | Convenience of products and services | 0-3
Q8c | Clarity of information provided | 0-3
Q8d | Sufficiency of information provided | 0-3
Q8e | Costs of services of the company | 0-3
Q8f | Interest rates of the company | 0-3
Q9a | Service by telephone | 0-3
Q9b | Service by the Internet | 0-3
Q9c | Service by bank offices | 0-3
Q9d | Service by mail correspondence | 0-3
Q9e | Accessibility of the company | 0-3
Q9f | Facilities for Internet banking | 0-3
Q10a | Friendliness of employees | 0-3
Q10b | Capability of employees | 0-3
Q10c | Reliability of employees | 0-3
Q10d | Openness for questions | 0-3
Q10e | Responsiveness of the company | 0-3
Q10f | Handling of complaints | 0-3

Customer loyalty Following Gremler and Brown (1996, 1999), customer loyalty was defined as the degree to which a customer is doing repeat business with the company, possesses a positive attitudinal disposition towards the provider, and considers only this provider when a need for this service arises. According to this definition, customer loyalty encompasses (a) cognitions, affects, and behaviour with respect to the company, and (b) a comparison of the company with other firms. On the basis of this definition, a set of six Likert items was constructed to operationalise customer loyalty (Table 6). Each item reflected a particular aspect of customer loyalty (i.e., cognition, affect, or past behaviour), and had five ordered response categories ranging from totally agree (which was scored 4) to totally disagree (which was scored 0). In accordance with former studies using similar measurement instruments of customer loyalty (e.g., Caruana, 2002; Gremler & Brown, 1999), we expected the six items to constitute a unidimensional scale.

Table 6: Items Reflecting Customer Loyalty

Code | Item | Aspect | Score range
Q14a | I have more sympathy for BANK than for other banks | Affect | 0-4
Q14b | If I need new financial products, BANK is my first choice | Cognition | 0-4
Q14c* | For some matters I am better off with another bank | Cognition | 0-4
Q14d* | I consider switching from BANK to another bank | Cognition | 0-4
Q14e | BANK offers me benefits other banks don't offer | Cognition | 0-4
Q14f | For many years BANK has been my primary bank | Behaviour | 0-4

* = item is counter-indicative of customer loyalty

Customer profitability In Chapter 3, customer profitability (CP) was defined as the gross financial contribution of a customer to a company in a specified period of time. Because a long time period is less subject to behavioural anomalies than a short time period (Mulhern, 1999), we chose a time period of a year for the measurement of CP. Thus, CP at time t was the gross financial contribution of a customer to a company in the twelve months preceding time t. CP consisted of interest profits and provision profits. Interest profits and provision profits were a function of the balances held or the provisions paid by a customer on the one hand, and the corresponding gross margins of the company on the other hand (the gross margins are the margins of the company before the costs for servicing the customer, such as transaction costs, contact costs, marketing costs, and overhead costs, are accounted for; see for example Cooper & Kaplan, 1991, p. 469). For example, if a customer held 1000 euro credit balance during one month, and the company's gross margin on 1 euro credit balance was 0.002 euro per month, the interest profits yielded by the customer in that month were equal to 2 euro. The summation of all profits from a customer over the 12 months preceding time t was labelled CP at time t.

Three additional remarks are in order. First, CP at time t was computed monthly by the company, and expressed in euro. The CP figures from September 2005, September 2006, and September 2007 were collected from the internal databases of the company (Section 6 of the present chapter). Second, if an account (e.g., a mortgage) was held by two or more customers, one of these customers was registered by the company as the primary owner of the product. Only the accounts for which the customer was registered as the primary owner were included in the calculation of profitability of the customer. Third, if a customer left the company, the customer did not generate any profits from that month onwards, and after a year the profits generated by this customer in the preceding twelve months were reduced to zero. The company registered this as a missing value on CP at time t, but in this study this missing value actually represents zero profits.
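
The arithmetic behind CP can be sketched as follows. This is a minimal illustration that assumes profit equals amount times gross margin, which is consistent with the worked example in the text but is not a description of the company's actual computation. The function names and the representation of a month as a list of (amount, margin) pairs are hypothetical.

```python
def monthly_profit(positions):
    """Sum amount * gross margin over one month's positions.

    Each position is a hypothetical (amount_in_euro, gross_margin_per_euro) pair.
    """
    return sum(amount * margin for amount, margin in positions)


def customer_profitability(months):
    """CP at time t: total profit over the twelve months preceding t."""
    if len(months) != 12:
        raise ValueError("CP is defined over exactly twelve months")
    return sum(monthly_profit(positions) for positions in months)


# Worked example from the text: 1000 euro credit balance,
# gross margin 0.002 euro per euro per month.
one_month = [(1000.0, 0.002)]
example_month_profit = monthly_profit(one_month)        # about 2 euro
example_cp = customer_profitability([one_month] * 12)   # about 24 euro if held all year

# A customer who left more than a year ago has no positions in any month,
# which gives zero profits rather than a missing value.
departed_cp = customer_profitability([[]] * 12)
```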

Interest Interest was measured in order to test the quality of the survey data by means of correlating items reflecting customer satisfaction and items reflecting interest (to be discussed in Section 2 of Chapter 6). We expected that items reflecting customer satisfaction were uncorrelated with items reflecting interest, and a different result would raise suspicion about the quality of the survey data. A customer's interest in banking matters was operationalised on the basis of two items (Table 7). Each item had five ordered response categories that ranged from highly interested (which was scored 4) to not interested (which was scored 0). We expected the items to be positively correlated.

Table 7: Items Reflecting Interest

Code | Item | Score range
Q17 | How interested are you in banking matters? | 0-4
Q18 | How interested are you in the development of new products and services by banks? | 0-4
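
The data-quality check described in this subsection rests on correlations between item scores. The thesis computed these with proc corr in SAS (Chapter 6); the Python sketch below only illustrates the idea. The function name, the use of pairwise deletion for missing scores, and the example scores are assumptions made for the illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation using only pairs in which both scores are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores of six respondents on the two interest items
# (None marks a missing value).
q17 = [4, 3, 1, 0, 2, None]
q18 = [3, 4, 1, 1, 2, 2]
r_interest = pearson(q17, q18)
```

Under the expectations stated above, a correlation such as `r_interest` between the two interest items should be clearly positive, whereas the same function applied to a satisfaction item and an interest item should return a value close to zero.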

The questionnaire

The questionnaire (Appendix 1; in Dutch) was composed of the items reflecting customer satisfaction (represented by two item sets), trust, quality (also represented by two item sets), customer loyalty, and interest. In addition, some items were included in the questionnaire for business purposes, and some other items were included to optimise the design of the questionnaire. For example, some items regarding product ownership and contacts with the company were included in order to elicit the participant's memories of the company before the measurement of satisfaction with the company started. Furthermore, some items regarding relations of the participant with other providers of financial services were included in order to elicit his or her memories of other providers of financial services before proceeding with the measurement of loyalty to the company.

The design of the questionnaire, the format of the items, and the wording of the items were based upon general principles concerning survey research (see, e.g., Belson, 1986; Dillman, Tortora, & Bowker, 1998; Sheatsley, 1983; Sudman & Bradburn, 1982). An important issue was the inclusion of the "no answer" option among the response options of the items. It is well known that items allowing respondents to use a "no answer" option may cause problems in data analysis (e.g., Tabachnick & Fidell, 2007, pp. 62-63), and that a "no answer" option may invoke satisficing (e.g., Krosnick, 1999; Krosnick & Fabrigar, 1997). Nevertheless, for four reasons it was decided to maintain the "no answer" option of items:

(a) Interviews with participants after they had taken pre-tests of the questionnaire revealed that they appreciated the "no answer" option. They claimed that they could not answer particular items if they had no experience with the subject. An example of such an item concerned the handling of complaints by the company. It was considered useful to include these items in the questionnaire, in particular to collect data on the seriousness of a particular problem;

(b) To limit the risk of satisficing (Krosnick, 1999), the item texts were kept short, simple, and concrete in order to limit the difficulty of the participant's task and to prevent participants from taking the easy way in answering the items and thus using the "no answer" option too light-heartedly;

(c) A pilot study (to be discussed in Section 5) demonstrated that the "no answer" option was rarely used with respect to the satisfaction items. The response option apparently did not invoke satisficing on this subject; and

(d) A practical reason for using the "no answer" option was that the questionnaire was to be administered via the Internet. The administration mode encompassed a forcing mechanism that required the participant to respond to an item before proceeding to the next item. Such a mechanism may contaminate the data, because a participant may have good reasons not to answer a particular question (Dillman et al., 1998). Thus, the "no answer" option was also meant to neutralise the forcing mechanism.

The ordering of items within a block of items, such as the items within block Q3 (Appendix 1), was different across different administrations of the questionnaire. The effect of the location of the satisfaction items (Q3, Q4, and Q20; Appendix 1) on the scale scores was assessed in the pilot study. The objective of these measures was to test and to control for order effects. The questionnaire was improved by means of qualitative pre-tests among 10 persons and a pilot study among 372 persons. The pre-tests (to be discussed in Section 4) demonstrated that it took 15 to 35 minutes for participants to complete the questionnaire. We considered this rather long and suspected this might demoralise participants and stimulate satisficing (e.g., Krosnick, 1999, pp. 248-249; Sheatsley, 1983, p. 223; Sudman & Bradburn, 1982, p. 262). In order to motivate participants to complete the questionnaire, we explained the purpose of the study in the E-mail (Appendix 2; in Dutch) by which they were invited to participate in the survey.

The pre-tests

The questionnaire was pre-tested between February 2005 and May 2005, by means of depth interviews with mature customers of BANK. The first objective of the pre-tests was to test how long it took participants to complete the questionnaire and to explore participants' interpretations of the items in the questionnaire. The second objective of the pre-tests was to test the first hypothesis of the empirical study (i.e., customer satisfaction is manifested in various expressions that are mutually related but not sharply delineated; see Section 7 in Chapter 4). The results of the pre-tests were used to improve the wording of the items and the design of the questionnaire, before executing the pilot study and the main study. Furthermore, the results were used to test the first hypothesis.

Target population The target population of this study consisted of the mature customers of a Dutch retail bank. These were adults who were registered by the company as the primary owner of at least one banking product provided by the company.

Sample The sample was composed of ten mature customers of the bank. Four were male and six were female. Their age varied between 29 and 71 years. Their education ranged from professional to academic. None of the persons was occupied in consumer research or the financial services industry.

Procedure The questionnaire was presented in paper-and-pencil format to the participant. The participant filled out the questionnaire, and the interviewer registered the time it took to complete the questionnaire. Afterwards, the interviewer interviewed the participant. The participant was probed into his or her satisfaction with the company, into the meaning that he or she attached to satisfaction with a retail bank, and into the answers he or she had given to the survey items. The responses were registered on paper by the interviewer.

Data The interviewer's notes about the time span of the survey and the responses of participants to the post-survey interview constituted the raw data.

The pilot study

The pilot survey was conducted in August 2005, among mature customers of the bank. The first objective of the pilot study was to test the procedure of the survey. It was assessed (a) how many participants completed the questionnaire, (b) how often missing values on items occurred, and (c) what kind of comments the participants made with respect to the questionnaire. The second objective was to test the hypotheses 12 and 13 (the hypotheses regarding the effect of (a) location of satisfaction items and (b) ordering of response categories on scale scores; see Section 7 in Chapter 4). The results of the pilot study were used to decide on technical properties of the main survey, and to test the hypotheses 12 and 13.

Design Four versions of the questionnaire were administered that differed with respect to the location of the satisfaction items in the survey, and the ordering of the response categories of the satisfaction items. On the basis of this design (Table 8) it was tested whether (a) the location of the satisfaction items in the questionnaire, and (b) the ordering of the response categories of satisfaction items, had an effect on the average satisfaction scores.

Table 8: Design of the Pilot Study

Survey version | Location of items | Ordering of categories | N
1 | A | A | 90
2 | A | B | 95
3 | B | A | 89
4 | B | B | 98

The location of the satisfaction items refers to the location of Q3, Q4 and Q20 (Appendix 1) in the questionnaire. In the survey versions 3 and 4, the locations of Q3 and Q4 on the one hand and Q20 on the other hand were reversed. The order of response categories refers to the response categories of the Likert items, which were: totally agree, agree, neutral, disagree, totally disagree. In the survey versions 2 and 4, the response categories were displayed in reversed order.

Target population The target population of this study consisted of the mature customers of a Dutch retail bank. These were adults who were registered by the company as the primary owner of at least one banking product provided by the company.

Sample The sample was drawn from the research panel of the company. This panel was composed of a total of 3984 mature customers of the company who had agreed to participate in marketing research via the Internet. The agreement encompassed that (a) the company is free to approach the person for marketing research, (b) the person is free to participate in the research or to decline, (c) the company is allowed to use the survey data for research purposes only, and (d) the company is not allowed to distribute any personalised data to third parties. All panel members could be approached by E-mail, and had a unique customer-id that was used for identification purposes.

The reasons for using the research panel for this study were (a) its considerable size, (b) its facilities for Internet research, and (c) the availability of a customer-id for each panel member. The customer-id facilitated the enrichment of the survey data with the company data that were needed in this study. The arguments in favour of the use of the research panel outweighed the argument against the panel, which was the possibility that the panel might be biased with respect to some psychological characteristics. For example, it cannot be ruled out that (a) persons who were willing to participate in the panel had a different attitude towards banking than persons who were not willing to participate in the panel, and (b) persons who had access to the Internet had different psychological characteristics than persons who did not have access to the Internet. Thus, the choice for using the research panel may have enhanced coverage error (i.e., error due to the fact that different units in the target population have different probabilities of being included in the sample; e.g., Dillman & Bowker, 2001; Groves, 1989).

Three additional remarks with respect to the research panel are in order. First, the variable customer segment refers to a segmentation which reflects the value of the customers to the company, and which was used by the company for marketing purposes. The company distinguished three segments, which were Top Customers, Standard Customers, and Development Customers. Each customer of the company, except those who were not registered as the primary owner of a product provided by the company, was assigned to one and only one of these segments. Because the company's most valuable customers (i.e., Top Customers) were overrepresented in the research panel, the panel differed significantly ($\chi^2(2) = 1270$, $p < 0.001$) from the target population with respect to the distribution of customer segment (Table 9). Second, the panel differed significantly ($\chi^2(2) = 324$, $p < 0.001$) from the target population with respect to the distribution of gender. Males were overrepresented in the panel (Table 9). This was partly due to the overrepresentation of males among the segment Top Customers (i.e., the segment that was overrepresented in the research panel), and partly to unknown causes. Third, the panel differed significantly ($\chi^2(2) = 299$, $p < 0.001$) from the target population with respect to the distribution of age group (Table 9). The average age in the panel was 47 years, and in the target population it was 48 years. The average age in the target population appears to be high, but this is because only adults constituted this population.

In total, 800 persons were invited to participate in the survey. These persons were selected randomly from the research panel. The response rate in the pilot study was approximately 47% (N = 372), and the participants were distributed more or less evenly across the four versions of the questionnaire (Table 8). The distributions of customer segment, gender, and age group within, successively, the company, the panel, and the sample are reported in Table 9.

In line with our expectations, the sample differed significantly from the target population with respect to customer segment ($\chi^2(2) = 209$, $p < 0.001$), gender ($\chi^2(2) = 42$, $p < 0.001$), and age group ($\chi^2(2) = 35$, $p < 0.001$). Furthermore, the sample differed significantly from the panel with respect to customer segment ($\chi^2(2) = 16.91$, $p < 0.001$). Thus, respondents differed significantly from non-respondents with respect to customer segment. The sample was representative of the panel with respect to gender and age group.

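Table 9: Distribution (Percentages) of Customer Segment, Gender, and Age Group in the Pilot Study

Variable | Category | Company | Panel | Sample
Customer segment | Top | 30 | 56 | 64
Customer segment | Standard | 44 | 32 | 30
Customer segment | Development | 26 | 12 | 6
Gender | Female | 44 | 31 | 30
Gender | Male | 52 | 66 | 68
Gender | Unknown | 4 | 3 | 2
Age group | 18 to 39 years | 35 | 30 | 28
Age group | 40 to 59 years | 38 | 51 | 52
Age group | 60 years and older | 27 | 19 | 20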
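
The comparison of the sample with the target population can be illustrated with the customer segment percentages in Table 9 and the sample size N = 372. The sketch below assumes a chi-square goodness-of-fit test of the sample distribution against the company distribution; the text does not state the exact procedure, and the function names are hypothetical. Because the table reports rounded percentages, the result (about 217) is close to, but not identical with, the reported value of 209.

```python
import math


def chi_square_gof(sample_pct, population_pct, n):
    """Chi-square goodness-of-fit statistic from percentages and sample size."""
    observed = [n * p / 100 for p in sample_pct]
    expected = [n * p / 100 for p in population_pct]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-stat / 2)


# Customer segment (Top, Standard, Development), percentages from Table 9.
sample_segment = [64, 30, 6]
company_segment = [30, 44, 26]

stat, df = chi_square_gof(sample_segment, company_segment, n=372)
p = p_value_df2(stat)
```

The closed-form p-value applies only because each comparison in Table 9 involves three categories, and hence two degrees of freedom.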

Procedure The survey was administered via the Internet. Persons were invited by E-mail to participate in the survey. The questionnaire was made available at a site of the marketing research agency that managed the survey. The questionnaire was accessible from 19 August 2005 until 4 September 2005. Persons had access to the site on the basis of a password and were identified on the basis of a customer-id. After a participant completed the questionnaire, the data were uploaded to the agency. The participants received a small incentive (i.e., saving points valued 10 euro). This is the common fee that the company paid to panel members that responded to a survey of medium length.

Data The research agency delivered a file containing the raw data, which were the coded responses of the participants to the survey items (the research agency scored a "no answer" response as a missing value). In order to enrich the raw data, the file was merged with the marketing database. The merging was executed on the basis of customer-id, and it was successful for all participants. Subsequently, three variables were added to the file, which were (a) customer segment ultimo September 2005, (b) gender, and (c) age ultimo September 2005.

The main study

The main survey was conducted in October 2005, among mature customers of the bank. The study was used to construct the measurements of the constructs, and to test the hypotheses (see Section 7 in Chapter 4).

Target population The target population of this study consisted of the mature customers of a Dutch retail bank. These were adults who were registered by the company as the primary owner of at least one banking product provided by the company.

Sample A total of 3612 persons were invited to participate in the survey. They were the remainder of the research panel of the company (i.e., the part of the panel that did not participate in the pilot study). The response rate in the main study was approximately 47% (N = 1689). The distributions of customer segment, gender, and age group within, successively, the company, the remainder of the panel, and the sample are reported in Table 10. In line with our expectations, the sample differed significantly from the target population with respect to customer segment ($\chi^2(2) = 813$, $p < 0.001$), gender ($\chi^2(2) = 183$, $p < 0.001$), and age group ($\chi^2(2) = 157$, $p < 0.001$). Furthermore, the sample differed significantly from the remainder of the panel with respect to customer segment ($\chi^2(2) = 75$, $p < 0.001$), gender ($\chi^2(2) = 9.95$, $p < 0.01$), and age group ($\chi^2(2) = 8.85$, $p < 0.05$). Thus, respondents differed significantly from non-respondents with respect to customer segment, gender, and age group. For gender and age the absolute differences were very small, and for practical purposes they may be ignored.

Table 10: Distributions (Percentages) of Customer Segment, Gender, and Age Group in the Main Study

Variable | Category | Company | Remainder of Panel | Sample
Customer segment | Top | 30 | 55 | 61
Customer segment | Standard | 44 | 32 | 30
Customer segment | Development | 26 | 13 | 9
Gender | Female | 44 | 31 | 30
Gender | Male | 52 | 66 | 68
Gender | Unknown | 4 | 3 | 2
Age group | 18 to 39 years | 35 | 30 | 28
Age group | 40 to 59 years | 38 | 51 | 52
Age group | 60 years and older | 27 | 19 | 20

Procedure The survey was administered via the Internet. Persons were invited by E-mail to participate in the survey. The questionnaire was made available at a site of the marketing research agency that managed the survey. The questionnaire was accessible from 30 September 2005 until 16 October 2005. Persons had access to the site on the basis of a password and were identified on the basis of a customer-id. After a participant completed the questionnaire, the data were uploaded to the agency. The participants received a small incentive (i.e., saving points valued 10 euro). This is the common fee that the company paid to panel members that responded to a survey of medium length.

Data The research agency delivered a file containing the raw data, which were the coded responses of the participants to the survey items (again, a "no answer" response was scored as a missing value). In order to enrich the raw data, the file was merged with the marketing database. The merging was executed on the basis of customer-id, and it was successful for all participants. Subsequently, seven variables were added to the file, which were (a) customer segment ultimo September 2005, (b) gender, (c) age ultimo September 2005, (d) CP ultimo September 2005, (e) CP ultimo September 2006, (f) CP ultimo September 2007, and (g) an indicator of whether the customer had died between September 2005 and September 2007.

Chapter 6 Results of the first empirical study into customer satisfaction with BANK

Introduction

This chapter addresses the results of the first empirical study into customer satisfaction with BANK. First, the preliminary analyses are discussed. The purpose of these analyses was to examine the data quality and to prepare the data for the subsequent analyses. Second, the measurement analyses are discussed. The purpose of these analyses was to construct the scales of customer satisfaction, trust, quality, and customer loyalty. Third, the tests of the hypotheses explained in more detail in Chapter 4 are discussed. The purpose of these tests was to collect empirical evidence regarding the validity of measurement of customer satisfaction. Fourth, additional research into the relation between customer satisfaction and future customer profitability (future CP) is discussed. The purpose of these analyses was to explore this relation in more detail than we did for the tests of the hypotheses. Fifth, the implications of the results of the empirical study are addressed. The discussion includes the assessment of the strengths and weaknesses of the customer satisfaction scale. Sixth, the conclusions of the study are presented.

Preliminary analyses

Method

This section addresses the preliminary analyses of the raw data from the pre-tests, the pilot study and the main study.

Pre-test data First, the data from the pre-tests were analysed. The interviewer reproduced the interviews verbatim on the basis of the notes he made during the interview. The report of each interview included (a) the registration of the time the participant took to complete the survey, (b) the participant's explanation of his or her satisfaction with the retail bank, and (c) the participant's comments on the survey and the questionnaire items.

Pilot study data Second, the data from the pilot study were analysed. For this purpose, the dataset containing the raw data was converted into a SAS dataset, and the items that were assumed to be counter-indicative of the constructs (see the description of the measurement instruments in Chapter 5) were recoded in the opposite direction. In order to get an impression of the distribution characteristics of the variables, histograms and descriptive statistics of all variables in the dataset were computed and examined. For this purpose, proc univariate (SAS STAT) and proc means (SAS STAT) were used.

In order to test the data quality, the correlations between the items reflecting customer satisfaction with the retail bank and the items reflecting interest in banking matters were examined. For this purpose, proc corr (SAS STAT) was used. It was expected that (a) the items reflecting satisfaction were highly correlated, (b) the items reflecting interest were highly correlated, and (c) the items reflecting satisfaction and the items reflecting interest were uncorrelated.

Missing data may hamper the data analyses (e.g., Tabachnick & Fidell, 2007, p. 62). Item-score imputation is a method for handling missing item scores in multiple-item questionnaires. Suppose the score of participant p on item i is missing. Then the imputation of an item score based on the observed part of the data for participant p and item i, to be discussed shortly in more detail, is an effective and simple way to complete the data matrix without losing a large part of the sample, as happens with the popular practice of listwise deletion. In the statistical literature (e.g., Little & Rubin, 2002; Schafer & Graham, 2002), it is well known that the way in which missing data have to be handled depends on the mechanism that underlies the missingness. This mechanism often is difficult to identify once the missing-data problem has presented itself, and this complicates adequate missing-data handling in much empirical research. For item-score missingness in multiple-item questionnaires, in which multiple items are used to measure one underlying construct such as satisfaction, Bernaards and Sijtsma (2000) and Van Ginkel, Van der Ark, and Sijtsma (2007) found that imputation of item scores has little or no biasing effect on outcomes of statistical analyses when the percentage of missing item scores in the data matrix does not exceed, say, 15 percent. Serious bias is absent even when the missingness mechanism cannot be ignored in the sense that the missing item scores cannot be considered a random sample from the complete data matrix. The explanation for this robustness is that the available data contain much information on the underlying construct, and thus are well able to compensate for the non-randomness of the missing data. Because in the pilot study and the main study the total percentage of missing item scores did not exceed 15, item-score imputation could be used safely (results discussed in the next section).

For the imputation of item scores, we used two-way imputation with normally distributed errors (abbreviated method TW-E; e.g., Bernaards & Sijtsma, 2000; Sijtsma & Van der Ark, 2003; Van Ginkel, 2007; Van Ginkel, Van der Ark, & Sijtsma, 2007). Van Ginkel (2007) demonstrated that this method yielded nearly unbiased results in important psychometric quantities such as Cronbach's alpha. Method TW-E is suited in particular for item sets that measure one construct. Let the score of person p on item i be missing. In two-way imputation, a real value $TW_{pi}$ is estimated on the basis of (a) the mean of person p's available scores on the other items of the scale (i.e., the person mean $PM_p$), (b) the mean of the available scores of the other persons in the sample on item i (i.e., the item mean $IM_i$), and (c) the mean of all available scores of all persons in the sample on all items which constitute the scale (i.e., the overall mean $OM$), so that $TW_{pi} = PM_p + IM_i - OM$. In two-way imputation with normally distributed errors, a random error $\varepsilon_{pi}$ is added to $TW_{pi}$, so that $TW_{pi}(E) = TW_{pi} + \varepsilon_{pi}$. The random error is drawn from a normal distribution with zero mean and variance $\sigma^2$. Variance $\sigma^2$ is obtained from the squared differences between the observed scores $X_{pi}$ in the data matrix and the expected scores $TW_{pi}$ computed by means of two-way imputation.
If $TW_{pi}(E)$ is a real number, it is rounded to the nearest integer within the range of feasible item scores, and this rounded value is imputed in cell (p,i) of the data matrix. Method TW-E requires that at least one item from the item set reflecting a construct is answered by the participant. Otherwise, the person mean $PM_p$, and consequently $TW_{pi}$, cannot be computed. Thus, no values were imputed for missing scores of participants who did not answer at least one item from a particular scale.

We excluded participants with missing scores from particular analyses if it was plausible that missing scores on an item were due to the item being non-applicable for these participants. For example, missing scores on an item addressing quality of complaint handling by the company (i.e., item Q10f; see Table 5 in Chapter 5) may be due to the item being non-applicable for participants who never had a complaint about the company. Because it is unrealistic to impute a score for a missing value that indicates that an item may not be applicable for a participant, we did not impute values for these missing scores but rather excluded these cases from the analysis (also, see Chapter 5, in which the decision to include the "no answer" option was discussed). We excluded variables from the dataset if it was suspected that (a) the missingness was nonignorable, (b) there were no substantive arguments for the imputation of the missing scores, and (c) the variables were considered to be dispensable for the study. For example, two variables reflecting customer loyalty were deleted from the dataset for these reasons (to be further discussed in the Section Results).

Main study data

Third, the data from the main study were analysed in the same way as the data from the pilot study. These analyses included (a) the recoding of items that were assumed to be counter-indicative of the construct of interest, (b) the examination of distribution characteristics of the variables in the dataset, (c) the imputation of missing values, and (d) the examination of correlations between items reflecting satisfaction and items reflecting interest in banking matters. Furthermore, in the main study (but not in the pilot study) a weighting factor containing weights for persons in the dataset was computed, and outlier analyses were done.
In Chapter 5, it was demonstrated that the sample differed significantly from the target population with respect to customer segment, gender, and age group. The analyses demonstrated that the difference with respect to customer segment between the sample and the target population was larger than the differences with respect to gender and age group. Because in-company research demonstrated that customer segment is an important variable in customer profitability analyses (e.g., Terpstra, 2005) and because we intended to analyse the relation between customer satisfaction and customer profitability, we decided to weight participants in order to obtain proportional representation of customer segments in the sample. Hox (1998) advocated weighting of persons if the sample is biased, and comparing the results from statistical analyses with and without weighting. Following Hox (1998), we compared the
results of the analyses regarding the relation between customer satisfaction and future customer profitability, with and without the weighting (Section 4). The weights of the participants belonging to a particular customer segment were computed as the ratio between the proportion of the customer segment in the company population and the proportion of the customer segment in the sample. This means that the participants belonging to a customer segment that was overrepresented in the sample were given a smaller weight than the participants belonging to a segment that was underrepresented in the sample. Univariate and multivariate outlier analyses were conducted to find cases that may hamper the data analyses (e.g., Tabachnick & Fidell, 2007, pp. 72-77). For the detection of univariate outliers, the histograms of variables were examined. For the detection of multivariate outliers, the distances of persons to the centroid of the multivariate space defined by the items in the dataset were examined. These distances can be expressed by the Mahalanobis Distance (Mahalanobis, 1936) and by the leverage statistic, which is a function of the Mahalanobis Distance (Tabachnick & Fidell, 2007, pp. 74-75). Let MD denote the Mahalanobis Distance and N the sample size; then the leverage of person p, denoted $h_{pp}$, is defined as:

$$h_{pp} = \frac{MD}{N - 1} + \frac{1}{N}.$$

We chose the leverage statistic for the detection of multivariate outliers, because this statistic is readily available in SAS. Following Tabachnick & Fidell (2007, pp. 74-75, 111-112), regression analysis was used to calculate the leverage statistic. This was done using several items that reflected different constructs as predictors and customer-id as criterion (because the leverage statistic expresses the distances of persons to the centroid of the multivariate space defined by the predictor variables in the regression analysis, the choice of the criterion variable in the regression analysis is unimportant). Persons with a significant value for leverage (p < 0.001) were defined as multivariate outliers, and their score patterns were visually examined to find out what caused the high leverage value. The outliers were marked in the dataset by an indicator variable. Furthermore, for each participant the proportion of missing values on each set of items constituting a measurement instrument was computed. If this proportion exceeded 0.5, a participant was marked as an outlier. To evaluate the impact of outliers on the results, we did all analyses on the dataset including the outliers (i.e., the complete dataset) and on the dataset without outliers (i.e., the reduced dataset).
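The relation between the Mahalanobis Distance and the leverage statistic can be illustrated with a short sketch. It is not the SAS procedure used in the study: the function names and the small data set are hypothetical, and MD is taken to be the squared Mahalanobis distance of a person to the centroid, computed with the sample covariance matrix (divisor N - 1). The cut-off for a "significant" leverage is not computed here; one common approach is to insert the critical chi-square value for MD into the same formula.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [m[r][n] / m[r][r] for r in range(n)]


def leverage(rows):
    """Leverage h_pp = MD / (N - 1) + 1 / N for every person (row of predictor scores)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    dev = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(d[a] * d[b] for d in dev) / (n - 1) for b in range(k)]
           for a in range(k)]
    result = []
    for d in dev:
        z = solve(cov, d)                        # cov^{-1} d without explicit inversion
        md = sum(x * y for x, y in zip(d, z))    # squared Mahalanobis distance
        result.append(md / (n - 1) + 1 / n)
    return result


# Hypothetical scores of six persons on two predictor items.
h = leverage([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [0, 4]])
```

Because the leverage depends only on the predictor scores, the criterion variable of the regression does not enter the computation, which is consistent with the remark that its choice is unimportant.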

Results

The pre-tests

The participants explained their satisfaction with the retail bank in different ways. The participants' explanations of their satisfaction with the retail bank are listed in Table 1, and they are discussed in Section 4.

Table 1: Listing of Explanations of Satisfaction with the Company


Participant   Satisfaction   Explanation of satisfaction with the retail bank

1, 2   Very satisfied, Satisfied   I feel good about [BANK]. My banking affairs are taken care of well with [BANK]. They [BANK] do nothing wrong. There is nothing to be dissatisfied about. There is nothing to be enthusiastic about either. If [COMPETITOR] would have current accounts, I would switch immediately.

3, 4   Satisfied, Satisfied   They [BANK] will not deceive you, such as [COMPETITOR X]. That was my former bank. [BANK] is easy to deal with, with limited costs. I've got the impression that they [BANK] will not deceive me, and then it's all right with me ... I'm not particularly concerned with banking affairs, my partner takes care of banking affairs.

5, 6, 7, 8   Very satisfied, Satisfied, Satisfied, Satisfied   The staff is always friendly, and the bank is easy to deal with. I feel good about [BANK]. I trust [BANK]. I won't go to [COMPETITOR], to me it's important that I can trust my bank. It [BANK] is a friendly bank. They [BANK] are accessible. There is nothing to be dissatisfied about. They are friendly and they are accessible. Although a relative once had an annoying incident with [BANK]. Her card was stolen and was used abroad. First, they [BANK] refused to compensate. This is not what I expected from [BANK].

9   Satisfied   It is all right, it never goes wrong. I don't care much about banking affairs. I don't have any referents.

10   Moderately satisfied   In general it is all right, but last year I had an incident with [BANK]. It was about the costs of banking services. They charge basic services, while they make enormous profits with our money.

The pilot study

Histograms (not shown here) demonstrated that the polytomous items reflecting customer satisfaction (two item sets), trust, quality (two item sets), customer loyalty, and interest were single peaked, and mostly negatively skewed. For example, most participants responded positively to items that were indicative of satisfaction, and negatively to items that were counter-indicative of satisfaction (Table 2). This corresponds with the findings in other satisfaction studies (e.g., Oliver, 1997; Peterson & Wilson, 1992). The histograms also revealed a small group of outliers on the items adopted from the ACSI. Because these items were not used in subsequent analyses of the pilot data, no further actions were undertaken with respect to these outliers.

Table 2: Descriptive Statistics of Items Reflecting Customer Satisfaction (Before Imputation; N = 372)

Code   Item                                                Nmiss  Mean  SD    Skewness
Q3a    At BANK I feel at home                              2      2.94  0.73  -0.72 **
Q3b    I am satisfied with BANK                            1      2.96  0.69  -0.99 **
Q3d*   There are good reasons to leave BANK                0      2.99  0.94  -1.13 **
Q3e*   I have mixed feelings about BANK                    2      2.72  0.95  -0.85 **
Q3g    BANK meets all my requirements for a bank           0      2.67  0.83  -0.61 **
Q4a    Last year I had a pleasant relationship with BANK   0      2.88  0.77  -1.36 **
Q4b    BANK has met my expectations                        0      2.85  0.73  -0.95 **
Q4c*   I have regretted my choice for BANK                 0      3.27  0.71  -1.20 **
Q4d*   Last year I had some problems with BANK             1      2.85  1.03  -0.87 **

* = scored reversely, ** = p < 0.001

The descriptive statistics demonstrated a low incidence of missing values (i.e., smaller than five percent) on the items reflecting customer satisfaction, trust, customer loyalty, and interest, and a higher incidence of missing values on some items reflecting quality. This latter result was probably due to items mentioning topics that were irrelevant to particular participants; also, see Chapter 5. For the items constituting the measurement instrument for customer satisfaction, Table 2 shows that there were few missing item scores; thus method TW-E was used for imputing values for the missing item scores. The descriptive statistics (i.e., mean, standard deviation, and skewness) for the items before imputation were almost identical to the descriptive statistics for the items after imputation. This result supports the use
of method TW-E, and the items after imputation were used for subsequent analyses (i.e., the analyses for the test of the hypotheses 12 and 13; see Chapter 4). Table 3 shows the correlations between two items reflecting satisfaction (Table 1 in Chapter 5) and two items reflecting interest in banking matters (Table 7 in Chapter 5). In agreement with our expectations, (a) the satisfaction items correlated highly, (b) the interest items correlated highly, and (c) the satisfaction items and the interest items were almost uncorrelated. These results strengthened our confidence in the quality of the data.

Table 3: Correlations Between Two Items (Q3a and Q3b) Reflecting Customer Satisfaction and Two Items (Q17 and Q18) Reflecting Interest

Item                                                                               Code  Q3a  Q3b    Q17    Q18
At BANK I feel at home                                                             Q3a        0.72   0.08   0.03
I am satisfied with BANK                                                           Q3b               0.01  -0.04
How interested are you in banking matters?                                         Q17                      0.65
How interested are you in the development of new products and services by banks?   Q18

The main study

The results from the preliminary analyses of the data from the main study were similar to the results from the analyses of the pilot data. Histograms (not shown here) demonstrated that all polytomous items reflecting customer satisfaction, trust, quality, customer loyalty, and interest were single peaked, and mostly negatively skewed; see Table 4.

Table 4: Descriptive Statistics of Polytomous Items Reflecting Customer Satisfaction, Trust, Customer Loyalty and Interest (Before Imputation; N = 1689)

Code    Item                                                    Nmiss  Mean  SD    Skewness
Customer satisfaction items
Q3a     At BANK I feel at home                                  8      2.90  0.73  -0.73 **
Q3b     I am satisfied with BANK                                2      2.92  0.71  -1.21 **
Q3d*    There are good reasons to leave BANK                    25     3.01  0.95  -0.98 **
Q3e*    I have mixed feelings about BANK                        19     2.72  0.97  -0.60 **
Q3g     BANK meets all my requirements for a bank               2      2.62  0.87  -0.65 **
Q4a     Last year I had a pleasant relationship with BANK       10     2.82  0.71  -0.71 **
Q4b     BANK has met my expectations                            5      2.77  0.75  -1.00 **
Q4c*    I have regretted my choice for BANK                     14     3.21  0.76  -1.05 **
Q4d*    Last year I had some problems with BANK                 14     2.99  0.94  -1.03 **
Trust items
Q5a     I can depend on BANK to treat me fairly                 9      2.83  0.66  -1.00 **
Q5b     I can depend on BANK to handle my banking aff. corr.    4      2.90  0.63  -1.12 **
Q5c     I can depend on BANK to keep its promises               16     2.77  0.70  -1.04 **
Q5d*    I sometimes doubt the competence of BANK                20     2.78  0.87  -0.72 **
Q5e*    I sometimes doubt the good will of BANK                 24     2.80  0.88  -0.80 **
Q5f     I can trust BANK                                        4      2.89  0.64  -0.75 **
Q5g     I can depend on BANK to serve me well                   6      2.75  0.71  -0.84 **
Customer loyalty items
Q14a    If I need new fin. products, BANK is my first choice    26     2.44  0.95  -0.41 **
Q14b    I have more sympathy for BANK than for other banks      34     2.47  0.90  -0.41 **
Q14c*   For some matters I am better off with another bank      124    1.83  1.01   0.30 **
Q14d*   I consider switching from BANK to another bank          36     3.01  0.91  -0.95 **
Q14e    BANK offers me benefits other banks don't offer         97     2.18  0.82  -0.04
Q14f    For many years BANK has been my primary bank            9      2.96  0.99  -1.08 **
Interest items
Q17     How interested are you in banking matters?              22     2.70  1.03  -0.47 **
Q18     How interested are you in dev. of new p&s by banks?     34     2.27  1.11  -0.22 **
ACSI items
Q20b    How satisfied are you with BANK?                        7      6.56  1.30  -1.13 **
Q20c    To what extent does BANK meet your ideal of a bank?     48     6.11  1.44  -1.08 **
Q20d    To what extent has BANK met your expectations?          19     6.54  1.37  -1.23 **

(*) = scored reversely, (**) = p < 0.001

For the variables reflecting customer profitability (CP) in September 2005, September 2006, and September 2007, histograms (not shown here) showed single peaked and positively skewed distributions. Forty-three participants had a standardised CP in September 2005, 2006, or 2007 that was larger than 3. These participants were outliers, but because they had correct values for CP (i.e., not incorrect values due to, e.g., clerical errors), they were retained for the data analyses. Outliers are common in financial data. In the financial services industry, customer profits (i.e., CP according to the gross CP conception; Chapter 3) often follow a Pareto-like distribution (i.e., 20% of the customers are responsible for 80% of the company's profits). To reduce the skewness of the distribution and the influence of the outliers on subsequent analyses, we applied a logarithmic transformation to $CP_t$ (Jack, 1967; Tabachnick & Fidell, 2007, pp. 87-89). Let $CP_t$ denote CP at time t, $TCP_t$ transformed $CP_t$, and ln the natural logarithm. Because the minimum value for $CP_t$ was zero euro, we applied the following transformation: $TCP_t = \ln(CP_t + 1)$.
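The effect of the transformation ln(CP + 1) on a positively skewed variable can be shown with a small sketch. The profitability values below are invented for illustration only, and the skewness function is the simple moment-based coefficient, which is an assumption (the text does not state which estimator was used).

```python
import math


def transform_cp(cp):
    """Return ln(CP + 1); defined for CP >= 0, and equal to 0 when CP is 0."""
    return math.log1p(cp)


def skewness(values):
    """Moment-based skewness: third central moment over the 1.5th power of the variance."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical customer profitability values in euro, with one large outlier.
cp = [0, 10, 20, 50, 100, 5000]
tcp = [transform_cp(x) for x in cp]
print(skewness(cp), skewness(tcp))
```

Adding 1 before taking the logarithm keeps customers with a profitability of zero euro in the analysis, because ln(0) is undefined whereas ln(0 + 1) equals zero.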

Table 5 shows the correlations between two items reflecting satisfaction (Table 1 in Chapter 5) and two items reflecting interest in banking matters (Table 7 in Chapter 5). In agreement with our expectations, (a) the satisfaction items correlated highly, (b) the interest items correlated highly, and (c) the satisfaction items and the interest items were almost uncorrelated. These results strengthened our confidence in the quality of the data. The items reflecting customer satisfaction (including the items from the ACSI), trust, customer loyalty, and interest had few missing scores (i.e., 5% or less; see Table 4), so that item-score imputation could be used safely. An exception was made for the items with respect to customer loyalty; this is discussed shortly. The descriptive statistics of the items before imputation were almost identical to the descriptive statistics of the items after imputation. Some participants left more than 50 percent of the items reflecting satisfaction, trust, or interest unanswered (Table 6). These participants were considered outliers, and indicator variables identified them in the dataset. Table 4 demonstrates substantial percentages of missing scores on two items reflecting customer loyalty, which are the items Q14c ("For some matters I am better off with another bank"; Nmiss = 6 percent) and Q14e ("BANK offers me benefits other banks don't offer"; Nmiss = 7 percent). The meaning of item Q14c was probably too vague, because the phrase "some matters" is ambiguous and imprecise, probably referring to a variety of products and services that are provided by retail banks. The meaning of item Q14e also was probably too vague, because the phrase "offers me benefits" was not articulated. Thus, it is unclear whether this phrase refers to characteristics of the company, such as the location of a bank office or the availability of Internet banking facilities, or to financial offers by the company, such as a personalised interest rate.
The unfortunate phrasing of these two items in combination with the circumstance that these items were dispensable for the study led us to delete the items from the dataset, even though the percentages of missing item scores were smaller than 15. The missing data on the remainder of the items reflecting customer loyalty (Table 4) were imputed using method TW-E. Some participants left more than 50 percent of the items reflecting customer loyalty unanswered (Table 6). These participants were considered outliers, and we created an indicator variable to identify them in the dataset.

Table 5: Correlations Between Two Items (Q3a and Q3b) Reflecting Customer Satisfaction and Two Items (Q17 and Q18) Reflecting Interest

Item                                                                               Code  Q3a  Q3b    Q17    Q18
At BANK I feel at home                                                             Q3a        0.64   0.05   0.04
I am satisfied with BANK                                                           Q3b              -0.01  -0.02
How interested are you in banking matters?                                         Q17                      0.62
How interested are you in the development of new products and services by banks?   Q18

Table 6: Number of Participants Leaving More Than Half of the Items Unanswered

     Satisfaction   ACSI   Trust   Loyalty   Interest
N    1              14     6       5         11

Histograms (not shown here) demonstrated that all polytomous items reflecting quality were single peaked and mostly negatively skewed; see Table 7. The polytomous items reflecting quality had many missing item scores, part of which may be due to items being non-applicable for the participants involved (also, see Chapter 5). For example, a missing score on an item concerning the quality of complaint-handling by the company (i.e., item Q10f; Table 7) might indicate that the participant never had any complaints with the company. Similarly, a missing score on an item concerning telephone service by the company (i.e., item Q9a; Table 7), might indicate that the participant never phoned the company. Because imputation of values for missing scores on such items would be meaningless, we decided to exclude persons with missing scores on the polytomous items reflecting quality from analyses of the data about quality. In general, the regular users of the BANK have a greater chance of running into problems with transactions and services than the low-frequency users. Thus, it is likely that the latter group is overrepresented in the missing scores on the quality items. In order to detect multivariate outliers, the leverage statistic was computed by means of a regression analysis using customer-id as the criterion variable, and as the predictor variables

25 items reflecting customer satisfaction (Table 4), trust (Table 4), customer loyalty (Table 4; the items Q14c and Q14e were excluded), interest (Table 4), and the items from the ACSI (Table 4) (see Tabachnick & Fidell, 2007, pp. 75-76, 111-112). The analysis yielded 119 participants with a significant (p < 0.001) leverage value. Visual inspection of the data demonstrated that these participants tended to give extremely positive or extremely negative responses. Furthermore, the inspection demonstrated that the eight participants with the highest leverage value alternated extremely positive and extremely negative responses to different items having similar content. For example, a participant responded extremely positively to one half of the items reflecting satisfaction with the company (i.e., items Q3a, Q3b, Q3d, Q3e, and Q3g; Table 4) and extremely negatively to the other half of the items reflecting satisfaction with the company (i.e., items Q4a, Q4b, Q4c, and Q4d; Table 4). Another example is a participant who answered extremely positively to one item from the ACSI (i.e., item Q20b; Table 4) and extremely negatively to the other items from the ACSI (i.e., items Q20c and Q20d; Table 4). A third example is a participant who answered extremely negatively to all items reflecting satisfaction with the bank (customer satisfaction items; Table 4) and extremely positively to the items from the ACSI (ACSI items; Table 4). It was suspected that the eight participants with the highest leverage value had responded inconsistently to the survey items. An indicator variable was created to identify them in the dataset. This variable was joined with the variables marking the participants who left the majority of items reflecting a particular construct unanswered (see Table 6). The union of these variables identified 39 outliers in the dataset.
These 39 outliers were excluded from some analyses (the dataset including the 39 outliers was labeled the complete dataset, and the dataset without the 39 outliers was labeled the reduced dataset). The weights needed to achieve proportional representation with respect to customer segment were computed on the basis of the distributions of customer segment in the company population and in the sample (see Chapter 5). Subsequently, the weights were recorded in a variable called the weighting factor (Table 8).

Table 7: Descriptive Statistics of Polytomous Items Reflecting Quality (N = 1689)


Code   Item                                    Nmiss  Mean  SD    Skewness
Q7a    Correct execution of orders             11     2.11  0.57  -0.37 **
Q7b    Speed of money transfers                12     1.67  0.82  -0.33 **
Q7c    Speed of service delivery               37     1.81  0.63  -0.39 **
Q7d    Adherence to promises                   162    1.87  0.61  -0.63 **
Q7e    Correct execution of banking matters    19     2.04  0.53  -0.31 **
Q7f    Distribution of bank statements         10     1.38  0.86  -0.19
Q8a    Costs of accounts of the company        201    1.09  0.70   0.27 **
Q8b    Convenience of products and services    32     1.91  0.58  -0.35 **
Q8c    Clarity of information provided         32     1.80  0.62  -0.61 **
Q8d    Sufficiency of information provided     51     1.78  0.60  -0.57 **
Q8e    Costs of services of the company        94     1.07  0.70   0.31 **
Q8f    Interest rates of the company           144    0.85  0.69   0.41 **
Q9a    Service by telephone                    456    1.81  0.65  -0.47 **
Q9b    Service by the internet                 325    1.89  0.65  -0.52 **
Q9c    Service by bank offices                 288    1.71  0.77  -0.55 **
Q9d    Service by mail correspondence          376    1.74  0.61  -0.55 **
Q9e    Accessibility of the company            85     1.82  0.65  -0.50 **
Q9f    Facilities for internet banking         302    1.91  0.68  -0.56 **
Q10a   Friendliness of employees               202    2.02  0.57  -0.40 **
Q10b   Capability of employees                 250    1.84  0.60  -0.65 **
Q10c   Reliability of employees                327    1.94  0.49  -0.62 **
Q10d   Openness for questions                  360    1.70  0.69  -0.66 **
Q10e   Responsiveness of the company           219    1.97  0.50  -0.49 **
Q10f   Handling of complaints                  656    1.67  0.71  -0.67 **

** = p < 0.001

Table 8: Distribution of Customer Segment Within the Company, the Panel and the Sample

Customer segment   Company   Sample   Weighting factor
Top                30%       61%      30 / 61
Standard           44%       30%      44 / 30
Development        26%       9%       26 / 9
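The weighting factors in Table 8 follow directly from the ratio of the population proportion to the sample proportion of each customer segment. The sketch below reproduces that arithmetic; the dictionary and function names are hypothetical.

```python
# Proportions taken from Table 8.
population = {"Top": 0.30, "Standard": 0.44, "Development": 0.26}
sample = {"Top": 0.61, "Standard": 0.30, "Development": 0.09}


def segment_weights(population, sample):
    """Weight per segment = proportion in the population / proportion in the sample."""
    return {segment: population[segment] / sample[segment] for segment in population}


weights = segment_weights(population, sample)

# After weighting, each segment's share in the sample equals its population share.
weighted_share = {segment: sample[segment] * weights[segment] for segment in sample}
print(weights)
print(weighted_share)
```

Participants from the overrepresented Top segment receive a weight below 1, whereas participants from the underrepresented Standard and Development segments receive weights above 1.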

Measurement analyses

Measurement analyses aim to construct scales and to evaluate their psychometric quality. We used Mokken's MH model (Chapter 4) to analyse the data representing the participants' responses to the measurement instruments used in the empirical study. The use of the MH model yielded the measurement scales and the participants' scale scores. All measurement analyses were done on the basis of the data from the main study. The scales of customer satisfaction, customer satisfaction on the basis of the ACSI, trust, and customer loyalty were constructed using the MH model. For this purpose, the software program MSPwin5.0 was used (Molenaar & Sijtsma, 2000). Because it was hypothesised that each set of items reflecting a construct constituted a unidimensional scale, the confirmatory search strategy of Mokken scale analysis (Chapter 4) was used. For the analysis of the item scores reflecting quality, both Mokken scale analyses and factor analyses (Gorsuch, 1983, pp. 239-256) were used. The Mokken scale analyses were done using MSPwin5.0, and the factor analyses were done using proc factor (SAS STAT). Because it was expected that the items reflecting quality constituted multiple scales and we had no hypothesis about the number of scales, we used exploratory strategies for scale development. Factor analysis (e.g., Bollen, 1989; Gorsuch, 1983) is a technique for investigating the dimensionality of an item set. If the researcher has a hypothesis regarding the dimensionality of the item set and which items load on particular factors, he or she may apply confirmatory factor analysis to test this hypothesis (e.g., Bollen, 1989). If the researcher does not have such a hypothesis, exploratory factor analysis (e.g., Gorsuch, 1983) may be used for investigating the structure of the item set and the identification of common factors that account for correlations in the item set. Hierarchical factor analysis (Gorsuch, 1983, pp. 239-256) is a type of exploratory factor analysis, which may be used to explore the dimensionality in a dataset if dimensions are nonorthogonal, meaning that factors are correlated. Instead of computing loadings for oblique factors, which are often difficult to interpret, the correlation matrix of the oblique factors is further factor-analysed. This analysis yields one or more higher-order factors that account for the common variance that is due to all items, and two or more orthogonalised lower-order factors that account for the common variance that is due to clusters of items (Gorsuch, 1983, pp. 248-252). Following Wirtz (2000) and Wirtz and Bateson (1995), who reported the presence of halo effects in measurements of attribute satisfaction (Oliver, 1993), we suspected that halo
effects also could prevail in the measurement of the quality of attributes of products and services provided by the company. These halo effects might strengthen the correlations between all items, and cause strong correlations between factors reflecting different dimensions of quality. Therefore we chose hierarchical factor analysis for the exploration of the dimensionality of the data about quality. In order to explore the robustness of the results of the factor analysis, we also applied Mokken scale analysis to the data.

Customer satisfaction

Customer satisfaction was operationalised using the measurement instrument presented in Chapter 5 (Table 1 in Chapter 5). It was hypothesised that the nine items constitute a scale according to the MH model. To test this hypothesis, Mokken scale analysis was done using MSPwin5.0. First, the dimensionality of the item set was investigated using the confirmatory strategy (Section 6 from Chapter 4). Second, the assumption of monotonicity was investigated (Section 6 from Chapter 4). Third, the scale-score statistics (Molenaar & Sijtsma, 2000, pp. 60-61) were evaluated. Fourth, the scalability of the item set within distinct customer segments, gender groups, and age groups was investigated. For this purpose, customer segment, gender, and age group were defined as grouping variables (Molenaar & Sijtsma, 2000, pp. 28-29). Fifth, univariate analyses of variance were done to test whether subgroups defined on the basis of customer segment, gender, and age differed significantly with respect to scale scores. For this purpose, proc GLM (SAS STAT) was used. Sixth, the effect of outliers on the results was investigated by repeating the analyses on the reduced dataset (i.e., the dataset without outliers, see Section 2).

The confirmatory Mokken scale analyses (item selection method = Test) demonstrated that the nine items constituted a Mokken scale with a total-scale scalability coefficient H equal to 0.59 and a reliability coefficient rho equal to 0.91 (Table 9). The lowest item scalability coefficient Hi was equal to 0.50, which is well above the default lower bound for the Hi used in exploratory analyses (i.e., lower bound Hi = 0.3). This result supported the inclusion of all nine items in the scale, and thus the conception of customer satisfaction as a unidimensional construct. The scale consists of items that are indicative of satisfaction and items that are counter-indicative of satisfaction.
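To make the total-scale scalability coefficient H more concrete, the sketch below computes it as the sum of the observed covariances between all item pairs divided by the sum of the maximum covariances that are possible given the marginal score distributions of the items. This is an illustration under stated assumptions and not a reproduction of the MSPwin5.0 computations: the function names and the toy data are hypothetical, and the maximum covariance of an item pair is obtained by sorting both score vectors, which pairs the scores in the same rank order.

```python
def covariance(x, y):
    """Covariance of two equally long score vectors (divisor n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def scalability_h(data):
    """Total-scale H for a persons-by-items matrix of complete integer scores."""
    items = list(zip(*data))          # one tuple of scores per item
    observed = maximum = 0.0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            observed += covariance(items[i], items[j])
            # Sorting both vectors gives the largest covariance the marginals allow.
            maximum += covariance(sorted(items[i]), sorted(items[j]))
    return observed / maximum


# Hypothetical response patterns: a perfect Guttman pattern, and the same
# pattern with one person who endorses only the least popular item.
guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
with_error = guttman + [[0, 0, 1]]
print(scalability_h(guttman), scalability_h(with_error))
```

A perfect Guttman pattern yields H equal to 1, and response patterns that deviate from the item ordering lower the coefficient, which is why higher values of H indicate a stronger scale.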
This result supports the conception of customer satisfaction as the bipolar opposite of customer dissatisfaction. The check for item monotonicity on the basis of the default options in MSPwin5.0 (i.e., Minvi = 0.03 and Minsize = 168, which is 10 percent of the sample) did not reveal any

Table 9: Customer Satisfaction Scale: Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (N = 1689)
Total group T 0.57 0.63 0.60 0.60 0.59 0.60 0.66 0.61 0.50 0.59 0.91 0.91 0.90 0.93 0.92 0.59 0.56 0.65 0.62 0.51 0.45 0.61 0.58 0.46 0.58 0.90 0.60 0.60 0.65 0.63 0.60 0.65 0.63 0.73 0.66 0.65 0.76 0.76 0.67 0.72 0.94 0.60 0.58 0.67 0.62 0.59 0.70 0.59 0.55 0.66 0.63 0.57 0.65 0.58 0.58 0.67 0.64 0.50 0.60 0.91 0.61 0.53 0.67 0.58 0.60 0.73 0.60 0.60 0.57 0.62 0.61 0.59 0.74 0.59 0.61 0.61 0.60 0.64 0.65 0.62 0.51 0.60 0.91 0.63 0.60 0.69 0.67 0.61 0.79 0.66 0.64 0.57 0.53 0.60 0.59 0.56 0.66 0.57 0.59 0.48 0.57 0.57 0.54 0.57 0.54 0.63 0.54 0.48 0.54 0.89 S D F M U 18-39 40-59 60+ Customer segment Gender Age

Label

At BANK I feel at home

I am satisfied with BANK

There are good reasons to leave BANK *

I have mixed feelings about BANK *

BANK meets all my requirements for a bank

Last year I had a pleasant relationship with BANK

BANK has met my expectations

I have regretted my choice for BANK *

Last year I had some problems with BANK *

Rho

* = scored reversely

Table 10: Customer Satisfaction Scores in the Complete Dataset (N = 1689)

Grouping variable   Group         Mean    F       p
Customer segment    Top           26.44   16.35   < 0.001
                    Standard      25.60
                    Development   23.78
Gender              Female        26.23   1.99    0.13
                    Male          25.88
                    Unknown       24.53
Age group           18-39         25.31   4.67    < 0.01
                    40-59         26.14
                    60+           26.39
Total                             25.96

violations of the assumption of monotonicity. This means that the ISRFs of all items were nondecreasing for all rest-score groups. However, the check for item monotonicity on the basis of smaller rest-score groups (i.e., Minsize = 84, which is 5 percent of the sample) yielded two significant violations of the assumption of monotonicity. These violations were due to small decreases in the estimated ISRF for Q3d >= 2 (There are good reasons to leave BANK; Table 4) (Figure 1) and the estimated ISRF for Q4c >= 4 (I have regretted my choice for BANK; Table 4) (Figure 2). Thus, the MH model did not fit the data perfectly. The psychometric properties of the scale were slightly improved if item Q3d was removed from the scale. The 8-item scale yielded a total-scale scalability coefficient H equal to 0.59 without significant violations of the assumption of monotonicity, a result that was also found when the assumption was tested on the basis of small rest-score groups (i.e., Minsize = 84). However, it is doubtful whether the 8-item scale yielded better measurements of satisfaction, because each item in the scale is important for sufficient content validity (i.e., equal coverage of all aspects of customer satisfaction in the scale). We decided to proceed with the 9-item scale because the violations of monotonicity in the 9-item scale were small, and the 9-item scale had the best content validity.
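The logic of the monotonicity check can be sketched as follows. This is a strongly simplified illustration and not the MSPwin5.0 procedure: the way adjacent rest scores are merged until each group contains at least Minsize persons is an assumption, no significance test is performed, and all function names and data are hypothetical. For a given item step, the sketch computes the proportion of persons who pass the step in each rest-score group and reports every decrease larger than Minvi.

```python
def rest_score_groups(rest_scores, minsize):
    """Merge adjacent rest-score values until every group holds at least minsize persons."""
    groups, current = [], []
    for value in sorted(set(rest_scores)):
        current.append(value)
        if sum(rest_scores.count(v) for v in current) >= minsize:
            groups.append(current)
            current = []
    if current:                       # a too-small tail joins the last complete group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


def isrf_violations(data, item, step, minsize, minvi=0.03):
    """Return (lower group, higher group, decrease) for each decrease exceeding minvi."""
    rest = [sum(row) - row[item] for row in data]     # total score on the other items
    proportions = []
    for group in rest_score_groups(rest, minsize):
        members = [row for row, r in zip(data, rest) if r in group]
        proportions.append(sum(row[item] >= step for row in members) / len(members))
    return [(a, b, proportions[a] - proportions[b])
            for a in range(len(proportions))
            for b in range(a + 1, len(proportions))
            if proportions[a] - proportions[b] > minvi]


# Hypothetical data: a Guttman pattern (no violations) and a reversed pattern.
consistent = [[0, 0, 0]] * 2 + [[1, 0, 0]] * 2 + [[1, 1, 0]] * 2 + [[1, 1, 1]] * 2
reversed_item = [[1, 0, 0]] * 2 + [[0, 1, 1]] * 2
print(isrf_violations(consistent, item=0, step=1, minsize=2))
print(isrf_violations(reversed_item, item=0, step=1, minsize=2))
```

Smaller values of Minsize produce more rest-score groups with fewer persons each, which makes the check more sensitive to local decreases in the estimated item step response function.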

Figure 1: Item step response functions of item Q3d: There are good reasons to leave BANK

Figure 2: Item step response functions of item Q4c: I have regretted my choice for BANK

The customer satisfaction scale-score distribution is presented in Figure 3. It may be noted that the distribution of scale scores was significantly skewed to the left (p < 0.001), and that there were outliers in the skew tail. The negative skewness is a common result in customer satisfaction measurements (Peterson & Wilson, 1992). It is unknown whether the outliers were caused by extreme dissatisfaction of the corresponding participants with the company or by stylistic responding. Stylistic responding is investigated in Chapter 8. The Mokken scale analyses using the grouping variables customer segment (valued Top Customers, Standard Customers, and Development Customers; see Chapter 5), gender (valued female, male, and missing), and age (valued 18 to 39 years, 40 years to 59 years, and 60 years onwards; see Chapter 5) demonstrated that the nine items constituted a strong Mokken scale (i.e., H > 0.5) in each subgroup (Table 9). The checks for item monotonicity did not yield significant violations of the assumption of monotonicity, a result that was also found for smaller rest-score groups (i.e., Minsize = 84). For this reason, it was concluded that the 9-item scale may be used to measure customer satisfaction in different subgroups of the target population.

Figure 3: Distribution of customer satisfaction scores in the complete dataset (N = 1689, mean = 25.96, SD = 5.57, and skewness = -0.85)

Table 10 shows that customer segments differed significantly with respect to scale score (based on analysis of variance). This result is consistent with results from previous satisfaction studies done by the company (e.g., Terpstra, 2005), and it suggests that the three customer segments differ with respect to the average satisfaction with the company. The result also supports the pursuit of proportional representation of customer segments in descriptive studies of customer satisfaction. Gender groups did not differ significantly with respect to scale score, whereas age groups did (Table 10). The latter result was unexpected, but because the magnitude of the differences between the age groups was small, we considered it unimportant in the context of the present study.

The analyses of the reduced dataset yielded results similar to those of the complete dataset. The confirmatory Mokken scale analyses (item selection method = Test) yielded a scale with a total-scale scalability coefficient H equal to 0.60 and a reliability coefficient rho equal to 0.91 (Table 11). The check for item monotonicity on the basis of the default options (i.e., Minvi = 0.03 and Minsize = 165, which is 10 percent of the sample) did not reveal violations of the assumption of monotonicity. The same result was found for the check for item monotonicity with smaller rest-score groups (i.e., Minsize = 83). Thus, the MH model fitted the data in the reduced dataset.

The Mokken scale analyses using the grouping variables customer segment, gender, and age yielded a strong Mokken scale (i.e., H > 0.5) in each subgroup (Table 11). The checks for item monotonicity on the basis of the default options (i.e., Minvi = 0.03 and Minsize = 165, which is 10 percent of the sample) did not yield significant violations of the assumption of monotonicity in subgroups. However, the check for item monotonicity on the basis of smaller rest-score groups (i.e., Minsize = 83) yielded a significant violation of the assumption of monotonicity for item Q4c (Table 4) in the age group of 60 years and older. This was due to a decrease of the estimated ISRF for Q4c >= 3 (i.e., the proportion of responses Q4c >= 3 decreased from 1.00 in the middle rest-score group to 0.96 in the highest rest-score group). Because the magnitude of the decrease was small, we did not consider it problematic, and we concluded that the scale score is useful for the measurement of customer satisfaction in different subgroups of the target population.

The customer satisfaction scale-score distribution (Figure 4) was significantly skewed to the left (p < 0.001). Furthermore, univariate analyses of variance demonstrated that the customer segments and the age groups differed significantly with respect to scale score (Table 12). Gender groups did not differ significantly.

Figure 4: Distribution of customer satisfaction scores in the reduced dataset (N = 1650, mean = 26.04, SD = 5.50, and skewness = -0.84)

Table 11: Customer Satisfaction Scale's Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Reduced Dataset (N = 1650)
Total group T 0.57 0.65 0.60 0.60 0.59 0.60 0.66 0.61 0.51 0.60 0.91 0.91 0.90 0.93 0.91 0.60 0.56 0.65 0.61 0.52 0.45 0.61 0.57 0.47 0.59 0.91 0.60 0.60 0.65 0.63 0.60 0.66 0.62 0.74 0.65 0.66 0.76 0.76 0.67 0.72 0.94 0.60 0.57 0.66 0.62 0.60 0.69 0.59 0.55 0.66 0.63 0.57 0.64 0.58 0.59 0.68 0.65 0.51 0.60 0.91 0.61 0.52 0.67 0.58 0.60 0.72 0.60 0.61 0.57 0.62 0.60 0.60 0.74 0.59 0.61 0.60 0.59 0.63 0.66 0.61 0.51 0.60 0.91 0.64 0.64 0.69 0.66 0.64 0.79 0.67 0.65 0.58 0.52 0.60 0.59 0.56 0.65 0.57 0.58 0.53 0.61 0.59 0.55 0.60 0.55 0.64 0.54 0.49 0.56 0.90 S D F M U 18-39 40-59 60+ Customer segment Gender Age

Label

At BANK I feel at home

I am satisfied with BANK

There are good reasons to leave BANK *

I have mixed feelings about BANK *

BANK meets all my requirements for a bank

Last year I had a pleasant relationship with BANK

BANK has met my expectations

I have regretted my choice for BANK *

Last year I had some problems with BANK *

Rho

* = scored reversely

Table 12: Customer Satisfaction Scores in the Reduced Dataset (N = 1650)

Grouping variable    Group      Mean     F        p
Customer segment                26.51    16.39    < 0.001
                                25.68
                                23.84
Gender               Female     26.26    1.71     0.18
                     Male       25.99
                     Unknown    24.62
Age group            18-39      25.36    5.01     < 0.01
                     40-59      26.23
                     60+        26.47
Total                           26.04

American Customer Satisfaction Index

The ACSI (Table 2 in Chapter 5) was used as the second operationalisation of customer satisfaction. The empirical data were analysed by means of Mokken scale analyses. The analyses were done in both the complete dataset and the reduced dataset. First, the dimensionality of the item set was investigated using the confirmatory strategy (see Chapter 4). Second, the assumption of monotonicity was tested using the default check for item monotonicity (see Chapter 4). Third, the scale scores and the scale-score statistics were computed.

The analyses of the complete dataset demonstrated that the three ACSI items constituted a strong Mokken scale (Table 13). The default check for item monotonicity (i.e., Minvi = 0.03, Minsize = 168) did not yield violations of the assumption of monotonicity. Thus, the MH model fitted the data. The scale-score distribution is presented in Figure 5, and shows negative skewness, outliers in the skew tail, peaks for the scale scores 18 and 21, and a drop for scale score 22. Because our major concern was the measurement of customer satisfaction on the basis of the nine-item scale (Table 1 in Chapter 5) and we expected that the irregularities of the ACSI score distribution would not seriously hamper the tests of the hypotheses (Section 4), we refrained from inquiries into the causes of the irregularities of the ACSI score distribution.

The analyses of the reduced dataset (Table 13) yielded results similar to those of the complete dataset. Thus, the MH model also fitted the data in the reduced dataset. The scale-score distribution is presented in Figure 6.

Table 13: ACSI's Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (CD) and the Reduced Dataset (RD)

Item                                                   CD (N = 1684)   RD (N = 1650)
How satisfied are you with BANK?                       0.84            0.86
To what extent does BANK meet your ideal of a bank?    0.81            0.82
To what extent has BANK met your expectations?         0.82            0.83
H                                                      0.82            0.83
Rho                                                    0.92            0.92

Figure 5: Distribution of ACSI scores in the complete dataset (N = 1684, mean = 19.20, SD = 3.80 and skewness = -1.08)

Figure 6: Distribution of ACSI scores in the reduced dataset (N = 1650, mean = 19.22, SD = 3.77, and skewness = -1.06)

Trust

The empirical data collected by means of the trust instrument (see Chapter 5, Table 3) were analysed by means of Mokken scale analyses. First, the dimensionality of the item set was investigated using the confirmatory research method (Chapter 4). Second, the assumption of monotonicity was tested on the basis of the default check for item monotonicity (Chapter 4). Third, the scale scores and the scale-score statistics were computed.

The analyses of the complete dataset demonstrated that the seven items for trust constituted a Mokken scale (Table 14). The default check for item monotonicity (i.e., Minvi = 0.03, Minsize = 168) yielded no violations of the assumption of monotonicity. Thus, the MH model fitted the data. The scale-score distribution is presented in Figure 7. The distribution was significantly skewed to the left, had outliers in the skew tail, and had a large peak for scale score 21. Because our major concern was the measurement of customer satisfaction and we expected that the irregularities of the trust score distribution would not seriously hamper the tests of the hypotheses (Section 4), we refrained from inquiries into the causes of the irregularities in the trust score distribution.

The analyses of the reduced dataset yielded results similar to those of the complete dataset (Table 14). Thus, the MH model also fitted the data in the reduced dataset. The scale-score distribution is presented in Figure 8.

Table 14: Trust Scale's Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (CD) and the Reduced Dataset (RD)

Item                                                           CD (N = 1689)   RD (N = 1650)
I can depend on BANK to treat me fairly                        0.66            0.66
I can depend on BANK to handle my banking affairs correctly    0.69            0.69
I can depend on BANK to keep its promises                      0.63            0.63
I sometimes doubt the competence of BANK *                     0.57            0.58
I sometimes doubt the good will of BANK *                      0.57            0.57
I can trust BANK                                               0.66            0.66
I can depend on BANK to serve me well                          0.63            0.63
H                                                              0.63            0.63
Rho                                                            0.91            0.91

* = scored reversely

Figure 7: Distribution of trust scores in the complete dataset (N = 1689, mean = 19.71, SD = 4.02, and skewness = -0.71)

Figure 8: Distribution of trust scores in the reduced dataset (N = 1650, mean = 19.75, SD = 3.97, and skewness = -0.73)

Quality

Quality was operationalised using the set of 24 items measuring judgements of attributes of products and services provided by the retail bank (Table 7). First, two items (i.e., Q10d and Q10f; Table 7) were excluded from the analyses because of the large percentages of missing values on these items. Second, because many item scores were missing due to inappropriateness of item content for several participants, factor analysis and Mokken scale analysis were done based on listwise deletion (see Section 2). The number of available participants for the analyses of the remaining 22 items was N = 599 in the complete dataset and N = 591 in the reduced dataset.

Factor analysis (e.g., Gorsuch, 1983) was used to establish the dimensionality of the data set for the 22 quality items. Exploratory factor analysis was done to identify the factor structure of the dataset, and hierarchical factor analysis was done to investigate the relations among the factors. The results of these analyses were used to construct scales for quality. Next, Mokken scale analysis was done to explore the robustness of the results. The analyses were repeated in the reduced dataset.

The exploratory factor analysis with squared multiple correlations used as prior communality estimates yielded only eleven positive eigenvalues (this is the result of inserting estimates of the communalities in the trace of the correlation matrix; see also Tabachnick & Fidell, 2007, p. 631), and the first four factors explained almost 91 percent of the common variance (Table 15). Because we expected a large number of factors, we decided to proceed with all four factors in the hierarchical factor analysis. This decision was supported by the simple structure (Gorsuch, 1983, pp. 176-179) of the obliquely rotated (i.e., using method promax) 4-factor solution, which was readily interpretable.
The hierarchical factor analysis was done using an iterative procedure to estimate the communalities, and using an oblique rotation method (i.e., method promax). The eigenvalues are reported in Table 15, and the inter-factor correlations of the four oblique-rotated factors were high (Table 17). The factor analysis of the correlation matrix of the oblique factors yielded one higher-order factor. The higher-order factor reflected all quality items and accounted for approximately 72 percent of the common variance in the items (Table 16). The four orthogonalised lower-order factors reflected quality of contact handling, quality of Internet facilities, quality of processes, and equity of costs and revenues, respectively, and accounted for approximately 28 percent of the common variance in the items (Table 16). Because the major part of the common variance was explained by the higher-order factor, we had doubts about the dimensionality of the quality items and the interpretation of the lower-order factors. These doubts were reinforced by exploratory Mokken scale analyses (item selection method = Search normal, and lowerbound Hi = 0.3 were used), which yielded a 20-item scale in the complete dataset and a 21-item scale in the reduced dataset (Table 18). It seems that a general perception of the quality of the company affected the participants' responses to all items regarding quality of attributes of products and services provided by the company. Based on these results, we suspected that a halo effect (Thorndike, 1920) had affected the responses to the items reflecting quality. Wirtz and Bateson (1995; also Wirtz, 2000) reported a similar result in studies into drivers of customer satisfaction. Because of the suspected halo effect and the complications caused by the missing data on the items reflecting quality, we decided to use, in the remainder of this study, the data collected by means of the set of 16 items measuring the experience of problems with BANK in the preceding twelve months (Table 4, Chapter 5).

Table 15: Eigenvalues (EV) and Percentages Common Variance Explained (PCVE) from Principal Factor Analyses (PFA) and Hierarchical Factor Analyses (HFA) on the Quality Items

          Complete dataset (N = 599)              Reduced dataset (N = 591)
          PFA               HFA                   PFA               HFA
Factor    EV      PCVE      EV      PCVE          EV      PCVE      EV      PCVE
1         8.44    67.955    8.46    67.196        8.30    67.425    8.32    66.667
2         1.29    10.386    1.35    10.723        1.27    10.317    1.33    10.657
3         0.93     7.488    0.98     7.784        0.95     7.717    1.00     8.013
4         0.59     4.750    0.61     4.845        0.58     4.712    0.61     4.888
5         0.40     3.221    0.37     2.939        0.40     3.249    0.37     2.965
6         0.31     2.496    0.30     2.383        0.31     2.518    0.30     2.404
7         0.22     1.771    0.21     1.668        0.23     1.868    0.22     1.763
8         0.12     0.966    0.13     1.033        0.13     1.056    0.13     1.042
9         0.08     0.644    0.09     0.715        0.09     0.731    0.10     0.801
10        0.03     0.242    0.06     0.477        0.04     0.325    0.06     0.481
11        0.01     0.081    0.03     0.238        0.01     0.081    0.04     0.321
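The PCVE values in Table 15 appear to be each positive eigenvalue expressed as a percentage of the sum of the eleven positive eigenvalues. This reading is inferred from the reported numbers rather than stated in the text. A minimal Python sketch for the PFA column of the complete dataset (the function name is hypothetical):

```python
# Positive eigenvalues of the principal factor analysis, complete dataset (Table 15).
eigenvalues = [8.44, 1.29, 0.93, 0.59, 0.40, 0.31, 0.22, 0.12, 0.08, 0.03, 0.01]

def pcve(eigs):
    """Percentage of common variance explained by each factor."""
    total = sum(eigs)
    return [100.0 * e / total for e in eigs]

shares = pcve(eigenvalues)
first_four = sum(shares[:4])  # share of the first four factors together
print(round(shares[0], 3), round(first_four, 2))
```

The first factor accounts for about 68 percent and the first four factors together for almost 91 percent of the common variance, in line with the values reported above.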

Table 16: Factor Pattern Matrices of the Orthogonalised Hierarchical Factor Analysis Solution on the Quality Items, in the Complete Dataset and the Reduced Dataset

                                        Complete Dataset (N = 599)           Reduced Dataset (N = 591)
Item                                    HO     LO1    LO2    LO3    LO4      HO     LO1    LO2    LO3    LO4
Correct execution of orders             0.63  -0.03  -0.02   0.42  -0.04     0.63  -0.02  -0.02   0.43  -0.04
Speed of money transfers                0.49  -0.02   0.00   0.26   0.09     0.48  -0.02  -0.01   0.26   0.10
Speed of service delivery               0.70   0.03   0.02   0.32   0.02     0.68   0.03   0.02   0.33   0.03
Adherence to promises                   0.70   0.10   0.09   0.17   0.00     0.69   0.10   0.09   0.17  -0.01
Correct execution of banking matters    0.72   0.03   0.04   0.33  -0.03     0.71   0.02   0.04   0.34  -0.03
Distribution of bank statements         0.38   0.00   0.04   0.06   0.19     0.38  -0.01   0.04   0.06   0.20
Costs of accounts of the company        0.49  -0.01  -0.03   0.05   0.66     0.50  -0.01  -0.03   0.05   0.65
Convenience of products and services    0.70   0.03   0.17   0.09  -0.02     0.69   0.02   0.17   0.09  -0.02
Clarity of information provided         0.73   0.00   0.24  -0.02   0.05     0.72  -0.01   0.24  -0.03   0.04
Sufficiency of information provided     0.73   0.00   0.21   0.01   0.08     0.72  -0.02   0.22   0.00   0.08
Costs of services of the company        0.51   0.03  -0.01   0.00   0.66     0.51   0.03  -0.01  -0.01   0.67
Interest rates of the company           0.38  -0.03   0.06  -0.04   0.43     0.40  -0.03   0.06  -0.04   0.43
Service by telephone                    0.64   0.31   0.06   0.03  -0.01     0.63   0.32   0.06   0.02  -0.02
Service by the Internet                 0.68  -0.05   0.27  -0.03  -0.02     0.67  -0.04   0.25  -0.01  -0.03
Service by bank offices                 0.40   0.17   0.06  -0.04   0.08     0.42   0.17   0.07  -0.04   0.09
Service by mail correspondence          0.64   0.15   0.13   0.00   0.07     0.63   0.16   0.12   0.00   0.07
Accessibility of the company            0.67   0.23   0.16  -0.04  -0.02     0.66   0.23   0.15  -0.05  -0.03
Facilities for Internet banking         0.64  -0.06   0.20   0.07  -0.05     0.63  -0.05   0.19   0.09  -0.04
Friendliness of employees               0.52   0.53  -0.03  -0.05   0.00     0.51   0.52  -0.03  -0.04   0.01
Capability of employees                 0.61   0.47  -0.01   0.01   0.01     0.61   0.46  -0.01   0.02   0.01
Reliability of employees                0.63   0.53  -0.02   0.02  -0.02     0.62   0.52  -0.02   0.02  -0.02
Responsiveness of the company           0.67   0.51   0.00   0.03  -0.02     0.66   0.52  -0.01   0.03  -0.02

HO = higher order factor, LO1 = first lower order factor, LO2 = second lower order factor, LO3 = third lower order factor, and LO4 = fourth lower order factor.

Table 17: Correlations Between the Four Factors Representing Quality (Upper Half = Complete Dataset, Lower Half = Reduced Dataset)

           Factor1   Factor2   Factor3   Factor4
Factor1    -         0.67      0.62      0.42
Factor2    0.68      -         0.72      0.54
Factor3    0.63      0.73      -         0.44
Factor4    0.41      0.53      0.46      -

Table 18: Quality Scale's Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset and the Reduced Dataset

Item                                    Complete dataset   Reduced dataset
Correct execution of orders             0.47               0.45
Speed of money transfers                0.35               0.34
Speed of service delivery               0.49               0.47
Adherence to promises                   0.49               0.47
Correct execution of banking matters    0.53               0.51
Distribution of bank statements         *                  *
Costs of accounts of the company        0.38               0.40
Convenience of products and services    0.49               0.47
Clarity of information provided         0.50               0.48
Sufficiency of information provided     0.50               0.49
Costs of services of the company        0.40               0.40
Interest rates of the company           *                  0.30
Service by telephone                    0.48               0.46
Service by the internet                 0.45               0.44
Service by bank offices                 0.32               0.32
Service by mail correspondence          0.49               0.47
Accessibility of the company            0.49               0.46
Facilities for internet banking         0.44               0.42
Friendliness of employees               0.43               0.41
Capability of employees                 0.47               0.46
Reliability of employees                0.52               0.50
Responsiveness of the company           0.52               0.50
H                                       0.46               0.43
Rho                                     0.93               0.93

* = excluded from the scale because item scalability coefficient Hi < 0.3

The distribution of the number of problems with BANK in the preceding twelve months is presented in Table 19. In both the complete dataset and the reduced dataset, 57 percent of the participants mentioned the incidence of at least one problem with BANK in the preceding twelve months. Exploratory Mokken scale analyses (item selection method = Search normal, and lowerbound Hi = 0.3 were used) yielded five scales of two items each, and six items that were non-scalable. This result indicates that the responses to the items were not the result of a unidimensional trait such as a general perception of the quality of the company. This result is consistent with the conception of quality as a multidimensional construct.

In the remainder of this study, quality was re-defined as absence of problems. This definition of quality is in line with the conception of quality as absence of failures (e.g., Garvin, 1983; Kackar, 1989, p. 6; Woodall, 2001; see also Chapter 3). Because the experience of a problem is counter-indicative of quality, the items reflecting experience of problems were recoded into the opposite direction (Section 2). Quality was then operationalised as the total score on the 16 recoded items regarding the incidence of problems with BANK in the preceding twelve months (Table 19). The quality score (i.e., total score) ranged from 0 (if the participant had 16 problems with BANK in the preceding 12 months) to 16 (if the participant had 0 problems with BANK in the preceding 12 months).

The distribution of the quality scores was negatively skewed (Table 19). This may hamper the tests of hypothesis 5 (i.e., satisfaction scores are positively related to quality scores) and hypothesis 9 (i.e., satisfaction scores are not contaminated by quality). Following a suggestion of Tabachnick and Fidell (2007, pp. 87-89) to reflect negatively skewed variables and transform the reflected variables, we applied a logarithmic transformation to the variable number of problems.
Let NP denote the number of problems, TNP transformed NP, and ln the natural logarithm. Because the minimum value for NP was zero, we applied the following transformation:

TNP = ln( NP + 1) .

Hypotheses 5 and 9 were tested once using the quality scores, and once using TNP.
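The recoding and the transformation can be sketched in a few lines of Python. The sketch implements the formula exactly as it is written in the text, with the quality score taken as the number of items (16) minus the number of problems. The function names are hypothetical.

```python
import math

N_ITEMS = 16  # number of problem items, as stated in the text

def quality_score(n_problems):
    """Total score on the recoded items: 16 minus the number of problems."""
    return N_ITEMS - n_problems

def transformed_np(n_problems):
    """TNP = ln(NP + 1); adding 1 keeps the logarithm defined for NP = 0."""
    return math.log(n_problems + 1)

for n_problems in (0, 1, 5):
    print(n_problems, quality_score(n_problems), round(transformed_np(n_problems), 2))
```

The logarithm compresses the long tail of participants who reported many problems, which is the purpose of the transformation described above.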

Table 19: Distribution of the Number of Problems (NP), Transformed Number of Problems (TNP), and Quality Scores in the Complete Dataset and the Reduced Dataset

NP      TNP        Quality Score   Percentage in Complete Dataset (N = 1689)   Percentage in Reduced Dataset (N = 1650)
0       0.69       16              43                                          43
1       1.10       15              25                                          25
2       1.39       14              16                                          16
3       1.61       13              8                                           8
4       1.79       12              4                                           4
5       1.95       11              2                                           2
6       2.08       10              1                                           1
>= 7    >= 2.20    <= 9            1                                           1

Customer loyalty

Two items (i.e., Q14c and Q14e; Table 4) were deleted from the customer loyalty item set (Table 6, Chapter 5) due to unfortunate item wording (Section 2). Mokken scale analyses were done to investigate whether the MH model fitted the data from the remaining four items. The analyses were done in both the complete dataset and the reduced dataset. First, the dimensionality of the data was investigated using the confirmatory research method (Chapter 4). Second, the assumption of monotonicity was tested on the basis of the default check for item monotonicity (Chapter 4). Third, the scale scores and the scale-score statistics were computed.

The analyses of the complete dataset yielded a total-scale scalability coefficient H equal to 0.54 and a reliability coefficient rho equal to 0.80. However, the default check for item monotonicity (i.e., Minvi = 0.03 and Minsize = 168) revealed significant violations of the assumption of monotonicity. The estimated ISRF for Q14d >= 3 (I consider switching from BANK to another bank; Table 4) decreased at the end of the rest-score scale, and the estimated ISRF for Q14d >= 4 decreased at the beginning of the scale (Figure 9). The checks for item monotonicity with smaller rest-score groups (i.e., Minsize = 84) revealed that the estimated ISRF for Q14d >= 3 also decreased at the beginning of the scale (Figure 10). Thus, the MH model did not fit the data. The analyses in the reduced dataset corroborated the results found in the complete dataset. Thus, the MH model also did not fit the data in the reduced dataset.

Figure 9: Item step response functions of item Q14d: I consider switching from BANK to another bank (Minsize = 168)

Figure 10: Item step response functions of item Q14d: I consider switching from BANK to another bank (Minsize=84)

Because the violations of monotonicity were substantial, we decided to repeat the measurement analyses without item Q14d (Table 4). The analyses of the complete dataset yielded a total-scale scalability coefficient H of 0.64 and a reliability coefficient rho of 0.82 (Table 20). The default item-check for monotonicity (i.e., Minvi = 0.03 and Minsize = 168) did not reveal violations of the assumption of monotonicity. This result was also found with smaller rest-score groups (i.e., Minsize = 84). Thus, the MH model fitted the data for the three items in the complete dataset. The analyses of the reduced dataset yielded similar results as the analyses of the complete dataset (Table 20). Thus, the MH model also fitted the three items in the reduced dataset. The content validity of the 3-item scale was considered sufficient because the three items reflected the three aspects of customer loyalty (see Table 6 in Chapter 5). Because of sufficient coverage of customer loyalty and because the 3-item scale met the requirements of the MH-model, we decided to use the 3-item scale to measure customer loyalty in all subsequent analyses. The corresponding scale-score distributions are presented in Figure 11 (complete dataset) and Figure 12 (reduced dataset). The scale-score distributions were skewed to the left. We refrained from inquiries into the cause of the skewness, because we expected that the skewness would not seriously hamper the test of the hypotheses (Section 4).

Table 20: Customer Loyalty Scale's Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (CD) and the Reduced Dataset (RD)

Item                                                         CD (N = 1686)   RD (N = 1650)
If I need new financial products, BANK is my first choice    0.65            0.64
I have more sympathy for BANK than for other banks           0.63            0.63
For many years BANK has been my primary bank                 0.65            0.64
H                                                            0.64            0.64
Rho                                                          0.82            0.81

Figure 11: Distribution of customer loyalty scores in the complete dataset (N = 1686, mean = 7.87, SD = 2.42, and skewness = -0.62)

Figure 12: Distribution of customer loyalty scores in the reduced dataset (N = 1650, mean = 7.89, SD = 2.40, and skewness = -0.62)

Tests of the hypotheses

In this section, the tests of the hypotheses (see Chapter 4) are discussed. Successively, we tested the hypotheses regarding explicit construct representation, concept-related irrelevant variance, method-related irrelevant variance, and implicit construct representation. The purpose of these tests was to collect empirical evidence with respect to the validity of the measurement of customer satisfaction.

Explicit construct representation

Hypothesis 1 was: customer satisfaction is manifested in various expressions that are mutually related but not sharply delineated. The hypothesis was tested by means of an examination of the verbal explanations of satisfaction given by the participants in the pre-tests. The pre-test data demonstrated that participants attached diverse meanings to the term satisfaction (see Table 1). When asked to explain their satisfaction with the company in their own words, participants answered in terms of (a) general affect, (b) friendliness, (c) past performances, (d) qualities of the company, (e) absence of dissatisfaction, and (f) trust in the company. With respect to the last result, some participants answered "I trust the company", "The company will not deceive me, such as [...] did", or "I don't think they deceive me". These answers indicate that overall satisfaction with a particular retail bank and trust in the bank are strongly interrelated. The results support the hypothesis that satisfaction is manifested in various expressions that are mutually related but not sharply delineated.

Hypothesis 2 was: the satisfaction items constitute a scale according to the MH model. The hypothesis was supported by the results of the measurement analyses (see Section 3), which demonstrated that the items constituted a strong MH model scale in the whole sample as well as in all subgroups investigated.

Hypothesis 3 was: the satisfaction measure is positively correlated to other measures of satisfaction.
The hypothesis was tested by means of correlation analyses between the satisfaction measure and the ACSI. The correlation was significant (p < 0.001) in both the complete dataset and the reduced dataset (Table 21). Thus, the hypothesis was supported.

Table 21: Product-Moment Correlations (r) Between Satisfaction and the ACSI

                               r        95%-interval
Complete Dataset (N = 1681)    0.78*    0.76 - 0.80
Reduced Dataset (N = 1650)     0.79*    0.77 - 0.81

* = p < 0.001
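The thesis does not state how the 95%-intervals in Table 21 were computed. A common choice is the Fisher z-transformation, and the following minimal Python sketch, which assumes that method, reproduces the reported bounds to two decimals. The function name is hypothetical.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

for label, r, n in (("complete", 0.78, 1681), ("reduced", 0.79, 1650)):
    lower, upper = fisher_ci(r, n)
    print(label, round(lower, 2), round(upper, 2))
```

With samples of this size the intervals are narrow, so the conclusion that the two satisfaction measures are strongly related does not depend on the exact interval method.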

Concept-related irrelevant variance

Following Oort (1996), the hypotheses regarding concept-related irrelevant variance were tested using restricted factor analysis. Restricted factor analysis is confirmatory factor analysis with particular restrictions on the loadings. In restricted factor analysis, a model is specified such that the indicators of the trait load on the factor reflecting the trait, and not on the factor reflecting the violator. Thus, the loadings of the indicators of the trait on the factor reflecting the violator are restricted to the value 0. The loadings of the indicators reflecting the violator on the factor reflecting the trait are also restricted to the value 0. Then, the fit of the model is evaluated in order to determine the model's tenability.

Oort (1996, pp. 46-49) suggested using the modification indices (MIs) or adjusted modification indices (AMIs; to be discussed later) to detect item bias (i.e., whether particular indicators reflecting the trait are biased with respect to a violator). The MI is a statistic which reveals how much the fit of the model will improve if the factor loading of an indicator I of trait T on violator V is set free to be estimated. The MI is approximately chi-squared distributed with one degree of freedom (Bollen, 1989, p. 299). If the MIs reveal that the fit of the model will improve significantly by allowing a particular indicator I to load on violator V, this means that indicator I is biased with respect to violator V, and that the measurement of trait T is contaminated with respect to violator V. If the MIs reveal that the fit of the model will not improve significantly by allowing any of the indicators I to load on violator V, this means that none of the indicators I is biased with respect to violator V, and that the measurement of trait T is not contaminated with respect to violator V.
A larger number of significance tests and a larger sample size increase the likelihood of finding significant MIs and of obtaining false positives. In order to reduce the risk of false positives, Oort (1996, p. 49) suggested using AMIs to detect biased items. The AMI is a statistic that reduces the power of the MI, and thus is useful for the detection of items that are substantially biased. The AMI is defined as:

$$\mathrm{AMI} = \frac{df - 1}{\chi^2 - \mathrm{MI}} \times \mathrm{MI},$$

where $\chi^2$ is the chi-squared value and $df$ is the number of degrees of freedom under the null model (i.e., the restricted model). If the AMI exceeds a critical chi-squared value with one degree of freedom, such as the critical chi-squared value for the 5 percent level of significance (i.e., $\chi^2 = 3.84$), the item is judged to be biased.
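The computation of the AMI can be sketched in Python as follows. The sketch assumes the reading of the formula in which the MI is multiplied by (df - 1) and divided by the chi-squared value of the restricted model minus the MI. The function names and the numbers in the example are hypothetical and are not taken from the study.

```python
CHI2_CRIT_1DF_05 = 3.84  # critical chi-squared value (1 df, 5 percent level) cited in the text

def adjusted_mi(mi, chi2, df):
    """AMI = ((df - 1) / (chi2 - MI)) * MI, with chi2 and df from the restricted model."""
    return (df - 1) / (chi2 - mi) * mi

def is_biased(mi, chi2, df, crit=CHI2_CRIT_1DF_05):
    """An item is judged to be biased if its AMI exceeds the critical value."""
    return adjusted_mi(mi, chi2, df) > crit

# Hypothetical example: MI = 10 looks significant on its own (10 > 3.84),
# but with chi2 = 110 and df = 36 the AMI is 3.5, so the item is not flagged.
print(adjusted_mi(10.0, 110.0, 36), is_biased(10.0, 110.0, 36))
```

The example illustrates the intended effect: an MI that is large mainly because the overall chi-squared value is inflated by the sample size is scaled down before it is compared with the critical value.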

In this study, restricted factor analysis was performed using PROC CALIS (SAS/STAT). A model was specified in which the nine items reflecting satisfaction loaded on a factor reflecting satisfaction, and not on a factor reflecting the violator under investigation (Figure 13). The indicator of the violator loaded on the factor reflecting the violator and not on the factor reflecting satisfaction. Because only one indicator loaded on the factor reflecting the violator, no error term was specified for that indicator (Oort, 1996, p. 47). The AMIs were calculated by hand on the basis of the chi-squared value and the degrees of freedom under the null model, and the MIs of the nine items reflecting satisfaction. The fit of the model was evaluated on the basis of the goodness of fit index (GFI), the normed fit index (NFI), and the non-normed fit index (NNFI). As a rule of thumb, indices having a value of 0.90 or higher indicate an acceptable fit (e.g., Bollen, 1989, pp. 269-281). The analyses were performed on both the complete dataset (N = 1689) and the reduced dataset (N = 1650).
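The pattern of free and restricted loadings in this model can be written down as a small Python sketch. It only illustrates the structure of the restricted model and does not estimate anything. The analyses themselves were run in SAS, and the function names below are hypothetical.

```python
N_SAT_ITEMS = 9  # number of satisfaction items, as stated in the text

def loading_pattern(n_trait_items=N_SAT_ITEMS):
    """Rows are indicators, columns are (trait factor, violator factor).

    True marks a loading that is freely estimated, False a loading fixed at 0.
    """
    pattern = [[True, False] for _ in range(n_trait_items)]  # I1..I9: trait only
    pattern.append([False, True])                            # I10: violator only
    return pattern

def candidate_loadings(pattern):
    """Fixed-zero loadings of trait items on the violator, for which MIs are inspected."""
    return [(i, 1) for i, row in enumerate(pattern[:-1]) if not row[1]]

pattern = loading_pattern()
print(len(pattern), len(candidate_loadings(pattern)))
```

Each of the nine restricted loadings returned by `candidate_loadings` corresponds to one MI (and one AMI) in the item bias analysis.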

Figure 13: Graphical display of the factor model with nine indicators of customer satisfaction (I1 to I9, with error terms E1 to E9) and one indicator of the violator (I10).

Hypothesis 8 was: the satisfaction scores are not contaminated by trust. This hypothesis was tested by means of a restricted factor analysis model using the nine items reflecting satisfaction with the company, and the trust score. The factor model was specified such that the nine items reflecting satisfaction loaded on the factor reflecting satisfaction, and the trust score loaded on the factor reflecting the violator. The factor model fitted the data well (i.e., GFI = 0.92; NFI = 0.93; NNFI = 0.92), and none of the AMIs was significant (Table 22; the complete dataset). Similar results were found in the reduced dataset (Table 22; the reduced dataset). Thus, none of the items reflecting satisfaction was significantly biased with respect to trust, and the hypothesis was supported.

Table 22: AMIs of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
                                                    Complete dataset (N=1689)   Reduced dataset (N=1650)
Item                                                AMI       p-value           AMI       p-value
At BANK I feel at home                              0.02      ns                0.00      ns
I am satisfied with BANK                            0.66      ns                0.67      ns
There are good reasons to leave BANK *              0.18      ns                0.45      ns
I have mixed feelings about BANK *                  0.11      ns                0.02      ns
BANK meets all my requirements for a bank           0.29      ns                0.65      ns
Last year I had a pleasant relationship with BANK   0.82      ns                1.52      ns
BANK has met my expectations                        0.07      ns                0.05      ns
I have regretted my choice for BANK *               0.02      ns                0.10      ns
Last year I had some problems with BANK *           1.42      ns                1.42      ns
* = scored reversely

Hypothesis 9 was: the satisfaction scores are not contaminated by quality. This hypothesis was tested by means of a restricted factor analysis model using the nine items reflecting satisfaction with the company and the quality scores (because the analyses using quality scores yielded results similar to those of the analyses using TNP (Section 3), we report the results from the former analyses). The factor model was specified such that the nine items reflecting satisfaction loaded on the factor reflecting satisfaction, and the quality score loaded on the factor reflecting the violator. The factor model did not fit the data well (i.e., NNFI = 0.89, which is below the critical value of 0.90 for the NNFI), and the AMI of item Q4d (Last year I had some problems with BANK; Table 4) was significant (Table 23; the complete dataset). Similar results were found in the reduced dataset (Table 23; the reduced dataset). Thus, item Q4d was significantly biased with respect to quality, and the hypothesis was not supported. A restricted factor analysis without item Q4d (i.e., the factor model was specified such that the remaining eight items reflecting satisfaction loaded on the factor reflecting satisfaction) yielded a good fit (i.e., GFI = 0.93, NFI = 0.93, and NNFI = 0.91; the complete dataset), and none of the AMIs was significant. Similar results were found in the reduced dataset. Thus, the contamination of satisfaction scores by quality was due to item Q4d only.

Table 23: AMIs of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
                                                    Complete dataset (N=1689)   Reduced dataset (N=1650)
Item                                                AMI       p-value           AMI       p-value
At BANK I feel at home                              1.74      ns                1.81      ns
I am satisfied with BANK                            0.59      ns                0.55      ns
There are good reasons to leave BANK *              0.02      ns                0.00      ns
I have mixed feelings about BANK *                  0.46      ns                0.45      ns
BANK meets all my requirements for a bank           0.01      ns                0.01      ns
Last year I had a pleasant relationship with BANK   0.04      ns                0.00      ns
BANK has met my expectations                        0.00      ns                0.00      ns
I have regretted my choice for BANK *               0.14      ns                0.18      ns
Last year I had some problems with BANK *           6.52      <0.05             6.87      <0.01
* = scored reversely
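The significance labels in Table 23 can be illustrated with a short sketch. It assumes that each AMI is referred to a chi-squared distribution with one degree of freedom; this assumption is not stated in this section, but it is consistent with the labels reported for item Q4d.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with one degree of freedom.

    For one degree of freedom, P(X > x) = P(|Z| > sqrt(x)) with Z standard
    normal, which equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def label(p):
    """Map a p-value to the labels used in the AMI tables."""
    if p < 0.01:
        return "<0.01"
    if p < 0.05:
        return "<0.05"
    return "ns"


# AMIs of item Q4d in Table 23 (complete and reduced dataset).
for ami in (6.52, 6.87):
    p = chi2_sf_1df(ami)
    print(ami, round(p, 4), label(p))
```

Under this assumption the two AMIs fall on either side of the 0.01 critical value of about 6.63, which explains why nearly equal AMIs receive different labels.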

Hypothesis 10 was: the satisfaction scores are not contaminated by loyalty. This hypothesis was tested by means of a restricted factor analysis model using the nine items reflecting satisfaction with the company, and the loyalty score. The factor model was specified such that the nine items reflecting satisfaction loaded on the factor reflecting satisfaction, and the loyalty score loaded on the factor reflecting the violator. The factor model did not fit the data well (i.e., NNFI = 0.88, which is below the critical value of 0.90 for the NNFI), and the AMI of item Q3a (At BANK I feel at home; Table 4) was significant (Table 24; the complete dataset). Similar results were found in the reduced dataset (Table 24; the reduced dataset). Thus, item Q3a was significantly biased with respect to customer loyalty, and the hypothesis was not supported. A restricted factor analysis without item Q3a (i.e., the factor model was specified such that the remaining eight items reflecting satisfaction loaded on the factor reflecting satisfaction) yielded a good fit (i.e., GFI = 0.93, NFI = 0.93, and NNFI = 0.91; the complete dataset), and none of the AMIs was significant. Similar results were found in the reduced dataset. Thus, the contamination of satisfaction scores by customer loyalty was due to item Q3a only.

Table 24: AMIs of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
                                                    Complete dataset (N=1686)   Reduced dataset (N=1650)
Item                                                AMI       p-value           AMI       p-value
At BANK I feel at home                              10.73     <0.01             12.00     <0.01
I am satisfied with BANK                            0.00      ns                0.01      ns
There are good reasons to leave BANK *              0.00      ns                0.04      ns
I have mixed feelings about BANK *                  0.34      ns                0.42      ns
BANK meets all my requirements for a bank           0.18      ns                0.24      ns
Last year I had a pleasant relationship with BANK   0.00      ns                0.01      ns
BANK has met my expectations                        1.16      ns                1.17      ns
I have regretted my choice for BANK *               0.04      ns                0.02      ns
Last year I had some problems with BANK *           2.84      ns                3.49      ns
* = scored reversely

Hypothesis 11 was: the satisfaction scores are not contaminated by current customer profitability. This hypothesis was tested by means of a restricted factor analysis model using the nine items reflecting satisfaction with the company, and TCP2005 (i.e., the logarithmic transformation of CP2005; Section 2). The factor model was specified such that the nine items reflecting satisfaction loaded on the factor reflecting satisfaction, and TCP2005 loaded on the factor reflecting the violator. The factor model fitted the data well (i.e., GFI = 0.92, NFI = 0.92, and NNFI = 0.90), and none of the AMIs was significant (Table 25; the complete dataset). Similar results were found in the reduced dataset (Table 25; the reduced dataset). Thus, none of the items reflecting satisfaction was significantly biased with respect to TCP2005, and the hypothesis was supported.


Table 25: AMIs of the Satisfaction Items in the Complete Dataset and the Reduced Dataset
                                                    Complete dataset (N=1689)   Reduced dataset (N=1650)
Item                                                AMI       p-value           AMI       p-value
At BANK I feel at home                              0.71      ns                0.87      ns
I am satisfied with BANK                            0.01      ns                0.02      ns
There are good reasons to leave BANK *              0.13      ns                0.12      ns
I have mixed feelings about BANK *                  0.03      ns                0.01      ns
BANK meets all my requirements for a bank           0.00      ns                0.00      ns
Last year I had a pleasant relationship with BANK   0.30      ns                0.33      ns
BANK has met my expectations                        0.00      ns                0.00      ns
I have regretted my choice for BANK *               0.01      ns                0.02      ns
Last year I had some problems with BANK *           1.07      ns                1.09      ns
* = scored reversely

Method-related irrelevant variance

Hypothesis 12 was: the satisfaction scores are not affected by the location of the satisfaction items in the questionnaire. The hypothesis was tested by means of a t-test of the difference between the average satisfaction score in versions 1 and 2 of the pilot study and the average satisfaction score in versions 3 and 4 of the pilot study (see Table 8 in Chapter 5; note that the satisfaction score is the total score on the 9-item satisfaction scale). Because the difference was not significant (Table 26), the hypothesis was supported.

Table 26: Differences of Satisfaction Scores in Groups of the Pilot Study
Groups          Difference   t-statistic   p-value
Hypothesis 12   -0.70        -1.19         ns
Hypothesis 13   -0.46        -0.79         ns

Hypothesis 13 was: the satisfaction scores are not affected by the presentation of the response categories of the satisfaction items. The hypothesis was tested by means of a t-test of the difference between the average satisfaction score in versions 1 and 3 of the pilot study and the average satisfaction score in versions 2 and 4 of the pilot study (see Table 8 in Chapter 5; note that the satisfaction score is the total score on the 9-item satisfaction scale). Because the difference was not significant (Table 26), the hypothesis was supported.
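The comparison of two groups of questionnaire versions can be sketched as follows. The sketch assumes a pooled-variance t-test, which may differ from the variant actually used, and the scores are hypothetical total scores rather than data from the pilot study.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (difference in means, t-statistic, degrees of freedom)
    for a two-sample t-test with a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return diff, diff / se, n_a + n_b - 2


# Hypothetical total scores on the 9-item satisfaction scale.
versions_1_2 = [30, 34, 28, 36, 33, 31]
versions_3_4 = [32, 35, 29, 37, 34, 31]

diff, t, df = pooled_t(versions_1_2, versions_3_4)
print(round(diff, 2), round(t, 2), df)
```

The sign of the difference depends only on which group is entered first, so the negative differences in Table 26 indicate the direction of the comparison, not a problem with the scale.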


Hypothesis 14 was: the satisfaction scores are not affected by the midpoint response style. The test of this hypothesis required the measurement of general midpoint responding (e.g., Baumgartner & Steenkamp, 2001, 2006). Because the present study did not yield data suitable for constructing a measure of general midpoint responding, the hypothesis was tested in the second empirical study (Chapter 8).

Hypothesis 15 was: the satisfaction scores are not affected by the extreme response style. The test of this hypothesis required the measurement of general extreme responding (e.g., Baumgartner & Steenkamp, 2001, 2006). Because the present study did not yield data suitable for constructing a measure of general extreme responding, the hypothesis was tested in the second empirical study (Chapter 8).

Implicit construct representation

The hypotheses regarding implicit construct representation were tested last, because the results of the tests of other hypotheses were used in the tests of the hypotheses regarding implicit construct representation. First, the test of hypothesis 9 demonstrated that item Q4d (Last year I had some problems with BANK; Table 4) was biased with respect to quality. The use of this item in the satisfaction scale was expected to inflate the correlation between customer satisfaction and quality. Therefore, we decided to exclude the item from the satisfaction scale when testing the hypothesis regarding the relation between customer satisfaction and quality. Second, the test of hypothesis 10 demonstrated that item Q3a (At BANK I feel at home; Table 4) was biased with respect to customer loyalty. The use of this item in the satisfaction scale was also expected to inflate the correlation between customer satisfaction and customer loyalty. Therefore, we decided to exclude this item from the satisfaction scale when testing the hypothesis regarding the relation between customer satisfaction and customer loyalty.

The hypotheses concerning the relation of satisfaction scores to trust scores, quality scores, and loyalty scores, respectively, were tested by means of correlation analyses.

Hypothesis 4 was: satisfaction scores are positively related to trust scores. This hypothesis was tested using the total score on the customer satisfaction scale and the total score on the trust scale. The product-moment correlation between customer satisfaction and trust was positive and significant (p < 0.001) in both the complete dataset and the reduced dataset (see Table 27). Thus, the hypothesis was supported.


Hypothesis 5 was: satisfaction scores are positively related to quality scores. In order to test this hypothesis, item Q4d (Table 4) was excluded from the customer satisfaction scale. Thus, the hypothesis was tested using the total score on the 8-item customer satisfaction scale and the quality scores (because the analyses using quality scores yielded results similar to those of the analyses using TNP (Section 3), except that the correlations in the former analyses were positive and the correlations in the latter analyses were negative, we report the results from the former analyses). The product-moment correlation between customer satisfaction and quality was positive and significant (p < 0.001) in both the complete dataset and the reduced dataset (Table 27). This means that the fewer problems a participant had experienced with BANK, the higher his or her satisfaction with BANK was. Thus, the hypothesis was supported. Because it may also be interesting to examine the relations between the experience of singular problems and customer satisfaction, these relations are also reported (Table 28). These relations were negative because the experience of a problem is counter-indicative of quality.

Hypothesis 6 was: satisfaction scores are positively related to loyalty scores. In order to test this hypothesis, item Q3a (Table 4) was excluded from the customer satisfaction scale. Thus, the hypothesis was tested using the total score on the 8-item customer satisfaction scale and the total score on the customer loyalty scale. The product-moment correlation between customer satisfaction and customer loyalty was positive and significant (p < 0.001) in both the complete dataset and the reduced dataset (see Table 27). Thus, the hypothesis was supported.

Table 27: Product-Moment Correlations Between Customer Satisfaction and Other Concepts
            Complete dataset (N = 1689)       Reduced dataset (N = 1650)
            r        95%-interval for ρ       r        95%-interval for ρ
Trust       0.78*    0.76 to 0.80             0.79*    0.77 to 0.81
Quality     0.47*    0.43 to 0.51             0.48*    0.44 to 0.52
Loyalty     0.51*    0.47 to 0.55             0.51*    0.47 to 0.55
* = p < 0.001
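The width of the 95% intervals in Table 27 can be illustrated with Fisher's z transformation. This is a sketch under the assumption that the intervals were obtained in this way; the text does not state the method, but the approach reproduces the reported interval for trust in the complete dataset.

```python
import math


def fisher_interval(r, n, z_crit=1.96):
    """Approximate 95% interval for a population correlation.

    The sample correlation is transformed with Fisher's z (atanh), an
    interval is built on the z scale with standard error 1 / sqrt(n - 3),
    and the limits are transformed back with tanh.
    """
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Correlation between satisfaction and trust, complete dataset (Table 27).
low, high = fisher_interval(0.78, 1689)
print(round(low, 2), round(high, 2))
```

With samples of this size the intervals are narrow, so the difference between the trust correlation and the other two correlations is well beyond sampling error.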


Table 28: Relations Between the Incidence of Singular Problems and Customer Satisfaction
                                                  Complete dataset (N=1689)             Reduced dataset (N=1650)
Item                                              Proportion  Polychoric correlation    Proportion  Polychoric correlation
Errors in the execution of your banking affairs   0.03        -0.33                     0.03        -0.33
Errors in the execution of your orders            0.05        -0.28                     0.05        -0.27
Insufficient information on your banking affairs  0.04        -0.44                     0.05        -0.45
Ambiguous information on your banking affairs     0.06        -0.38                     0.06        -0.38
Unfair costs of banking services                  0.12        -0.40                     0.12        -0.40
Slow service                                      0.06        -0.43                     0.06        -0.45
Slow money transfers                              0.16        -0.32                     0.16        -0.33
Not keeping an appointment                        0.03        -0.33                     0.03        -0.32
Insufficient accessibility by telephone           0.05        -0.24                     0.05        -0.24
Insufficient accessibility by internet            0.12        -0.24                     0.12        -0.24
Insufficient accessibility of offices             0.09        -0.18                     0.09        -0.18
Insufficient response to questions                0.06        -0.47                     0.06        -0.47
Problems with debit cards                         0.07        -0.20                     0.07        -0.21
Problems with cash withdrawals                    0.04        -0.09*                    0.04        -0.10*
Problems with internet banking                    0.14        -0.21                     0.14        -0.22
Another problem                                   0.08        -0.29                     0.08        -0.28
* = not significant at p < 0.05

Hypothesis 7 was: satisfaction scores are positively related to future customer profitability. In Chapter 3, the following model was suggested for the relation between customer satisfaction (denoted CSt=0), other independent variables (denoted Xi), and future customer profitability (denoted CPt>0):

$$CP_{t>0} = \alpha + \beta\, CS_{t=0} + \dots + \beta_i X_i + \varepsilon.$$

Because customer satisfaction was measured in September 2005, CSt=0 was operationalised as customer satisfaction in September 2005 (denoted CS2005). Because it was expected that the influence of customer satisfaction on CP manifested after one year (Section 5 from Chapter 3), CPt>0 was operationalised as CP in September 2006 (denoted CP2006). Furthermore, because former studies indicated that current CP accounts for the largest part of future CP (Section 5 from Chapter 3), Xi was operationalised as CP in September 2005 (denoted CP2005). The preliminary analyses demonstrated that the distributions of CP2005 and CP2006 were positively skewed and included many outliers in the skew tail (Section 2). Therefore, CP2005 and CP2006 were logarithmically transformed. The logarithmic transformation of CP2005 was denoted TCP2005 and the logarithmic transformation of CP2006 was denoted TCP2006 (Section 2). Hypothesis 7 was tested by means of a regression analysis of TCP2006 on TCP2005 and CS2005. The regression model was:

$$\widehat{\mathrm{TCP}}_{2006} = a + b_1\,\mathrm{TCP}_{2005} + b_2\,\mathrm{CS}_{2005}, \qquad \text{(Model 1)}$$

where $\widehat{\mathrm{TCP}}_{2006}$ is the predicted value of TCP2006, a is the intercept, b1 is the effect of TCP2005 on TCP2006, and b2 is the effect of CS2005 on TCP2006.

We used hierarchical regression analyses (e.g., Cohen & Cohen, 1983, pp. 120-122; Hays, 1988, pp. 662-665; Tabachnick & Fidell, 2007, pp. 138-147) to compute the contribution of each predictor to the explanation of TCP2006. Because we expected that TCP2005 accounted for the largest part of TCP2006, TCP2005 was entered first in the analyses and CS2005 was entered second. Statistic Fseq expresses the significance of sequential entries of predictor variables for the explanation of the criterion variable. Let RM denote the restricted model without the predictor variable of interest, $E_{RM}$ the error sum of squares under the restricted model, $df_{RM}$ the degrees of freedom under the restricted model, FM the full model including the predictor, $E_{FM}$ the error sum of squares under the full model, and $df_{FM}$ the degrees of freedom under the full model. Then statistic Fseq is defined as (Maxwell & Delaney, 1990, pp. 73-74):

$$F_{seq} = \frac{(E_{RM} - E_{FM})/(df_{RM} - df_{FM})}{E_{FM}/df_{FM}},$$

which theoretically follows an F distribution with $df_{RM} - df_{FM}$ and $df_{FM}$ degrees of freedom. Following Cohen and Cohen (1983, p. 155), we also computed the effect size (denoted f2) for sequential entries of predictor variables. Let $R^2_{RM}$ be the variance explained under the restricted model and $R^2_{FM}$ the variance explained under the full model. Then effect size f2 is defined as:

$$f^2 = \frac{R^2_{FM} - R^2_{RM}}{1 - R^2_{FM}}.$$
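As a worked check of these definitions, the sketch below plugs in one pair of cumulative R2 values reported in Table 29: 0.8464 after entering TCP2005 and 0.8482 after adding CS2005, with 1679 error degrees of freedom under the full model. The second line rests on the assumption that both models share the same total sum of squares, so that each error sum of squares equals (1 - R2) times that total; this identity is standard for least squares but is not spelled out in the text.

```latex
\begin{align*}
f^2 &= \frac{R^2_{FM} - R^2_{RM}}{1 - R^2_{FM}}
     = \frac{0.8482 - 0.8464}{1 - 0.8482}
     = \frac{0.0018}{0.1518} \approx 0.0119, \\
F_{seq} &= \frac{(E_{RM} - E_{FM})/(df_{RM} - df_{FM})}{E_{FM}/df_{FM}}
     = f^2 \cdot \frac{df_{FM}}{df_{RM} - df_{FM}}
     \approx 0.0119 \times \frac{1679}{1} \approx 20.
\end{align*}
```

The effect size agrees with the reported value of 0.0119, and the sequential F of about 20 is close to the reported 20.37; a gap of this size is what one would expect from R2 values rounded to four decimals.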


The regression analyses were done using proc reg (SAS STAT). To assess the robustness of the results, the full model was tested with and without weighting of participants (Section 2), and with and without outliers (i.e., the complete dataset and the reduced dataset, respectively). Thus, four regression analyses were done; the first analysis was in the complete dataset without weighting of participants, the second analysis in the complete dataset with weighting of participants, the third analysis in the reduced dataset without weighting of participants, and the fourth analysis in the reduced dataset with weighting of participants. Seven participants were excluded from the analyses because they had died since September 2005.

The results from the regression analyses are presented in Table 29. The major statistics reported are R2, which represents the cumulative proportion of the variance explained after including a new predictor in the analysis; f2, which represents the effect size of each new predictor entered in the analysis; Fseq, which represents the significance of each new predictor for the explanation of TCP2006; and SRW, which represents the standardised regression weight (e.g., Hays, 1988, pp. 623-625) of each predictor. Because we reported the standardised solution, intercept a was equal to zero and not reported in Table 29.

Each analysis demonstrated a significant contribution of CS2005 to the explanation of TCP2006, when TCP2005 was accounted for (Fseq in Table 29). Furthermore, each analysis yielded a positive effect for CS2005 on TCP2006 (SRW in Table 29). The similarity of the results from the analyses demonstrates their robustness. Thus, hypothesis 7 was supported by the results of the analyses.

The percentage of explained variance of TCP2006 was 84% or more (R2 in Table 29) across analyses. This result was almost completely due to TCP2005, which also had a large effect size (f2 in Table 29) in each analysis. Thus, current TCP was the main predictor of future TCP. This result is in line with the results from former customer profitability analyses in the financial services industry (e.g., Campbell & Frei, 2004; Terpstra, 2005, 2006b).
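The sequential entry of predictors can be sketched in a few lines of code. The sketch is not the SAS procedure used in the study: it fits ordinary least squares by solving the normal equations in plain Python, ignores weighting, and runs on small hypothetical data. The helper names `ols_sse` and `f_seq` are introduced here for illustration only.

```python
def ols_sse(predictors, y):
    """Error sum of squares of a least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * z for x, z in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


def f_seq(e_rm, df_rm, e_fm, df_fm):
    """Sequential F for the predictor(s) added when moving from the restricted to the full model."""
    return ((e_rm - e_fm) / (df_rm - df_fm)) / (e_fm / df_fm)


# Hypothetical data: the criterion depends strongly on TCP2005 and weakly on CS2005.
tcp2005 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cs2005 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
noise = [0.05, -0.05, 0.02, -0.02, 0.03, -0.03, 0.01, -0.01]
tcp2006 = [0.9 * t + 0.1 * c + e for t, c, e in zip(tcp2005, cs2005, noise)]

n = len(tcp2006)
e_rm = ols_sse([tcp2005], tcp2006)          # step 1: TCP2005 only
e_fm = ols_sse([tcp2005, cs2005], tcp2006)  # step 2: TCP2005 and CS2005
f_cs = f_seq(e_rm, n - 2, e_fm, n - 3)
print(e_rm > e_fm, f_cs > 0)
```

Entering TCP2005 first means that the sequential F for CS2005 only credits satisfaction with the variance that current profitability leaves unexplained, which is the conservative ordering for testing hypothesis 7.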


Table 29: Results From Hierarchical Regression Analyses Estimating Model 1 (Standardised Solution)
                              Complete dataset (N = 1682)     Reduced dataset (N = 1644)
                              Unweighted      Weighted        Unweighted      Weighted
ES statistics
  TCP2005     R2              0.8612          0.8464          0.8619          0.8466
              f2              6.2046          5.5104          6.2411          5.5189
              Fseq            10422 (3)       9255 (3)        10246 (3)       9061 (3)
  CS2005      R2              0.8620          0.8482          0.8626          0.8482
              f2              0.0060          0.0119          0.0051          0.0105
              Fseq            10.20 (2)       20.37 (3)       8.46 (2)        17.39 (3)
MF statistics
  Full model  SSM (dfM)       1449 (2)        1426 (2)        1417 (2)        1394 (2)
              SSE (dfE)       232 (1679)      255 (1679)      226 (1641)      249 (1641)
              F-value         5245 (3)        4691 (3)        5150 (3)        4584 (3)
PV statistics
  TCP2005     SRW             0.9262          0.9152          0.9267          0.9155
              SE              0.0091          0.0096          0.0092          0.0097
              t-value         101.99 (3)      95.65 (3)       101.08 (3)      94.55 (3)
  CS2005      SRW             0.0290          0.0432          0.0267          0.0404
              SE              0.0091          0.0096          0.0092          0.0097
              t-value         3.19 (2)        4.51 (3)        2.91 (2)        4.17 (3)

The criterion variable was transformed customer profitability in September 2006. ES statistics is effect size statistics; R2 is proportion of variance explained after including the predictor; f2 is effect size; Fseq is sequential F-value; TCP2005 is transformed customer profitability in September 2005; CS2005 is customer satisfaction in September 2005; MF statistics is model fit statistics; SSM is model sum of squares; dfM is degrees of freedom used for estimating the model; SSE is error sum of squares; dfE is degrees of freedom left after estimating the model; full model is the model including all predictor variables; PV statistics is predictor variable statistics; SRW is standardised regression weight; SE is standard error of regression weight; (1) = significant at p<0.05; (2) = significant at p<0.01; (3) = significant at p<0.001.

Relation between customer satisfaction and future CP with a time-lag of two years

The test of hypothesis 7 demonstrated that customer satisfaction was positively related to future CP. It is unknown how a time lag larger than one year between measurements of customer satisfaction and future CP affects the relation between customer satisfaction and future CP. This warrants further research into the relation between customer satisfaction and future CP. We investigated the relationship between customer satisfaction and future CP using the available data pertaining to a two-year time lag.

Method

Because CP2005 and CP2007 were skewed and included many outliers, we applied a logarithmic transformation to CP2005 and CP2007 (Section 2). The logarithmically transformed CP2005 was denoted TCP2005 and the logarithmically transformed CP2007 was denoted TCP2007 (Section 2). We regressed TCP2007 on TCP2005 and CS2005. The regression model was:

$$\widehat{\mathrm{TCP}}_{2007} = a + b_1\,\mathrm{TCP}_{2005} + b_2\,\mathrm{CS}_{2005}, \qquad \text{(Model 2)}$$

where $\widehat{\mathrm{TCP}}_{2007}$ is the predicted value of TCP2007, a is the intercept, b1 is the effect of TCP2005 on TCP2007, and b2 is the effect of CS2005 on TCP2007.

We used hierarchical regression analyses (e.g., Cohen & Cohen, 1983, pp. 120-122; Hays, 1988, pp. 662-665; Tabachnick & Fidell, 2007, pp. 138-147) to compute the contribution of each predictor to the explanation of TCP2007. Because we expected that TCP2005 accounted for the largest part of TCP2007, TCP2005 was entered first in the analyses and CS2005 was entered second. In order to explore the robustness of the results, we estimated the model with and without weighting of participants, and with and without outliers. Thus, we did four regression analyses.

Results

The results are reported in Table 30. Because we reported the standardised solutions, intercept a was equal to zero and not reported in Table 30. Each analysis demonstrated a significant contribution of CS2005 to the explanation of TCP2007, when TCP2005 was accounted for. Furthermore, each analysis yielded a positive effect for CS2005. The similarity of the results from the analyses demonstrates their robustness. Thus, there is evidence of a relation between customer satisfaction and future TCP, when future TCP is measured with a time lag of two years.

Table 30: Results From Hierarchical Regression Analyses Estimating Model 2 (Standardised Solution)
                              Complete dataset (N = 1682)     Reduced dataset (N = 1644)
                              Unweighted      Weighted        Unweighted      Weighted
ES statistics
  TCP2005     R2              0.6456          0.6351          0.6449          0.6341
              f2              1.8217          1.7405          1.8161          1.7330
              Fseq            3060 (3)        2924 (3)        2982 (3)        2845 (3)
  CS2005      R2              0.6483          0.6382          0.6474          0.6368
              f2              0.0077          0.0086          0.0069          0.0075
              Fseq            13.00 (3)       14.51 (3)       11.42 (3)       12.35 (3)
MF statistics
  Full model  SSM (dfM)       1090 (2)        1073 (2)        1064 (2)        1046 (2)
              SSE (dfE)       591 (1679)      608 (1679)      579 (1641)      597 (1641)
              F-value         1548 (3)        1481 (3)        1506 (3)        1438 (3)
PV statistics
  TCP2005     SRW             0.8003          0.7907          0.8000          0.7902
              SE              0.0145          0.0147          0.0147          0.0150
              t-value         55.20 (3)       53.53 (3)       54.47 (3)       52.77 (3)
  CS2005      SRW             0.0523          0.0563          0.0496          0.0526
              SE              0.0145          0.0147          0.0147          0.0150
              t-value         3.61 (3)        3.81 (3)        3.38 (3)        3.51 (3)

The criterion variable was transformed customer profitability in September 2007. ES statistics is effect size statistics; R2 is proportion of variance explained after including the predictor; f2 is effect size; Fseq is sequential F-value; TCP2005 is transformed customer profitability in September 2005; CS2005 is customer satisfaction in September 2005; MF statistics is model fit statistics; SSM is model sum of squares; dfM is degrees of freedom used for estimating the model; SSE is error sum of squares; dfE is degrees of freedom left after estimating the model; full model is the model including all predictor variables; PV statistics is predictor variable statistics; SRW is standardised regression weight; SE is standard error of standardised regression weight; (1) = significant at p<0.05; (2) = significant at p<0.01; (3) = significant at p<0.001.

The computation of the predicted values for TCP2007 on the basis of the unstandardised solutions (not shown here), and the exponential transformation of the predicted values for TCP2007 to predicted values for CP2007, demonstrated that the impact of CS2005 on the predicted value for CP2007 was dependent on the value for TCP2005. For customers having a small value for TCP2005, the score for CS2005 had almost no impact on the predicted value for CP2007, while for customers having a large value for TCP2005, the score for CS2005 had a substantial impact on the predicted value for CP2007. This result may be due to using logarithmically transformed values for CP2005 and CP2007 in the regression analyses, but we consider it a plausible result which is in agreement with the opinion in marketing that it is important to keep profitable customers satisfied.
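The dependence on TCP2005 follows directly from the back-transformation, as the following sketch shows. It assumes that the transformation is a plain natural logarithm without an offset (the exact transformation is defined in Section 2 and is not repeated here), and it uses a, b1, and b2 for the unstandardised coefficients of Model 2; c and c' are two hypothetical satisfaction scores.

```latex
\begin{align*}
\widehat{\mathrm{CP}}_{2007}
  &= \exp\!\left(a + b_1\,\mathrm{TCP}_{2005} + b_2\,\mathrm{CS}_{2005}\right)
   = e^{a}\, e^{b_1 \mathrm{TCP}_{2005}}\, e^{b_2 \mathrm{CS}_{2005}}, \\
\widehat{\mathrm{CP}}_{2007}(c') - \widehat{\mathrm{CP}}_{2007}(c)
  &= e^{a + b_1 \mathrm{TCP}_{2005}} \left(e^{b_2 c'} - e^{b_2 c}\right).
\end{align*}
```

On the logarithmic scale the effect of satisfaction is additive, but on the original scale it is multiplied by a factor that grows with TCP2005 whenever b1 is positive. The same change in satisfaction therefore translates into a larger absolute change in predicted CP2007 for customers who are already profitable.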
6 Discussion

The first empirical study demonstrated that the set of nine items reflecting customer satisfaction constituted a strong (H ≥ 0.5) scale according to the MH model. Furthermore, the study demonstrated several strengths and weaknesses of the measurement instrument for customer satisfaction and the corresponding scale scores. A first strength is the explicit and implicit definitions of customer satisfaction underlying the measurement instrument. All aspects of customer satisfaction were evenly represented in the instrument, and this supports the claim that the scale scores cover the meaning of customer satisfaction well. A second strength is the fit of the measurement model. The tests of the model yielded no substantial violations of the MH model, which supports the use of the scale scores to measure customer satisfaction. Because the measurement instrument was composed of items that were indicative of customer satisfaction and items that were counter-indicative of the construct, the fit of the measurement model also confirms the conception of customer satisfaction as the opposite of customer dissatisfaction on a bipolar dimension. A third strength is the fit of the measurement model in the subgroups based on customer segment, age, and gender. This supports the generalisability of the scale across subgroups in the target population. A fourth strength is that the inclusion of items that are indicative and items that are counter-indicative of customer satisfaction in the measurement instrument seems to limit the effects of acquiescent responding on the scale scores (e.g., Baumgartner & Steenkamp, 2001, 2006; Van Herk, 2000, p. 55). A fifth strength of the scale is that it is composed of a large number of items, which limited the effect of a biased item on the scale score. Lack of bias also supports confidence in the validity of the scale-score interpretations.


The major weakness of the scale scores was their divergent validity. The tests of the hypotheses regarding concept-related irrelevant variance revealed that the customer satisfaction scores were contaminated by quality and customer loyalty. This was due to the items Q3a (At BANK I feel at home; Table 4) and Q4d (Last year I had some problems with BANK; Table 4). For this reason, the scale had to be modified for research into the connections of customer satisfaction with these constructs.

The outliers in the left-skew tail of the distribution of the customer satisfaction scores were a point of concern. It cannot be ruled out that the outliers were due to stylistic responding.

The analyses into relations between customer satisfaction and future CP with a time lag of two years yielded some important results. It was demonstrated that the influence of customer satisfaction on customer profitability lasts for at least two years. This warrants further research into the effect of customer satisfaction on the cumulated customer profitability. Furthermore, a comparison of the results of the analyses predicting future CP with a time lag of one year (Table 29) and the analyses predicting future CP with a time lag of two years (Table 30) reveals that the influence of current CP on future CP decreases when the time lag between the measurements of current CP and future CP increases. This decay implies that, in the long run, companies cannot take the future CP of existing customers for granted. It also implies that it may be dangerous to estimate customer lifetime value by solely using current CP. Based on this research, not only current CP should be used for the estimation of customer lifetime value, but also, for example, customer satisfaction and customer loyalty.

Six additional remarks are in order. First, the items indicative of customer satisfaction were all negatively skewed, and the items counter-indicative of customer satisfaction were all positively skewed. This is in agreement with the results found in various satisfaction studies in various domains (e.g., Oliver, 1997; Peterson & Wilson, 1992), and suggests that being satisfied is more or less the default satisfaction state of most persons. Second, the correlation between customer satisfaction and trust was found to be very high, and matched the correlation between customer satisfaction and the score on the ACSI. This indicates that there is a large overlap between the construct of customer satisfaction and the construct of trust in the context of retail banking. Third, current customer profitability had a large effect on future customer profitability. Therefore we recommend including current customer profitability as a predictor in regression models of future customer profitability in the financial services industry (see also Donkers, Verhoef, & De Jong, 2007). Fourth, the results of the analyses in the complete dataset and the reduced dataset were nearly identical. Thus, the outliers on the items reflecting customer satisfaction, trust, customer loyalty, and interest did not influence the results of the data analyses substantially. Fifth, the effect sizes for customer satisfaction on future CP were small. This may be due to the omission of important predictors, such as the total financial means of a customer (Chapter 3), in the regression analyses (e.g., Hays, 1988, p. 655). Therefore we suggest including measurements of the total financial means of customers in future research into the influence of customer satisfaction on future CP. Sixth, the generalisability of the results of the study into the relation between customer satisfaction and future CP has to be investigated. The sample was drawn from the research panel of the company, and it cannot be ruled out that persons who were willing to participate in the panel have a different attitude towards banking than persons who were not willing to participate in the panel, and that the attitude towards banking influences the relation between customer satisfaction and future CP. Therefore, we advocate research into the generalisability of the results of the present study to other groups and companies within the financial services industry.
7 Conclusion

So far, the results of the first empirical study have yielded much evidence for construct validity, meaning that the results warrant the interpretation of the scale scores in terms of satisfaction with the company. However, the validation study was not completed because two hypotheses regarding the contamination of scale scores by method-related irrelevant variance were not tested. These hypotheses were tested in the second empirical study (Chapter 8). Because the test of these hypotheses yielded further information about the meaning of the scale scores in the first empirical study, we prefer to present the final conclusions about the validity of measurement after the presentation of the results of the second empirical study.


Chapter 7 Method of the second empirical study into customer satisfaction with BANK

Introduction

The purpose of the second empirical study into customer satisfaction with BANK was to test hypothesis 14 (i.e., the satisfaction scores are not affected by the midpoint response style) and hypothesis 15 (i.e., the satisfaction scores are not affected by the extreme response style). Testing these hypotheses required the measurement of (a) customer satisfaction, (b) general midpoint responding, and (c) general extreme responding. We decided to operationalise customer satisfaction on the basis of the 9-item measurement instrument (see Chapter 5), because it was our purpose to combine the conclusions of the second empirical study with those of the first empirical study. Furthermore, we decided to operationalise general midpoint responding as a participant's proportion of responses in the middle response category of the rating scales of items, and general extreme responding as a participant's proportion of responses in the extreme response categories of the rating scales of items (Chapter 8).

Greenleaf (1992b), Van Herk (2000), and Baumgartner and Steenkamp (2001, 2006) noted that measures of general midpoint responding and general extreme responding have to be based on persons' responses to many items with low inter-item correlations. This is in agreement with Paulhus's (1991, p. 49) remark that persons exhibiting consistent extreme response behaviour across time and stimuli may be said to have an extreme response style. For this reason, Greenleaf (1992b) and Van Herk (2000) operationalised extreme response style as a participant's proportion of responses in the extreme response categories of the rating scales of various items. Generalising Paulhus's (1991, p. 49) remark to midpoint responding, persons exhibiting consistent midpoint response behaviour across time and stimuli may be said to have a midpoint response style. The midpoint response style may be operationalised as a participant's proportion of responses in the middle response category of the rating scales of various items.
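The two operationalisations described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `style_proportions`, the decision to ignore unanswered items, and the example responses are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def style_proportions(responses: Sequence[Optional[int]],
                      n_categories: int = 5) -> Tuple[float, float]:
    """Return (midpoint proportion, extreme proportion) for one participant.

    Responses are scored 0 .. n_categories - 1. None marks a 'no answer'
    response, which is ignored here (an assumption of this sketch).
    """
    if n_categories % 2 == 0:
        raise ValueError("a middle category requires an odd number of categories")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("participant answered no items")
    lowest, highest = 0, n_categories - 1
    middle = highest // 2  # category 2 on a 0-4 scale
    n_mid = sum(1 for r in answered if r == middle)
    n_ext = sum(1 for r in answered if r in (lowest, highest))
    return n_mid / len(answered), n_ext / len(answered)


# Hypothetical participant: twelve style items, one of them left unanswered.
answers = [2, 2, 0, 4, None, 3, 2, 1, 4, 2, 3, 0]
midpoint_share, extreme_share = style_proportions(answers)
```

Because both proportions are computed over the same answered items, they are bounded by each other: a participant who always chooses the middle category cannot score above zero on extreme responding.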
Dependence of the operationalisations of response styles on operationalisations of the construct of interest would complicate research into the contamination of measurements of the construct of interest by response styles (Oort, 1996, pp. 13-14). For example, assume that the measurement of general extreme responding was done on the basis of items reflecting the construct of interest. Then a high score on general extreme responding can be achieved by answering positively to the items indicative and negatively to the items counter-indicative of the construct of interest. In that instance, a high measurement value for general extreme responding might reflect a high preference for extreme responding as well as a high value on the construct of interest, and these two possibilities cannot be distinguished. To prevent measurements of general midpoint responding and general extreme responding from partly reflecting customer satisfaction, the items used for the former measurements had to be unrelated to customer satisfaction. For this reason, we decided to measure four constructs, which we expected to be unrelated to customer satisfaction, and to use the items reflecting these constructs to compose the measures for stylistic responding. The constructs were (a) expectations with respect to personal spending power, (b) expectations with respect to the Dutch economy, (c) involvement with banking matters, and (d) understanding of the Dutch banking market. Because the response format of items may affect stylistic responding (Van Herk, 2000, p. 59), we used identical response formats for all items used in the study.

The second empirical study was conducted in August 2007, which was approximately two years after the first empirical study. This chapter discusses the method used in the second empirical study. It encompasses an outline of the operationalisations of the constructs, the questionnaire, the target population, the sample, the procedure, and the data.

Operationalisations

The design of the questionnaire, the format of the items, and the wording of the items were based upon general principles concerning survey research as formulated by Sudman and Bradburn (1982), Sheatsley (1983), Belson (1986), and Dillman et al. (1998). All items used were 5-point rating scale items. Similar to the first empirical study, we included a 'no answer' option in the response options of the items, and we varied the ordering of items within the groups of items reflecting a construct across different administrations of the questionnaire. The operationalisations of the five constructs were the following.

Customer satisfaction

Customer satisfaction was operationalised by means of nine Likert items with five ordered response categories each, ranging from 'totally agree' (which was scored 4) to 'totally disagree' (which was scored 0) (Chapter 5; Table 1). Also in the sample used in the second study, we expected the nine items to constitute a scale according to the MH model.

Expectations with respect to personal spending power

The customer's positive expectations with respect to personal spending power (EPSP) were measured using two items reflecting this concept (Table 1). Each item had five ordered response categories that ranged from 'totally agree' (which was scored 4) to 'totally disagree' (which was scored 0). We expected the two items to be negatively correlated.

Table 1: Items Reflecting Expectations Regarding Personal Spending Power


Code   Item                                                        Score range
Q6a    I expect that my spending power will increase next year     0-4
Q6d*   In five years my spending power will be lower than today    0-4

*= item is counter-indicative of the concept

Expectations with respect to the Dutch economy

The customer's positive expectations with respect to the Dutch economy (EDE) were measured using two items reflecting this concept (Table 2). Each item had five ordered response categories that ranged from 'totally agree' (which was scored 4) to 'totally disagree' (which was scored 0). We expected the two items to be negatively correlated.

Table 2: Items Reflecting Expectations Regarding the Dutch Economy


Code   Item                                                         Score range
Q7b*   I expect that the Dutch economy will decrease next year      0-4
Q7c    In five years, the Dutch economy will be better than today   0-4

*= item is counter-indicative of the concept

Involvement with banking matters

The customer's involvement with banking matters (labeled involvement) was measured using four items reflecting this concept (Table 3). Each item had five ordered response categories that ranged from 'totally agree' (which was scored 4) to 'totally disagree' (which was scored 0). We expected the four items to be positively correlated after having been scored in the same direction.

Table 3: Items Reflecting Involvement With Banking Matters


Code   Item                                                    Score range
Q8b    I find banking matters very important                   0-4
Q8c    Arranging banking matters properly makes life easier    0-4
Q8d*   I find banking matters boring                           0-4
Q8e*   Banking matters leave me cold                           0-4

*= item is counter-indicative of the concept

Understanding of the Dutch banking market

The customer's understanding of the Dutch banking market (labeled understanding) was measured using four items reflecting this concept (Table 4). Each item had five ordered response categories that ranged from 'totally agree' (which was scored 4) to 'totally disagree' (which was scored 0). We expected the four items to be positively correlated after having been scored in the same direction.

Table 4: Items Reflecting Understanding of the Dutch Banking Market

Code   Item                                                               Score range
Q9a    I know the pros and cons of the retail banks in the Netherlands    0-4
Q9b*   I find it difficult to judge the quality of BANK                   0-4
Q9c*   I find it difficult to compare the quality of retail banks         0-4
Q9d    I know exactly what I may expect from BANK                         0-4

*= item is counter-indicative of the concept

The questionnaire

The questionnaire (Appendix 3; in Dutch) was composed of the items reflecting customer satisfaction, EPSP, EDE, involvement, and understanding. In addition, some items were included in the questionnaire for business purposes, and some other items were included to optimise the design of the questionnaire. For example, several items regarding product possession and contacts with the company were included in order to elicit the participants' memories of the company before the measurement of satisfaction with the company started. The questionnaire was pre-tested in a small sample (N = 3). The pre-tests demonstrated that it took a participant approximately 15 minutes to complete the questionnaire, which we considered acceptable.


Procedure

The survey was administered via the Internet to the members of the company's research panel. The comparability of the target population (i.e., mature retail customers of a Dutch bank), the research panel, and the final sample is discussed later in this chapter. Panel members were invited by e-mail to participate in the survey. The questionnaire was made available at a site of the market research agency that managed the survey. The questionnaire was accessible from 24 August 2007 until 3 September 2007. Panel members had access to the site on the basis of a password and were identified on the basis of a customer-id. After a person completed the questionnaire, the data were uploaded to the agency. The participants received a small incentive (i.e., savings points worth 10 euro). This is the usual fee that the company paid to panel members who responded to a survey of medium length.

Data

The research agency delivered a file containing the raw data, which were the coded responses of the participants to the survey items (note that a 'no answer' response was scored as a missing value). In order to enrich the raw data, the file was merged with the marketing database. The merging was executed on the basis of customer-id, and it was successful for all participants. Subsequently, three variables were added to the file: (a) customer segment at the end of September 2007, (b) gender, and (c) age at the end of September 2007.

Target population, panel, and sample

Similar to the first empirical study, the target population consisted of the mature retail customers of a Dutch bank. The participants were registered by the company as the primary owner of at least one banking product provided by the company. A total of 2972 persons were invited to participate in the survey. They were mature retail customers who, in August 2007, participated in the company's research panel. The panel members had agreed to participate in marketing research via the Internet. The agreement stipulated that (a) the company is free to approach the person for marketing research, (b) the person is free to participate in the research or to decline, (c) the company is allowed to use the survey data for research purposes only, and (d) the company is not allowed to distribute any personalised data to third parties. All panel members could be approached by e-mail, and had a unique customer-id that was used for identification purposes.


The research panel differed in three ways from the target population. First, because the company's most valuable customers were overrepresented in the research panel, the panel differed significantly (χ²(2) = 1244, p < 0.001) from the target population with respect to the distribution of customer segment (Table 5). Second, the panel differed significantly (χ²(2) = 212, p < 0.001) from the target population with respect to the distribution of gender. Males were overrepresented in the panel (see Table 5). This was partly due to the overrepresentation of males in the segment Top Customers (i.e., the segment that was overrepresented in the research panel), and partly to unknown causes. Third, the panel differed significantly (χ²(2) = 191, p < 0.001) from the target population with respect to the distribution of age group (Table 5).

The response rate in the study was approximately 41% (N = 1227). Table 5 shows the distributions of customer segment, gender, and age group within the company, the panel, and the research sample. The research sample differed significantly from the target population with respect to customer segment (χ²(2) = 710, p < 0.001), gender (χ²(2) = 144, p < 0.001), and age group (χ²(2) = 110, p < 0.001). Furthermore, the research sample differed significantly from the remainder of the panel with respect to customer segment (χ²(2) = 30, p < 0.001), gender (χ²(2) = 14, p < 0.001), and age group (χ²(2) = 22, p < 0.001). Thus, respondents differed significantly from non-respondents with respect to customer segment, gender, and age group. This was in line with the first empirical study (see Chapter 5).

Table 5: Distribution (Percentages) of Customer Segment, Gender and Age Group in the Study

                         Company   Panel   Sample
Customer segment
  Top                       34       64      70
  Standard                  41       25      22
  Development               25       11       8
Gender
  Female                    44       33      29
  Male                      52       65      69
  Unknown                    4        2       2
Age group
  18 to 39 years            34       26      22
  40 to 59 years            37       49      50
  60 years and older        29       25      28
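To make the kind of comparison reported above concrete, the following Python sketch computes a Pearson chi-square statistic for the fit of an observed distribution over three categories to a reference distribution. It is an illustration under stated assumptions: the thesis does not specify which variant of the chi-square test was used, the function names are hypothetical, and the counts are invented rather than taken from the study. The closed-form p-value holds only for two degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi_square_gof(observed: Sequence[int],
                   expected_props: Sequence[float]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    if len(observed) != len(expected_props):
        raise ValueError("observed and expected must have the same length")
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_props))
    return stat, len(observed) - 1


def p_value_df2(stat: float) -> float:
    """Upper-tail probability for df = 2, where it equals exp(-stat / 2)."""
    return math.exp(-stat / 2.0)


# Invented example: 1000 panel members in three segments, compared with a
# reference population in which the segments have shares .34, .41 and .25.
statistic, df = chi_square_gof([640, 250, 110], [0.34, 0.41, 0.25])
p_value = p_value_df2(statistic)
```

With samples of the size used here, even modest differences in percentages produce very large chi-square values, which is worth keeping in mind when interpreting the significance levels.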


Chapter 8 Results of the second empirical study into customer satisfaction with BANK

Introduction

This chapter presents and discusses the results of the second empirical study in which hypothesis 14 (i.e., the customer satisfaction scores are not affected by the midpoint response style) and hypothesis 15 (i.e., the customer satisfaction scores are not affected by the extreme response style) were investigated. First, we discuss preliminary analyses, the purpose of which was to prepare the data for the subsequent analyses. Second, we discuss measurement analyses, which aimed at checking whether the MH model fitted the items for customer satisfaction and at constructing scales for stylistic responding. Third, we discuss the results of the tests of hypotheses 14 and 15. Fourth, we discuss the generalisability of the results. Fifth, based on both empirical studies, we discuss the conclusions regarding the validity of measurement of customer satisfaction.

Preliminary data analyses

The dataset containing the raw data was converted into a SAS dataset, and the items that were assumed to be counter-indicative of the constructs (see the description of the measurement instruments in Chapter 7) were recoded in the opposite direction. Furthermore, we (a) examined the distribution characteristics of the variables in the dataset, (b) explored the data quality, (c) conducted missing data analyses, and (d) conducted outlier analyses.

To examine the distribution characteristics of the variables, we computed the histograms and descriptive statistics of all variables in the dataset. For this purpose, proc univariate (SAS STAT) and proc means (SAS STAT) were used. The histograms (not shown here) demonstrated that all variables were single peaked, and that many were negatively skewed. This finding was corroborated by descriptive statistics (Table 1).

The correlations between the items reflecting customer satisfaction with the retail bank and the items reflecting expectations regarding personal spending power (EPSP) were examined. For this purpose, proc corr (SAS STAT) was used. In line with our expectations, (a) the items reflecting customer satisfaction were highly correlated, (b) the items reflecting EPSP were highly correlated, and (c) the items reflecting customer satisfaction and the items reflecting EPSP were almost uncorrelated (Table 2). This result suggested that participants did not respond randomly but instead responded to the items' content.

Because it was required that the items reflecting EPSP, expectations regarding the Dutch economy (EDE), involvement with banking matters (involvement), and understanding of the Dutch banking market (understanding) were unrelated to customer satisfaction, the correlations between the items reflecting these constructs and the items reflecting customer satisfaction were computed. Table 3 shows that two items reflecting understanding (i.e., Q9b: 'I find it difficult to judge the quality of BANK' and Q9d: 'I know exactly what I may expect from BANK'; Table 1) correlated substantially with the items reflecting customer satisfaction. The other items reflecting understanding, and the items reflecting EPSP, EDE, and involvement were almost uncorrelated with the items reflecting customer satisfaction. This result strengthened our confidence in the usefulness of the data for the purpose of the second empirical study, which was the testing of hypotheses 14 and 15.

The items reflecting customer satisfaction, EPSP, EDE, involvement, and understanding showed few missing data (i.e., 5% or less; see Table 1). Thus, following the strategy explained in Chapter 6, item scores were imputed by means of method TW-E (Bernaards & Sijtsma, 2000; Van Ginkel et al., 2007). As expected, the descriptive statistics of the items before imputation were almost identical to the descriptive statistics of the items after imputation. Some participants (N = 41) left more than 50 percent of the items reflecting customer satisfaction, EPSP, EDE, involvement, or understanding unanswered (Table 4). These participants were considered outliers, and we created indicator variables to mark them in the dataset.
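The general idea of two-way imputation with error can be sketched as follows: a missing score is replaced by the person mean plus the item mean minus the overall mean, a random normal residual is added, and the result is rounded to an admissible score. The Python code below is a minimal sketch of that idea and not the procedure used in the study. The function names are hypothetical, and the residual-variance estimate, the rounding, and the clipping to the 0-4 range are assumptions that may differ from the method described by Bernaards and Sijtsma (2000).

```python
import random
from typing import List, Optional

Matrix = List[List[Optional[int]]]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def two_way_estimates(data: Matrix) -> List[List[float]]:
    """Person mean + item mean - overall mean for every cell.

    Assumes every person and every item has at least one observed score.
    """
    n_items = len(data[0])
    person_means = [_mean([x for x in row if x is not None]) for row in data]
    item_means = [_mean([row[j] for row in data if row[j] is not None])
                  for j in range(n_items)]
    overall = _mean([x for row in data for x in row if x is not None])
    return [[pm + im - overall for im in item_means] for pm in person_means]


def impute_tw_e(data: Matrix, lo: int = 0, hi: int = 4,
                rng: Optional[random.Random] = None) -> List[List[int]]:
    """Fill missing cells with a two-way estimate plus a normal residual."""
    rng = rng or random.Random(0)
    est = two_way_estimates(data)
    residuals = [x - est[i][j]
                 for i, row in enumerate(data)
                 for j, x in enumerate(row) if x is not None]
    if len(residuals) < 2:
        raise ValueError("too few observed scores to estimate the error")
    # Assumed error estimate: residual variance over the observed cells.
    sd = (sum(r * r for r in residuals) / (len(residuals) - 1)) ** 0.5
    completed = []
    for i, row in enumerate(data):
        new_row = []
        for j, x in enumerate(row):
            if x is None:
                drawn = round(est[i][j] + rng.gauss(0.0, sd))
                x = min(hi, max(lo, drawn))  # keep the score in range
            new_row.append(x)
        completed.append(new_row)
    return completed


# Invented item scores (0-4) with two missing responses.
scores = [[4, 3, 4], [2, None, 1], [3, 3, None], [0, 1, 1]]
completed_scores = impute_tw_e(scores, rng=random.Random(1))
```

Participants without any observed score on a set of items cannot be handled by this estimate, which is one reason to treat respondents with mostly unanswered items separately.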
To detect multivariate outliers, the leverage statistic (see Chapter 6) was computed by means of a regression analysis using customer-id as the criterion variable, and the 21 items reflecting customer satisfaction, EPSP, EDE, involvement, and understanding as the predictor variables (Tabachnick & Fidell, 2007, pp. 75-76, 111-112). The analysis yielded 60 participants with a significant (p < 0.001) leverage value. Visual inspection of the data revealed that these participants tended to give extremely positive or extremely negative responses to the items. Furthermore, the inspection demonstrated that the two participants with the highest leverage value had alternated extremely positive and extremely negative responses to different items having similar content. It was suspected that these two participants had responded inconsistently to the items. An indicator variable was created to mark them in the dataset. This variable was joined with the variables marking the participants who left the majority of items reflecting a particular construct unanswered (Table 4). The union of these variables marked 43 outliers in the dataset. In agreement with the first empirical study, the results from analyses with outliers and analyses without outliers were examined. Henceforth, the dataset including these outliers is labeled the complete dataset, and the dataset without these outliers is labeled the reduced dataset.
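The leverage statistic depends only on the predictor variables, which is why an arbitrary criterion such as customer-id can be used in the regression. The Python sketch below computes leverage values as the diagonal of the hat matrix and converts them to squared Mahalanobis distances with the relation D² = (N - 1)(h - 1/N). It is an illustration only: the function names and the example data are invented, and a small pure-Python matrix inversion is used instead of a statistical package. In practice the resulting distances would be compared with a chi-square critical value whose degrees of freedom equal the number of predictors.

```python
from typing import List, Sequence


def _invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverage_values(rows: Sequence[Sequence[float]]) -> List[float]:
    """Diagonal of X (X'X)^-1 X', where X includes an intercept column."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)]
           for a in range(k)]
    inv = _invert(xtx)
    return [sum(r[a] * inv[a][b] * r[b] for a in range(k) for b in range(k))
            for r in design]


def mahalanobis_sq(leverage: float, n: int) -> float:
    """Squared Mahalanobis distance implied by a leverage value."""
    return (n - 1) * (leverage - 1.0 / n)


# Invented scores of eight participants on two items; the last is unusual.
item_scores = [[3, 3], [3, 2], [2, 3], [3, 4], [4, 3], [2, 2], [4, 4], [0, 4]]
h = leverage_values(item_scores)
distances = [mahalanobis_sq(value, len(item_scores)) for value in h]
```

A useful check on such a computation is that the leverage values sum to the number of columns of the design matrix, here the two items plus the intercept.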

Table 1: Descriptive Statistics of Items Reflecting Customer Satisfaction, EPSP, EDE, Involvement, and Understanding (Before Imputation; N = 1227)

Code   Label                                                             Nmiss   Mean   SD     Skewness
Customer satisfaction items
Q3a    At BANK I feel at home                                               1    2.88   0.81    -0.82
Q3b    I am satisfied with BANK                                             0    2.86   0.81    -1.12
Q3d*   There are good reasons to leave BANK                                 7    2.93   1.05    -0.85
Q3e*   I have mixed feelings about BANK                                     6    2.73   1.07    -0.62
Q3g    BANK meets all my requirements for a bank                            3    2.62   0.95    -0.85
Q4a    Last year I had a pleasant relationship with BANK                    4    2.75   0.80    -0.74
Q4b    BANK has met my expectations                                         1    2.69   0.90    -1.02
Q4c*   I have regretted my choice for BANK                                  8    3.21   0.84    -1.09
Q4d*   Last year I had some problems with BANK                              8    2.91   1.05    -0.89
EPSP items
Q6a    I expect that my spending power will increase next year             28    1.90   0.89    -0.07
Q6d*   In five years my spending power will be lower than today            27    2.17   1.00    -0.14
EDE items
Q7b*   I expect that the Dutch economy will decrease next year             22    2.26   0.83    -0.33
Q7c    In five years, the Dutch economy will be better than today          28    2.09   0.79    -0.16
Involvement items
Q8b    I find banking matters very important                                0    2.74   0.75    -0.75
Q8c    Arranging banking matters properly makes life easier                 5    2.96   0.57    -1.04
Q8d*   I find banking matters boring                                        0    2.58   0.89    -0.43
Q8e*   Banking matters leave me cold                                        3    2.99   0.77    -0.79
Understanding items
Q9a    I know the pros and cons of the retail banks in the Netherlands     21    1.89   0.88     0.01
Q9b*   I find it difficult to judge the quality of BANK                     3    2.43   0.87    -0.44
Q9c*   I find it difficult to compare the quality of retail banks           8    1.71   0.95     0.39
Q9d    I know exactly what I may expect from BANK                           5    2.54   0.77    -0.75

* = scored reversely


Table 2: Correlations Between Two Items Reflecting Customer Satisfaction and Two Items Reflecting EPSP

Code   Item                                                        Q3a    Q3b     Q6a     Q6d*
Q3a    At BANK I feel at home                                             0.74   -0.04    0.03
Q3b    I am satisfied with BANK                                                  -0.05    0.02
Q6a    I expect that my spending power will increase next year                            0.62
Q6d*   In five years my spending power will be lower than today

* = scored reversely

Table 3: Correlations Between Items Reflecting Customer Satisfaction (Columns) and Items Reflecting Other Constructs (Rows)

        Q3a     Q3b     Q3d     Q3e     Q3g     Q4a     Q4b     Q4c     Q4d
Q6a    -0.04   -0.05   -0.07   -0.04   -0.03    0.03   -0.02   -0.03    0.00
Q6d     0.04    0.02    0.00    0.01    0.03    0.08    0.04    0.04    0.05
Q7b     0.06    0.05    0.08    0.09    0.03    0.07    0.06    0.05    0.05
Q7c     0.04    0.03    0.03    0.06    0.03    0.04    0.04    0.02    0.03
Q8b     0.08    0.01    0.03   -0.01   -0.02    0.04   -0.02    0.04   -0.03
Q8c     0.12    0.08    0.09    0.06    0.10    0.11    0.08    0.13    0.03
Q8d     0.11    0.05    0.06    0.09    0.01    0.07    0.04    0.08    0.04
Q8e     0.06   -0.03    0.02    0.02   -0.04    0.04   -0.03    0.05    0.00
Q9a    -0.08   -0.09   -0.10   -0.09   -0.13   -0.09   -0.12   -0.08   -0.10
Q9b     0.24    0.15    0.14    0.18    0.14    0.18    0.15    0.20    0.10
Q9c    -0.04   -0.03   -0.03   -0.01   -0.06   -0.06   -0.06   -0.02   -0.04
Q9d     0.44    0.44    0.35    0.40    0.42    0.43    0.45    0.40    0.35

For the legend, see Table 1

Table 4: Number of Participants Leaving More Than Half of the Items Unanswered

     Customer satisfaction   EPSP   EDE   Involvement   Understanding
N              0              25     21        0              2


Mokken scale analysis of customer satisfaction

Customer satisfaction was operationalised using the measurement instrument presented in Chapter 5 (Chapter 5; Table 1). In the first empirical study (Chapter 6), it was demonstrated that the nine items constituted a scale according to the MH model. We hypothesised that the nine items also constituted a scale according to the MH model in the second empirical study. To test this hypothesis, Mokken scale analysis was done using MSPwin5.0 (Molenaar & Sijtsma, 2000). First, the dimensionality of the item set was investigated using the confirmatory strategy (Chapter 4). Second, the assumption of monotonicity was investigated (Chapter 4). Third, the scale-score statistics were computed (Chapter 4). Fourth, the scalability of the item set within distinct customer segments, gender groups, and age groups (Chapter 7) was investigated. Fifth, univariate analyses of variance were done to test whether subgroups differed significantly with respect to the scale scores. For this purpose, proc GLM (SAS STAT) was used. Sixth, in order to examine the effect of outliers on the results, the analyses were repeated with the reduced dataset (i.e., the dataset without outliers, see Section 2).

Confirmatory Mokken scale analyses (item selection method = Test) demonstrated that the nine items constituted a strong Mokken scale with a total-scale scalability coefficient H equal to 0.67 and a reliability coefficient rho equal to 0.93 (Table 5). The lowest item scalability coefficient Hi was equal to 0.57, which is well above the default lower bound for Hi used in exploratory analyses (i.e., lower bound Hi = 0.3). The check for item monotonicity on the basis of the default options in MSPwin5.0 (i.e., Minvi = 0.03 and Minsize = 122, which is 10 percent of the sample) did not reveal violations of the assumption of monotonicity. This means that the ISRFs of all items increased across all rest-score groups. Thus, the MH model fitted the data well.
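For readers unfamiliar with the scalability coefficients, the following Python sketch shows one common way to express them: H is the sum of the observed inter-item covariances divided by the sum of the maximum covariances that are possible given the items' marginal distributions, and Hi is the same ratio restricted to the pairs involving item i. The maximum covariance of two items is obtained here by sorting both score vectors, which pairs the highest scores with each other. This is a minimal illustration and not a reproduction of MSPwin5.0. The function name and the example data are invented, and complete data are assumed.

```python
from typing import List, Sequence, Tuple


def _cov(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def scalability(data: Sequence[Sequence[int]]) -> Tuple[float, List[float]]:
    """Return (H, [H_1, ..., H_k]) for complete item-score data.

    Each row holds one participant's scores. Items without variance
    make the ratio undefined and raise ZeroDivisionError.
    """
    n_items = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_items)]
    cov = [[0.0] * n_items for _ in range(n_items)]
    cov_max = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i != j:
                cov[i][j] = _cov(columns[i], columns[j])
                # Sorting both items gives the largest covariance that
                # their marginal distributions allow.
                cov_max[i][j] = _cov(sorted(columns[i]), sorted(columns[j]))
    item_h = [sum(cov[i]) / sum(cov_max[i]) for i in range(n_items)]
    total_h = sum(map(sum, cov)) / sum(map(sum, cov_max))
    return total_h, item_h


# Invented scores of four participants on two 0-4 items.
example_h, example_item_h = scalability([[0, 0], [1, 2], [2, 1], [3, 3]])
```

A value of 1 indicates that the items order the participants in exactly the same way, whereas values near 0 indicate that the items are hardly related.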
The customer satisfaction scale-score distribution is presented in Figure 1. The distribution was significantly skewed to the left (p < 0.001). Furthermore, the histogram demonstrates peaks for the scale-scores 27, 31, and 36. The peak for the scale-score 27 was mainly caused by participants who agreed with all items indicative of customer satisfaction, and disagreed with all items counter-indicative of customer satisfaction (i.e., 66 percent of the participants having scale-score 27 responded 'agree' to the five items indicative of customer satisfaction and 'disagree' to the four items counter-indicative of customer satisfaction). The peak for scale-score 31 was mainly caused by participants who agreed with all items indicative of customer satisfaction, and strongly disagreed with all items counter-indicative of customer satisfaction (i.e., 65 percent of the participants having scale-score 31 responded 'agree' to the five items indicative of customer satisfaction and 'totally disagree' to the four items counter-indicative of customer satisfaction). The peak for scale-score 36 was caused by participants who strongly agreed with all items indicative of customer satisfaction, and strongly disagreed with all items counter-indicative of customer satisfaction, because scale-score 36 could only be achieved by responding 'totally agree' to the five items indicative of customer satisfaction and 'totally disagree' to the four items counter-indicative of customer satisfaction. It may be noted that the distribution of scale scores in the first empirical study did not contain sharp peaks for the scale-scores 27, 31, and 36 (Chapter 6, Figure 3). This result is further discussed in Section 5 of the present chapter.

Mokken scale analyses using the grouping variables customer segment (valued Top Customers, Standard Customers, and Development Customers; see Chapter 7), gender (valued female, male, and missing), and age group (valued 18 to 39 years, 40 to 59 years, and 60 years onwards; see Chapter 7) demonstrated that the nine items also constituted a strong Mokken scale (i.e., H > 0.5) in each subgroup (Table 5). The checks for item monotonicity in the subgroups yielded a significant violation of the assumption of monotonicity in the subgroup Top Customers. This violation was due to a decrease of the estimated ISRF for Q4c >= 1 ('I have regretted my choice for BANK'; Table 1). Because the magnitude of the violation was small (i.e., the proportion of responses Q4c >= 1 decreased from 1.00 in the middle rest-score group to 0.97 in the highest rest-score group), we considered it unimportant, and we concluded that the MH model fitted the data in the subgroups well enough. Univariate analyses of variance demonstrated that the customer segments and age groups differed significantly with respect to the customer satisfaction scale-scores (Table 6).
Furthermore, the histograms (not shown here) demonstrated peaks for the scale-scores 27, 31, and 36 in all subgroups investigated. Thus, the peaks cannot be attributed to particular customer segments, gender groups, or age groups.
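The arithmetic behind the three peaks can be made explicit with a short Python sketch. It assumes the scoring described in Chapter 7 ('totally agree' = 4 down to 'totally disagree' = 0, with the four counter-indicative items scored reversely). The label 'neutral' for the middle category and the function name are assumptions made for illustration.

```python
# Item scores before reversal; the 'neutral' label is an assumed name
# for the middle category.
SCORES = {"totally agree": 4, "agree": 3, "neutral": 2,
          "disagree": 1, "totally disagree": 0}
N_INDICATIVE = 5   # items indicative of customer satisfaction
N_COUNTER = 4      # items counter-indicative of customer satisfaction
MAX_SCORE = 4


def uniform_scale_score(indicative_answer: str, counter_answer: str) -> int:
    """Scale score when one answer is given to all indicative items and
    one answer to all counter-indicative items."""
    indicative = SCORES[indicative_answer]
    counter = MAX_SCORE - SCORES[counter_answer]  # reverse scoring
    return N_INDICATIVE * indicative + N_COUNTER * counter


peak_27 = uniform_scale_score("agree", "disagree")                  # 5*3 + 4*3
peak_31 = uniform_scale_score("agree", "totally disagree")          # 5*3 + 4*4
peak_36 = uniform_scale_score("totally agree", "totally disagree")  # 5*4 + 4*4
```

The three uniform response patterns thus map onto exactly the scale scores at which the histogram peaks, which is consistent with the interpretation given in the text.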


Table 5: Customer Satisfaction Scales Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Complete Dataset (N = 1227)
Total group T 0.67 0.75 0.66 0.67 0.70 0.68 0.72 0.66 0.57 0.67 0.93 0.93 0.93 0.95 0.93 0.67 0.66 0.71 0.67 0.58 0.54 0.55 0.57 0.57 0.67 0.93 0.64 0.66 0.75 0.67 0.65 0.70 0.71 0.78 0.72 0.71 0.84 0.87 0.57 0.76 0.96 0.68 0.66 0.71 0.68 0.67 0.77 0.70 0.65 0.71 0.66 0.70 0.77 0.72 0.70 0.71 0.69 0.62 0.70 0.95 0.67 0.63 0.70 0.63 0.67 0.78 0.71 0.65 0.67 0.69 0.65 0.66 0.78 0.67 0.65 0.64 0.68 0.68 0.71 0.63 0.56 0.66 0.93 0.74 0.73 0.80 0.76 0.74 0.79 0.76 0.74 0.66 0.68 0.70 0.67 0.67 0.67 0.72 0.66 S D F M U 18-39 40-59 60+ 0.64 0.75 0.66 0.67 0.70 0.66 0.72 0.67 0.55 0.67 0.93 Customer Segment Gender Age group

At BANK I feel at home

I am satisfied with BANK

There are good reasons to leave BANK *

I have mixed feelings about BANK *

BANK meets all my requirements for a bank

Last year I had a pleasant relationship with BANK

BANK has met my expectations

I have regretted my choice for BANK *

Last year I had some problems with BANK *


Rho

* = scored reversely

Table 6: Customer Satisfaction Scores in the Complete Dataset (N = 1227)

                      Mean     F       p
Customer segment               7.34    <0.001
  Top                 25.91
  Standard            25.51
  Development         23.24
Gender                         0.88    0.42
  Female              25.98
  Male                25.46
  Unknown             25.04
Age group                      4.97    <0.01
  18-39               24.69
  40-59               25.54
  60+                 26.39
Total                 25.60

[Figure 1: histogram; horizontal axis: customer satisfaction scale score (0 to 36); vertical axis: frequency (0 to 200)]

Figure 1: Distribution of customer satisfaction scores in the complete dataset (N = 1227, mean = 25.60, SD = 6.66, and skewness = -0.86)

The analyses of the reduced dataset yielded results similar to those of the analyses of the complete dataset. Confirmatory Mokken scale analyses (item selection method = Test) yielded a strong Mokken scale with a total-scale scalability coefficient H equal to 0.67 and a reliability coefficient rho equal to 0.93 (Table 7). The check for item monotonicity on the basis of the default options (i.e., Minvi = 0.03 and Minsize = 122, which is 10 percent of the sample) did not reveal violations of the assumption of monotonicity. Thus, the MH model fitted the data in the reduced dataset well. Mokken scale analyses using the grouping variables customer segment, gender, and age group yielded a strong Mokken scale (i.e., H > 0.5) in each subgroup (Table 7). The checks for item monotonicity on the basis of the default options (i.e., Minvi = 0.03 and Minsize = 122, which is 10 percent of the sample) yielded a significant violation of the assumption of monotonicity for item Q4c (Table 1) in the segment Top Customers, but the magnitude of the violation was small. Therefore, we considered it unimportant, and we concluded that the MH model fitted the data in the subgroups well enough.

Figure 2 shows the customer satisfaction scale-score distribution. The distribution was significantly skewed to the left (p < 0.001), and there were peaks for scale-scores 27, 31, and 36 (66 percent of the participants having scale-score 27 responded 'agree' to the five items indicative of customer satisfaction and 'disagree' to the four items counter-indicative of customer satisfaction, 66 percent of the participants having scale-score 31 responded 'agree' to the five items indicative of customer satisfaction and 'totally disagree' to the four items counter-indicative of customer satisfaction, and all participants having scale-score 36 responded 'totally agree' to the five items indicative of customer satisfaction and 'totally disagree' to the four items counter-indicative of customer satisfaction). Similar distributions of scale scores were found in the customer segments, gender groups, and age groups. Univariate analyses of variance demonstrated that the customer segments and the age groups differed significantly with respect to the scale scores (Table 8).

[Figure 2: histogram; horizontal axis: customer satisfaction scale score (0 to 36); vertical axis: frequency (0 to 200)]

Figure 2: Distribution of customer satisfaction scores in the reduced dataset (N = 1184, mean = 25.69, SD = 6.61, and skewness = -0.85)


Table 7: Customer Satisfaction Scales Total-Scale Scalability Coefficients H, Item Scalability Coefficients Hi, and Reliability Coefficients Rho in the Reduced Dataset (N = 1184)
Total group T 0.68 0.75 0.66 0.67 0.70 0.68 0.72 0.66 0.57 0.67 0.93 0.93 0.93 0.95 0.93 0.67 0.67 0.71 0.67 0.58 0.56 0.55 0.58 0.57 0.67 0.93 0.64 0.67 0.75 0.67 0.65 0.71 0.72 0.78 0.73 0.72 0.85 0.89 0.56 0.76 0.96 0.68 0.67 0.70 0.69 0.68 0.76 0.70 0.66 0.71 0.67 0.70 0.77 0.72 0.70 0.72 0.69 0.62 0.70 0.95 0.67 0.64 0.70 0.64 0.67 0.77 0.71 0.65 0.68 0.69 0.65 0.66 0.78 0.67 0.64 0.63 0.67 0.68 0.71 0.63 0.56 0.65 0.93 0.75 0.74 0.79 0.77 0.75 0.79 0.77 0.74 0.66 0.69 0.70 0.68 0.68 0.67 0.72 0.66 S D F M U 18-39 40-59 60+ 0.66 0.76 0.67 0.68 0.71 0.68 0.74 0.68 0.56 0.68 0.94 Customer segment Gender Age group

At BANK I feel at home

I am satisfied with BANK

There are good reasons to leave BANK *

I have mixed feelings about BANK *

BANK meets all my requirements for a bank

Last year I had a pleasant relationship with BANK

BANK has met my expectations

I have regretted my choice for BANK *

Last year I had some problems with BANK *


Rho

* = scored reversely

Table 8: Customer Satisfaction Scores in the Reduced Dataset (N = 1184)

Group                  Mean     F       p
Customer segment                8.05    <0.001
                       26.05
                       25.47
                       23.27
Gender                          0.51    0.60
  Female               25.99
  Male                 25.57
  Unknown              25.38
Age group                       5.82    <0.01
  18-39                24.59
  40-59                25.71
  60+                  26.46
Total                  25.69

Measures for stylistic responding

Preliminary analyses

Measures of general midpoint responding and general extreme responding were constructed on the basis of items with (a) low inter-item correlations, and (b) low correlations with customer satisfaction (see Chapter 7). We constructed these measures on the basis of four constructs (i.e., EPSP, EDE, involvement, and understanding), which were hypothesised to be unrelated to customer satisfaction. This hypothesis was tested by means of CFA (e.g., Bollen, 1989; Oort, 1996). A factor model was specified using the nine items reflecting customer satisfaction, the two items reflecting EPSP, the two items reflecting EDE, the four items reflecting involvement, and the four items reflecting understanding (Figure 3). The fit of the model was evaluated on the basis of the goodness of fit index (GFI), the normed fit index (NFI), the non-normed fit index (NNFI), and the AMIs (Oort, 1996, p. 49; see also Chapter 6, Section 4). Furthermore, the correlations between the factors were inspected. CFA was done using proc calis (SAS STAT).

The goodness of fit indices demonstrated that the factor model did not fit the data well. Because indices with a value of 0.9 or higher indicate an acceptable fit (Bollen, 1989, pp. 269-281), we required a value of 0.9 or higher for each index. Furthermore, the AMIs (Table 9) demonstrated that two items reflecting understanding (i.e., Q9b: I find it difficult to judge the quality of BANK, and Q9d: I know exactly what I may expect from BANK; Table 1) were significantly biased (i.e., p < 0.001) with respect to customer satisfaction. That is, participants with a high value on customer satisfaction were more inclined to respond positively to these understanding items than participants with a low value on customer satisfaction, even when understanding was controlled for (note that item Q9b was scored reversely; Section 2). Because it was required that the items used for measuring general stylistic responding did not reflect customer satisfaction (Chapter 7), we decided not to use these items for the measurement of stylistic responding.

Because the first factor model did not fit the data, a second factor model was tested. The second factor model was specified using the same items for customer satisfaction, EPSP, EDE, and involvement, and the two remaining items reflecting understanding (i.e., Q9a and Q9c; Table 1). The second factor model fitted the data well (Table 10; the second factor model), and none of the AMIs (not shown here) was significant. Furthermore, the absolute correlations between the factors reflecting customer satisfaction, EPSP, EDE, involvement, and understanding (Table 11) were considered sufficiently low for the purpose of the current study. Therefore, we decided to use the items reflecting EPSP, EDE, and involvement, and


[Path diagram omitted: factors F1 to F5, each pointing to its indicator items (Q3a ... Q4d; Q6a, Q6d; Q7b, Q7c; Q8b ... Q8e; Q9a ... Q9d), and each item with its own error term (E3a ... E9d)]

Figure 3: Factor Model with Q3a through Q4d reflecting satisfaction, Q6a and Q6d reflecting EPSP, Q7b and Q7c reflecting EDE, Q8b through Q8e reflecting involvement, and Q9a through Q9d reflecting understanding

Table 9: The Two Largest AMIs in the Complete Dataset and The Reduced Dataset
Complete Dataset (N=1227) Factor 1 Satisfaction AMI 9.38 49.77 Factor 1 Satisfaction AMI 9.44 51.10 <0.001 0.02 <0.01 0.10 p-value AMI p-value ns ns EPSP AMI 0.32 0.63 Factor 2 <0.001 0.02 ns 0.60 ns <0.01 0.12 ns 0.30 ns p-value AMI AMI p-value p-value EPSP EDE AMI 4.19 0.39 Factor 2 Factor 3 Factor 4 Involvement p-value <0.05 ns Factor 4 EDE p-value ns ns Involvement AMI 4.24 0.38 p-value <0.05 ns Factor 5 Understanding AMI Factor 5 Understanding AMI p-value p-value -

I find it difficult to judge the quality of BANK *

I know exactly what I may expect from BANK

Reduced Dataset (N = 1184) Factor 3


I find it difficult to judge the quality of BANK *

I know exactly what I may expect from BANK

* = scored reversely

two items reflecting understanding (i.e., Q9a and Q9c; Table 1), for the construction of the measures of general midpoint responding and general extreme responding.

Table 10: Goodness of Fit of the Factor Models for Customer Satisfaction, EPSP, EDE, Involvement, and Understanding

          First Factor Model               Second Factor Model
Index     CD (N = 1227)   RD (N = 1184)    CD (N = 1227)   RD (N = 1184)
GFI       0.89            0.89             0.93            0.93
NFI       0.87            0.87             0.92            0.93
NNFI      0.86            0.87             0.92            0.93

CD is complete dataset; RD is reduced dataset.
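For readers less familiar with these indices, the sketch below shows how NFI and NNFI are conventionally derived from the chi-square statistics of the fitted model and of the independence (null) model, and how the 0.9 criterion is applied to every index. It is an illustration only: the chi-square values and degrees of freedom are hypothetical and are not results of this study, and GFI is omitted because it requires the observed and fitted covariance matrices.

```python
def normed_fit_index(chi2_model, chi2_null):
    """NFI: proportional reduction in chi-square relative to the null model."""
    return (chi2_null - chi2_model) / chi2_null


def non_normed_fit_index(chi2_model, df_model, chi2_null, df_null):
    """NNFI (Tucker-Lewis index): like NFI, but adjusted for degrees of freedom."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def acceptable(*indices, threshold=0.9):
    """Apply the rule used in the text: every index must be 0.9 or higher."""
    return all(value >= threshold for value in indices)


# Hypothetical inputs for illustration only (not taken from the thesis).
chi2_null, df_null = 5000.0, 100
chi2_model, df_model = 400.0, 80
nfi = normed_fit_index(chi2_model, chi2_null)
nnfi = non_normed_fit_index(chi2_model, df_model, chi2_null, df_null)
print(round(nfi, 3), round(nnfi, 3), acceptable(nfi, nnfi))  # 0.92 0.918 True
```

Applied to Table 10, the same rule rejects the first factor model (all three indices below 0.9) and accepts the second.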

Table 11: Inter-Factor Correlations in the Second Factor Model (Upper Triangle = Complete Dataset; Lower Triangle = Reduced Dataset)
                 Satisfaction   EPSP    EDE     Involvement   Understanding
Satisfaction     -              -0.02   0.08    0.08          -0.12
EPSP             -0.02          -       0.49    0.06          0.12
EDE              0.07           0.48    -       0.08          0.04
Involvement      0.07           0.06    0.09    -             0.28
Understanding    -0.12          0.11    0.05    0.28          -

General midpoint responding

General midpoint responding was defined as the participant's proportional use of the middle response category (i.e., corresponding to score 2), which may vary between zero (if no responses were in the middle response category) and one (if all responses were in the middle response category). To test the hypothesis that satisfaction scores were not affected by the midpoint response style, a measure of general midpoint responding was constructed. For this purpose, the two items reflecting EPSP, the two items reflecting EDE, the four items reflecting involvement, and the two remaining items reflecting understanding (i.e., Q9a and Q9c; Table 1) were used. Missing values were excluded from the operationalisation, because they do not provide information about general midpoint responding. The scores on the measure of general midpoint responding ranged from zero to one, with a mean equal to 0.29 (Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.59, which is rather low but perhaps high enough for research purposes.
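The definition above can be made concrete with a minimal Python sketch. It assumes item scores from 0 to 4 with `None` marking a missing value; the function name and the example participant are hypothetical and serve only to illustrate the operationalisation.

```python
MIDPOINT = 2  # middle category on the 0 to 4 response scale


def midpoint_proportion(responses):
    """Share of a participant's non-missing responses in the middle category.

    `responses` holds item scores 0..4, with None marking a missing value.
    Returns None when every response is missing, since no proportion exists.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return sum(r == MIDPOINT for r in answered) / len(answered)


# Hypothetical participant: ten style items, one of them unanswered.
example = [2, 0, 2, 3, None, 2, 4, 1, 2, 3]
print(round(midpoint_proportion(example), 2))  # 4 of 9 answered items -> 0.44
```

Dividing by the number of answered items rather than by ten is what excluding missing values amounts to here: an unanswered item neither raises nor lowers the proportion.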


Midpoint responding to customer satisfaction items

To explore whether general midpoint responding was related to midpoint responding to customer satisfaction items, a measure of midpoint responding to customer satisfaction items was constructed. This measure was constructed in the same way as the measure of general midpoint responding, except that the nine items reflecting customer satisfaction were used. The scores on the measure of midpoint responding to customer satisfaction items ranged from zero to one, with a mean of 0.17 (Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.80.

General extreme responding

General extreme responding was defined as the participant's proportional use of the extreme response categories (i.e., corresponding to scores 0 and 4), which may vary between zero (if no responses were in the extreme response categories) and one (if all responses were in the extreme response categories). To test the hypothesis that customer satisfaction scores were not affected by the extreme response style, a measure of general extreme responding was constructed. For this purpose, the same items were used as for the construction of the measure of general midpoint responding. Missing values were excluded from the operationalisation, because they do not provide information about extreme responding. The scores on the measure of general extreme responding ranged from zero to 0.80, with a mean of 0.10 (Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.68.

Extreme responding to customer satisfaction items

To explore whether general extreme responding was related to extreme responding to customer satisfaction items, a measure of extreme responding to customer satisfaction items was constructed. This measure was constructed in the same way as the measure of general extreme responding, except that the nine items reflecting customer satisfaction were used. The scores on the measure of extreme responding to customer satisfaction items ranged from zero to one, with a mean of 0.26 (Table 12). The reliability (i.e., coefficient alpha) of the scores was 0.89.
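Coefficient alpha is reported for each response-style measure, but the computation is not spelled out here. One plausible reading, assumed in the sketch below, is that each item is first recoded into a 0/1 indicator (1 if the response falls in an extreme category) and that alpha is then computed over these indicators. The recoding step, the function names, and the example data are assumptions for illustration, and the sketch handles complete data only.

```python
from statistics import pvariance


def extreme_indicators(responses, extremes=(0, 4)):
    """Recode item scores 0..4 into 1 (extreme category) or 0 (other)."""
    return [1 if r in extremes else 0 for r in responses]


def coefficient_alpha(rows):
    """Cronbach's alpha for complete data; `rows` is participants x items."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five participants to four items (no missing values).
raw = [
    [0, 4, 4, 0],
    [2, 3, 1, 2],
    [4, 4, 2, 0],
    [1, 2, 3, 2],
    [0, 0, 4, 3],
]
indicators = [extreme_indicators(row) for row in raw]
print(round(coefficient_alpha(indicators), 2))
```

The same function applies to midpoint indicators by passing `extremes=(2,)` to the recoding step.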


Table 12: Descriptive Statistics of General Midpoint Responding (GMR), Midpoint Responding to Customer Satisfaction Items (MRCSI), General Extreme Responding (GER), and Extreme Responding to Customer Satisfaction Items (ERCSI)
Complete dataset (N = 1227)
         Min   Max    Median   Mean   SD     Skewness
GMR      0     1      0.30     0.29   0.20   0.72 *
MRCSI    0     1      0.11     0.17   0.23   1.47 *
GER      0     0.83   0        0.10   0.15   1.86 *
ERCSI    0     1      0.11     0.26   0.31   1.12 *

Reduced dataset (N = 1184)
         Min   Max    Median   Mean   SD     Skewness
GMR      0     1      0.30     0.29   0.20   0.72 *
MRCSI    0     1      0.11     0.17   0.23   1.49 *
GER      0     0.80   0        0.10   0.15   1.87 *
ERCSI    0     1      0.11     0.26   0.31   1.11 *

* = p < 0.001

Test of the hypotheses

Hypotheses 14 and 15 were tested in a similar way. First, the correlation was computed between stylistic responding and customer satisfaction scores. This was done using proc corr (SAS STAT). Second, to detect possible non-monotone relations between stylistic responding and customer satisfaction scores, the stylistic responding scores were plotted against the customer satisfaction scores. This was done using MS Excel. Third, the correlation was computed between general stylistic responding and stylistic responding to customer satisfaction items. This was done using proc corr (SAS STAT).

Hypothesis 14

Hypothesis 14 was: the satisfaction scores are not affected by the midpoint response style. The correlation between general midpoint responding and customer satisfaction was not significant (Table 13). Furthermore, the plot of the customer satisfaction scores against the general midpoint responding scores (Figure 4; complete dataset) did not demonstrate a distinct non-monotone relation. There was a decrease in the standard deviation of the customer satisfaction scores with increasing general midpoint responding scores, but the magnitude of the decrease was small (Table 14) and we considered it unimportant. However, the product-moment correlation between general midpoint responding and midpoint


Table 13: Product-Moment Correlations Between General Midpoint Responding (GMR), Midpoint Responding to Customer Satisfaction Items (MRCSI), and Customer Satisfaction (Satisfaction)
         Complete dataset (N = 1227)    Reduced dataset (N = 1184)
         MRCSI      Satisfaction        MRCSI      Satisfaction
GMR      0.14*      -0.03               0.13*      -0.03

* = p < 0.001

Table 14: Standard Deviation (SD) of Customer Satisfaction in GMR-Groups (N = Group Size)
         Complete dataset (N = 1227)    Reduced dataset (N = 1184)
GMR      N        SD                    N        SD
0        139      6.9                   131      6.9
0.1      220      7.3                   213      7.3
0.2      212      6.9                   211      6.9
0.3      235      6.2                   220      6.1
0.4      176      6.6                   171      6.4
0.5      113      6.1                   109      6.1
0.6      68       6.3                   67       6.3
0.7      33       5.5                   33       5.5
0.8      13       4.5                   11       4.1
0.9      11       8.2                   11       8.2
1.0      7        6.2                   7        6.2

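[Bubble plot omitted: customer satisfaction score (CS, vertical axis) against general midpoint responding score (GMR, horizontal axis); circle size indicates the number of participants (N)]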

Figure 4: Plot of general midpoint responding scores versus customer satisfaction scores in the complete dataset (N = 1227). The smallest circle represents one participant and the largest circle represents 35 participants.


responding to customer satisfaction items was significant (Table 13). Because customer satisfaction was almost unrelated to the items underlying the measure of general midpoint responding, it is plausible that the correlation was caused by the midpoint response style. This implies that it is plausible that the customer satisfaction scores were affected by the midpoint response style. Thus, hypothesis 14 was not supported.
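The significance pattern in Table 13 can be checked by hand. The sketch below is an illustration and not the SAS procedure used in the study: it computes a product-moment correlation and the usual t statistic for testing whether a correlation differs from zero. With 1225 degrees of freedom the t distribution is practically normal, so an absolute t value above roughly 3.3 corresponds to p < 0.001 (two-sided). The short score lists are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical scores of four participants: midpoint proportion and scale score.
print(pearson_r([0.1, 0.3, 0.5, 0.2], [30, 25, 27, 22]))

# Correlations reported in Table 13 for the complete dataset (N = 1227).
for label, r in (("GMR with satisfaction", -0.03), ("GMR with MRCSI", 0.14)):
    print(label, round(t_statistic(r, 1227), 2))  # about -1.05 and 4.95
```

The check also shows why a correlation as small as 0.14 is significant here: with more than a thousand participants, the test detects weak relations, so significance says little about the size of the contamination.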

Hypothesis 15

Hypothesis 15 was: the satisfaction scores are not affected by the extreme response style. The correlation between general extreme responding and customer satisfaction was significant in the reduced dataset (Table 15). Furthermore, the plot of the customer satisfaction scores against the general extreme responding scores (Figure 5; complete dataset) showed heteroscedasticity, which means that the variance of customer satisfaction scores differed across subgroups with different general extreme responding scores. The distribution of customer satisfaction scores in subgroups having high general extreme responding scores appears bimodal. This means that high general extreme responding scores corresponded with very high or very low customer satisfaction scores. In agreement with this result, the standard deviation of customer satisfaction scores increased as the general extreme responding score increased (Table 16). The product-moment correlation between general extreme responding and extreme responding to customer satisfaction items was also significant (Table 15). Because customer satisfaction was almost unrelated to the items underlying the measure of general extreme responding, it is plausible that the correlation was caused by the extreme response style. Thus, hypothesis 15 was not supported.
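The heteroscedasticity check summarised in Table 16 amounts to computing the standard deviation of the satisfaction scores within groups of participants who share a response-style score. A minimal sketch follows; it assumes that style scores are grouped by rounding to the nearest 0.1, which the text does not state explicitly, and it uses hypothetical data.

```python
from collections import defaultdict
from statistics import stdev


def sd_by_group(style_scores, satisfaction_scores, step=0.1):
    """SD of satisfaction within groups of rounded response-style scores.

    Style scores (proportions 0..1) are rounded to the nearest multiple of
    `step`; groups with fewer than two participants get None.
    """
    groups = defaultdict(list)
    for style, satisfaction in zip(style_scores, satisfaction_scores):
        key = round(round(style / step) * step, 10)
        groups[key].append(satisfaction)
    return {
        key: (stdev(values) if len(values) > 1 else None)
        for key, values in sorted(groups.items())
    }


# Hypothetical data: satisfaction spreads out as extreme responding rises.
ger = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.8]
cs = [24, 26, 25, 36, 4, 30, 36]
print(sd_by_group(ger, cs))
```

A standard deviation that grows with the style score, as in Table 16, is what a bimodal pattern of very high and very low satisfaction scores among extreme responders would produce.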


Table 15: Product-Moment Correlations Between General Extreme Responding (GER), Extreme Responding to Customer Satisfaction Items (ERCSI), and Customer Satisfaction (Satisfaction)
         Complete dataset (N = 1227)    Reduced dataset (N = 1184)
         ERCSI      Satisfaction        ERCSI      Satisfaction
GER      0.37**     0.04                0.38**     0.07*

* = p < 0.05; ** = p < 0.001

Table 16: Standard Deviation (SD) of Customer Satisfaction in GER-Groups (N = Group Size)
         Complete dataset (N = 1227)    Reduced dataset (N = 1184)
GER      N        SD                    N        SD
0        701      5.5                   678      5.5
0.1      206      6.8                   204      6.8
0.2      134      7.1                   131      7.1
0.3      93       8.5                   86       8.4
0.4      47       9.0                   44       9.1
0.5      24       10.2                  21       9.8
0.6      10       14.8                  9        13.1
0.7      6        12.9                  6        12.9
0.8      6        12.5                  5        13.9
0.9      0        -                     0        -
1.0      0        -                     0        -

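[Bubble plot omitted: customer satisfaction score (CS, vertical axis) against general extreme responding score (GER, horizontal axis); circle size indicates the number of participants (N)]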

Figure 5: Plot of general extreme responding scores versus customer satisfaction scores in the complete dataset (N = 1227). The smallest circle represents one participant, and the largest circle represents 120 participants.


Discussion

The second empirical study confirmed that the measurement instrument of customer satisfaction constituted a scale according to the MH-model. Moreover, the results confirmed that the scale also could be used in different subgroups. This result contributes to the validity of the scale-score interpretations in terms of customer satisfaction with the company. The tests of the hypotheses demonstrated that stylistic responding influenced the customer satisfaction scale-scores. This means, for example, that the extreme scale-scores were partly due to a high preference for extreme response categories in general. Because the contamination of the scale scores due to stylistic responding was small (Tables 13 and 15), its importance for the assessment of construct validity of the scale scores is also small. Still, it limits the construct validity of the scale scores. The distribution of scale scores showed remarkable peaks for the scale-scores 27, 31, and 36. Each peak was mainly caused by a group of participants who responded to all nine items in a similar way (see Section 3). For example, the peak for the scale-score 36 was caused by participants who agreed strongly with all items indicative of customer satisfaction and disagreed strongly with all items counter-indicative of customer satisfaction. Therefore, we suspect that the peaks were caused by stylistic responding. Because the measurement instrument for customer satisfaction, the location of customer satisfaction items in the questionnaire, the composition of the sample, and the mode of administration were largely similar in the first and the second study, it is possible that stylistic responding also influenced the scale scores in the first empirical study. However, the distribution of scale scores in the first empirical study did not show such sharp peaks as the distribution of scale scores in the second empirical study. 
Therefore we suspect that stylistic responding was less prevalent in the first empirical study than in the second empirical study. The following difference between the methods used in the first empirical study and the second empirical study may explain the differences between the distributions of the scale scores found in these studies. In the first empirical study the questionnaire was accompanied by an extensive E-mail in which persons were invited to participate in the survey and in which the purpose of the study was explained, whereas in the second empirical study the questionnaire was accompanied by a succinct E-mail in which persons were invited to participate in the survey but which did not explain the purpose of the study. The explanation of the purpose of the study in the former E-mail may have affected the motivation of participants to complete the questionnaire conscientiously. Therefore, we suspect that satisficing (e.g., Krosnick, 1999, pp. 546-548) was less prevalent in the first empirical study

than in the second empirical study, and that for that reason stylistic responding also was less prevalent in the first empirical study than in the second empirical study.

Summarising, the fit of the MH model supports the interpretation of the scale scores from the second empirical study in terms of customer satisfaction with the company. Because the content of the measurement instrument also supported that interpretation (Chapter 4), there is much evidence for construct validity. Still, the tests of the hypotheses indicated that stylistic responding contaminated the scale scores, and this limits the construct validity of the scale scores. The contamination of the scale scores may be taken into account in any follow-up research using the scale scores for customer satisfaction from the second empirical study. It cannot be ruled out that the scale scores were also contaminated by stylistic responding in the first empirical study, but there is evidence that contamination of the scale scores by stylistic responding in the first empirical study was smaller than in the second empirical study. Nevertheless, we suggest taking the possibility that the scale scores were contaminated by stylistic responding into account in any follow-up research using the scale scores for customer satisfaction from the first empirical study.

Conclusions

The content of the measurement instrument for customer satisfaction and the results from the measurement analyses of the empirical studies supported the validity of the scale-score interpretation in terms of overall satisfaction with the company. Moreover, the results of the analyses demonstrated that the scale may be used in different customer populations.

The items that were indicative of customer satisfaction and the other items that were counter-indicative of the construct together constituted a unidimensional scale. This result supports the conception of dissatisfaction as the opposite of satisfaction on a bipolar continuum.

The quality of the measurement instrument may be improved by the substitution of the items Q3a (At BANK I feel at home; Table 1) and Q4d (Last year I had some problems with BANK; Table 1) with other items. This means that it should be investigated whether the substitution of these items with two other items that reflect customer satisfaction with a retail bank improves the validity of the measurements of customer satisfaction with a retail bank.


The results of the second empirical study indicate that the scale scores partly reflected stylistic responding. It is plausible that a part of the extreme satisfaction scores was caused by a high general preference for extreme response categories. It is possible that stylistic responding also influenced the scale scores in the first empirical study but probably to a lesser extent than in the second empirical study.

There is strong evidence for the interpretation of the scale scores in terms of satisfaction with the company in the first empirical study, and fair evidence for such an interpretation in the second empirical study. Thus, the application of a measurement instrument in one study may yield better scale scores than the application of the instrument in another study (see also Messick, 1989, p. 81). This illustrates that construct validity is a property of score interpretations and not of measurement instruments, and that construct validity is always a matter of degree (see also Messick, 1989, p. 13).


Chapter 9 General discussion

The meaning of customer satisfaction

The purpose of this thesis was to unravel the meaning of customer satisfaction in the context of retail banking. Customer satisfaction is a psychological construct. Psychological constructs are organisational principles with respect to behaviour. This means that they are schemes through which we perceive and interpret behaviours of persons. The ontological status of customer satisfaction as organisational principle constitutes an important component of the meaning of customer satisfaction.

The meaning of satisfaction is context-specific (Giese & Cote, 2000). Moreover, satisfaction with a retail bank may be the absence of dissatisfaction for one customer, a judgement of the performance of the bank for another customer, and an affect for a third customer. To account for the different manifestations of satisfaction, we defined customer satisfaction with a retail bank as the valenced response of the customer, directed towards the retail bank, and evoked by the customer's experiences with the bank throughout time. This definition expresses that customer satisfaction with a retail bank encompasses affects and cognitions that can be placed on a dimension that ranges from negative to positive. Because the negative response expresses dissatisfaction and the positive response expresses satisfaction, the definition also covers customer dissatisfaction with a retail bank. This definition constitutes an important component of the theoretical meaning of customer satisfaction in the context of retail banking.

Marketing studies (e.g., Anderson et al., 1994; Hennig-Thurau et al., 2002; Oliver, 1997; Verhoef, 2001; Yi, 1990) suggest that customer satisfaction is related to various other psychological constructs, such as trust, quality, customer loyalty, commitment, word-of-mouth, and image, and to customer profitability (CP). There is evidence that customer satisfaction is preceded by quality and trust, and that customer satisfaction precedes customer loyalty and CP. 
We hypothesised that the latter relations also applied to customer satisfaction in the context of retail banking. The hypothesised relations between customer satisfaction and trust, quality, customer loyalty, and CP constitute the implicit definition of customer


satisfaction in the context of retail banking. This definition also constitutes an important component of the theoretical meaning of the construct. The empirical meaning of customer satisfaction is the behaviours that are associated with customer satisfaction. In the context of retail banking, these are manifestations of performance evaluations, disconfirmation, expectations, emotions, and regret (also, Oliver, 1997, pp. 316-318, 343-344). These manifestations can be used for the measurement of customer satisfaction. Because customer satisfaction has a large behavioural domain, we developed a nine-item measurement instrument for customer satisfaction with a bank, which covered different manifestations of customer satisfaction. Five items were indicative of customer satisfaction and four items were counter-indicative of customer satisfaction. The first empirical study into customer satisfaction with BANK demonstrated that the nine items constituted a unidimensional scale. This result supported the theoretical notion that customer satisfaction is the opposite of customer dissatisfaction on a bipolar dimension. We found positive correlations between customer satisfaction and quality, and between customer satisfaction and customer loyalty. These results supported our hypotheses concerning these correlations, but three remarks are in order. First, the measurement of quality on the basis of items reflecting judgements about products and services provided by the company resulted in missing data problems and halo effects. We did not find a satisfactory solution for these problems. Eventually, we re-defined quality as absence of problems, and we measured quality by means of the total score on the recoded items regarding the experience of problems with BANK in the preceding twelve months. We found that absence of problems with BANK in the preceding twelve months was positively correlated with customer satisfaction with BANK. 
Second, we found that the customer satisfaction scale-scores were contaminated by quality. The scale scores were corrected by excluding one item from the customer satisfaction scale when testing for the correlation between customer satisfaction and quality. Third, we found that the customer satisfaction scale-scores were contaminated by customer loyalty. The scale scores were corrected by excluding one item from the customer satisfaction scale when testing for the correlation between customer satisfaction and customer loyalty. The positive effects of customer satisfaction on future CP after one year and future CP after two years supported the hypothesis that customer satisfaction influences CP, and confirmed the importance of customer satisfaction in the context of retail banking. We found that current CP (i.e., CP at the time of the measurement of customer satisfaction) is an indispensable variable in analyses of the relation between customer satisfaction and future CP.

However, we also found that the size of the effect of current CP on future CP decreased as the time-lag between current CP and future CP increased. This implies that companies cannot rely on current CP as a guarantee for future CP, and this warrants taking more than only current CP into account when estimating customer lifetime value. Furthermore, we found that CP follows a Pareto-like distribution in the context of retail banking, and that CP had to be transformed before analysing the relation between customer satisfaction and CP. The latter results may be useful for the development of methods for investigating the influence of customer satisfaction on CP and estimating customer lifetime value. We also found a positive correlation between customer satisfaction and trust, which supported our hypothesis concerning this correlation. It may be noted that the correlation between the customer satisfaction scores and the trust scores was as large as the correlation between the customer satisfaction scores and the ACSI scores. Customers were satisfied with BANK when they trusted BANK, and dissatisfied with BANK when they did not trust BANK. This was also an outcome of the pre-tests. There seems to be a large overlap between the construct of customer satisfaction and the construct of trust in the context of retail banking. Further research into the generalisability of this result is needed. The second empirical study demonstrated that the customer satisfaction scores were contaminated by stylistic responding of the participants. This means, for example, that the extreme scale-scores were partly due to a high general preference for extreme response categories. Because the contamination of the scale scores due to stylistic responding was small, we considered its importance for the construct validity of the scale scores also small. Still, it limits the construct validity of the scale scores. 
Therefore we suggested taking the contamination of the scale scores into account when using these scores for any follow-up research. In all, the empirical studies yielded scale scores for customer satisfaction with BANK and provided much evidence for the construct validity of the scale scores. Therefore we concluded that the scale scores were rightly interpreted as customer satisfaction with BANK. The scale scores constitute a special case of the empirical meaning of customer satisfaction.

The measurement of psychological constructs in marketing research

Another purpose of this thesis was to select a suitable methodology for the construction of a measurement instrument for customer satisfaction and the validation of the customer satisfaction scale-scores. Psychological constructs can be measured by means of

psychological tests (including measurement instruments for typical behaviour; see Chapter 1, Section 4). For the measurement of psychological constructs in marketing research a test often consists of a set of items that is administered in a survey. On the basis of a participant's responses to these items, his or her position on the scale for the property is inferred. It is broadly acknowledged that validity of measurement is a key success factor for satisfaction research and for marketing research in general. However, the practice of construct validation in marketing research does not comply with the theory of validity as formulated by Messick (1989). We demonstrated (Chapter 3, Section 6) that construct validity was insufficiently investigated in important satisfaction studies in the marketing literature (see also Giese & Cote, 2000; Peterson & Wilson, 1992). This hampers the usefulness of satisfaction research for scientific purposes, such as testing of satisfaction theories, and for business purposes, such as marketing strategy development. Construct validity is the appropriateness of test-score interpretations in terms of the construct of interest (e.g., Cronbach, 1971; Messick, 1989, pp. 13, 34). Churchill's (1979) perspective on construct validity, which is the leading perspective in marketing measurement, conflicts with this conception of construct validity. Churchill's (1979) perspective is flawed with respect to the conception of construct validity as a property of a test, the criteria for the assessment of construct validity, and the procedures for validation research. Construct validity is a property of test-score interpretations, and not of tests. This means, for example, that the application of a test may yield valid measurements of a construct in one instance, and less valid measurements of a construct in another instance (see also Chapter 8, Section 7). 
Furthermore, Churchill's (1979) criteria for the assessment of construct validity, which are nomological validity, divergent validity, and convergent validity, do not address the two major threats to construct validity, which are construct underrepresentation and construct-irrelevant variance (Messick, 1989, 1995). Consequently, Churchill's (1979) procedures for validation research, which are the MTMM framework and correlating a measure with a criterion variable, do not suffice for the assessment of construct validity. Moreover, because the methods applied in MTMM research are often similar, the agreement between two measures of the same trait often provides evidence for reliability rather than validity (see also Anastasi, 1988, p. 158). The flaws in Churchill's (1979) perspective on construct validity justify adopting Messick's (1989, 1995) perspective on construct validity and construct validation research. Because the deductive design (Schouwstra, 2000) is in agreement with Messick's (1989, 1995) perspective on construct validity and validation research, we applied the deductive

design for the development of a test for customer satisfaction with BANK and the construct validation of the test scores. The deductive design addresses test development and construct validation for typical-behaviour properties (Table 1): Table 1: Outline of Construct Validation Within the Deductive Design (Schouwstra, 2000, p. 60)
Scientific arguments    Construct representation                       Irrelevant variance
Rationales
  a. Formulation        Of what construct of interest is               And what not
  b. Translation        Of construct of interest into test content     And nothing else
  c. Modelling          How test score reflects construct              And nothing else
Empirical evidence      That test score reflects whole of construct    And nothing else

Psychometric theory provides useful guidelines for the definition of the construct of interest, the translation of the construct of interest into test content, and the choice of a measurement model for modelling the participants' responses to the test. For example, it is well known that single items often yield inadequate measurements of constructs (e.g., Messick, 1989, pp. 14, 35), and this may explain why customer satisfaction has to be measured by means of a multiple-item scale.

The empirical research is directed at the collection of empirical evidence regarding construct representation and irrelevant variance. Schouwstra (2000, pp. 69-71) suggested formulating and testing hypotheses regarding construct representation and absence of irrelevant variance. Two remarks concerning the empirical research are in order. First, it is not feasible to formulate and test all possible hypotheses regarding construct representation and absence of irrelevant variance. Therefore, the formulation and testing of hypotheses has to be restricted to the most important hypotheses, and the choice of which hypotheses are most important remains to some extent arbitrary. Second, we consider the requirement that the test scores reflect the whole construct and nothing else too rigid. It is not feasible to exclude all possible irrelevant variance in the practice of psychological measurement. Therefore, construct validity is always a matter of degree (see also Messick, 1989, p. 13).

Because contamination of test scores cannot be avoided in the practice of psychological measurement, the construct validity of test scores is limited. Therefore, we suggest that future research investigate the degree to which test scores are contaminated by other attributes, and take any contamination into account when using the test scores for follow-up research.

In all, the application of the deductive design yielded a scale for customer satisfaction with BANK and much evidence for the construct validity of the scale scores. Therefore, we consider the deductive design a useful framework for measurement instrument development and construct validation in marketing research.
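The remark above that single items often yield inadequate measurements can be illustrated, for the reliability side only, with the Spearman-Brown formula from classical test theory. The sketch below is not a procedure used in this study; it assumes parallel items, and the single-item reliability of 0.30 is a hypothetical value chosen for illustration. It says nothing about construct representation, which a longer scale may or may not improve.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a sum score over n_items parallel items.

    Assumption (not from the thesis): items are parallel in the sense of
    classical test theory, so the Spearman-Brown formula applies.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-item reliability; compare scales of different length.
    for k in (1, 3, 5, 10):
        print(k, round(spearman_brown(0.30, k), 3))
```

Under these assumptions the predicted reliability rises with the number of items, which is one reason a multiple-item scale may be preferred over a single question.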

Suggestions for future research

First, we suggest further research into the influence of customer satisfaction on CP in retail banking. We recommend research into the generalisability of the results of the present study to other groups and companies within the financial services industry. Furthermore, we recommend future research into the definition and measurement of CP, such as the inclusion or exclusion of various costs, and the accumulation of profits over time periods longer than one year.

The second suggestion for future research concerns executing context-specific customer satisfaction studies. We agree with Giese and Cote (2000) that the meaning of customer satisfaction is context-specific, and that definitions and measures of customer satisfaction should also be context-specific. We also expect that the antecedents of customer satisfaction are context-specific. Context-specific customer satisfaction studies may contribute to the further development of general theory about customer satisfaction.

The third suggestion for future research concerns the development of context-specific definitions of quality and corresponding measurement procedures. We had much difficulty with the measurement of quality in the present study. Moreover, different inquiries may require different definitions and operationalisations of quality. Proper operationalisations of quality are important for investigating the influence of quality on customer satisfaction, and such investigations are important for making customer satisfaction actionable for companies.

Fourth, we suggest the deductive design (Schouwstra, 2000) for the measurement of psychological constructs in marketing research. The marketing literature uses many psychological constructs, and there appears to be much redundancy in the collection of constructs. Marketing research may disentangle these constructs, and for that purpose it has to define and measure them properly. Because Messick's (1989, 1995) perspective on construct validity can be put into action by the deductive design, we suggest the deductive design for the measurement and the validation of measurements of psychological constructs in marketing research.

Concluding remarks

This thesis explored the meaning of customer satisfaction in retail banking, and the usefulness of psychometric methods for test development and construct validation. It was demonstrated that, in the context of retail banking, customer satisfaction is manifested in performance evaluations, disconfirmation, expectations, emotions, and regret. This is a useful result for the further development of satisfaction theory and for customer satisfaction management in the financial services industry. It explains why customer satisfaction is not exclusively driven by technical quality of products, services, and processes. Therefore, a bank's customer satisfaction management strategy may start with managing technical quality, and having accomplished that, it may proceed with managing functional quality, complaints handling, and corporate communication.

Furthermore, the thesis provided strong evidence for the influence of customer satisfaction on CP. This is a useful result for the further development of satisfaction theory and eventually for marketing strategy development in the industry of retail banking. Customer satisfaction influencing CP warrants the appointment of customer satisfaction as a strategic goal of retail banks, all the more because the influence of current CP on future CP decreases when the time lag increases.

The thesis also demonstrated that the application of psychometric methods for the measurement of customer satisfaction yielded scale scores that can be rightly interpreted as customer satisfaction scores. This is a useful result for the methodology of marketing research and eventually for the development and validation of marketing theories.


References

Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37, 1-15.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan Publishing Company.
Angoff, W.H. (1988). Validity: An evolving concept. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 19-32). Hillsdale, NJ: Lawrence Erlbaum Associates.
Anderson, E.W., Fornell, C., & Lehmann, D.R. (1994). Customer satisfaction, market share and profitability: Findings from Sweden. Journal of Marketing, 58, 53-66.
Anderson, E.W., & Mittal, V. (2000). Strengthening the satisfaction-profit chain. Journal of Service Research, 3, 107-123.
Anderson, E.W., Fornell, C., & Mazvancheryl, S.K. (2004). Customer satisfaction and shareholder value. Journal of Marketing, 68, 172-185.
Baumgartner, H., & Steenkamp, J.B.E.M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38, 143-156.
Baumgartner, H., & Steenkamp, J.B.E.M. (2006). Response biases in marketing research. In R. Grover & M. Vriens (Eds.), The handbook of marketing research: Uses, misuses and future advances (pp. 95-109). Thousand Oaks: Sage Publications.
Bearden, W.O., Netemeyer, R.G., & Mobley, M.F. (1993). Handbook of marketing scales: Multi-item measures for marketing and consumer behavior research. Newbury Park, CA: Sage Publications.
Belson, W.A. (1981). The design and understanding of survey questions. Aldershot: Gower Publishing Company Limited.
Belson, W.A. (1986). Validity in survey research. Aldershot: Gower Publishing Company Limited.
Berens, G.A.J.M. (2004). Corporate branding: The development of corporate associations and their influence on stakeholder reactions. Doctoral dissertation, Erasmus University, Rotterdam.
Bernaards, C.A., & Sijtsma, K. (2000). Influence of imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable. Multivariate Behavioral Research, 34, 277-313.
Bloemer, J.M.M. (1993). Loyaliteit en tevredenheid: Een studie naar de relatie tussen merktrouw en consumententevredenheid. Doctoral dissertation, University of Maastricht, Maastricht.
Bloemer, J.M.M., & Poiesz, T.B.C. (1989). The illusion of consumer satisfaction. Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 2, 43-48.
Bloemer, J.M.M., & Kasper, H.D.P. (1995). The complex relationship between consumer satisfaction and brand loyalty. Journal of Economic Psychology, 16, 311-329.
Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.
Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203-219.
Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
Borsboom, D. (2005). Measuring the mind. New York: Cambridge University Press.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425-440.
Bouwmeester, S., & Sijtsma, K. (2006). Constructing a transitive reasoning test for 6-to-13 year old children. European Journal of Psychological Assessment, 22, 225-232.
Bradburn, N.M. (1983). Response effects. In P.H. Rossi, J.D. Wright, & A.B. Anderson (Eds.), Handbook of survey research (pp. 289-328). New York: Academic Press Inc.
Bronner, F., & Kuijlen, T. (2007). The live or digital interviewer: A comparison between CASI, CAPI, and CATI with respect to differences in response behaviour. International Journal of Market Research, 49, 167-190.
Buttle, F. (1996). SERVQUAL: Review, critique, research agenda. European Journal of Marketing, 30, 8-32.
Byrne, B.M. (1989). A primer of LISREL: Basic applications and programming for confirmatory factor analytic models. New York: Springer-Verlag.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Campbell, D., & Frei, F. (2004). The persistence of customer profitability: Empirical evidence and implications from a financial services firm. Journal of Service Research, 7, 107-123.
Carnap, R. (1950). Logical foundations of probability. Chicago: University of Chicago Press.
Carnap, R. (1956). The methodological character of theoretical concepts. In H. Feigl & M. Scriven (Eds.), Minnesota studies in the philosophy of science, Vol I. Minneapolis: University of Minnesota Press.

Caruana, A. (2002). Service loyalty: The effects of service quality and the mediating role of customer satisfaction. European Journal of Marketing, 36, 811-828.
Churchill, G.A. (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16, 64-73.
Churchill, G.A., & Surprenant, C. (1982). An investigation into the determinants of customer satisfaction. Journal of Marketing Research, 19, 491-504.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Coombs, C.H. (1964). A theory of data. New York: John Wiley and Sons.
Cooper, R., & Kaplan, R.S. (1991). The design of cost management systems: Text, cases, and readings. Englewood Cliffs, NJ: Prentice Hall.
Coulthard, L.J.M. (2004). Measuring service quality: A review and critique of research using SERVQUAL. The Market Research Society, 46, 479-497.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-335.
Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Cronbach, L.J. (1971). Test validation. In R.L. Thorndike (Ed.), Educational measurement (pp. 443-507). Washington, DC: American Council on Education.
Cronbach, L.J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cronbach, L.J. (1989). Construct validation after thirty years. In R. Linn (Ed.), Intelligence: Measurement, theory, and public policy (pp. 147-171). Urbana, IL: University of Illinois Press.
Cronin, J.J., & Taylor, S.A. (1992). Measuring service quality: A reexamination and extension. Journal of Marketing, 56, 55-68.
Cronin, J.J., & Taylor, S.A. (1994). SERVPERF versus SERVQUAL: Reconciling performance-based and perceptions minus expectations measurement of service quality. Journal of Marketing, 58, 125-131.
De Ruyter, K., Bloemer, J., & Peeters, P. (1997). Merging service quality and service satisfaction: An empirical test of an integrative model. Journal of Economic Psychology, 18, 387-406.
Dick, A., & Basu, K. (1994). Customer loyalty: Toward an integrated conceptual framework. Journal of Marketing Science, 22, 99-113.
Dillman, D.A., Tortora, R.S., & Bowker, D. (1998). Principles for constructing web surveys. SESRC Technical Report 98-50. Washington State University.
Dillman, D.A., & Bowker, D.K. (2001). The web questionnaire challenge to survey methodologists. In U.D. Reips & M. Bosnjak (Eds.), Dimensions of internet science (pp. 159-178). Lengerich: Pabst Science Publishers.
Donkers, B., Verhoef, P.C., & De Jong, M.G. (2007). Modeling CLV: A test of competing models in the insurance industry. Quantitative Marketing and Economics, 5, 163-190.
Embretson, S.E., & Reise, S.P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Fabrigar, L.R., Krosnick, J.A., & MacDougall, B.L. (2005). Attitude measurement: Techniques for measuring the unobservable. In T.C. Brock & M.C. Green (Eds.), Persuasion: Psychological insights and perspectives (pp. 17-40). Thousand Oaks, CA: Sage.
Fornell, C., & Larcker, D.F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18, 39-50.
Fornell, C., & Wernerfelt, B. (1987). Defensive marketing strategy by customer complaint management: A theoretical analysis. Journal of Marketing Research, 24, 337-346.
Fornell, C., & Wernerfelt, B. (1988). A model for customer complaint management. Marketing Science, 7, 271-286.
Fornell, C. (1992). A national customer satisfaction barometer: The Swedish experience. Journal of Marketing, 56, 6-21.
Fornell, C., Johnson, M.D., Anderson, E.W., Cha, J., & Bryant, B.E. (1996). The American customer satisfaction index: Nature, purpose and findings. Journal of Marketing, 60, 7-18.
Frege, G. (1892). On sense and reference. In P. Geach & M. Black (Eds.), (1952). Translations of the philosophical writings of Gottlob Frege. Oxford, England: Blackwell.
Friman, M. (2004). The structure of affective reactions to critical incidents. Journal of Economic Psychology, 25, 331-353.
Gardner, H. (1999). Intelligence reframed: Multiple intelligences for the 21st century. New York: Basic Books.
Garvin, D.A. (1983). Quality on the line. Harvard Business Review, 61, 65-73.

Giese, J.L., & Cote, J.A. (2000). Defining customer satisfaction. Academy of Marketing Science Review. www.amsreview.org/articles/giese01-2000.pdf.
Goedee, J., Reijnders, W., & Van Thiel, D. (2008). Bankieren in 2020: De impact van consumentenvertrouwen en technologische ontwikkelingen. Amsterdam: Pearson Education Benelux.
Gorsuch, R.L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Greenleaf, E.A. (1992a). Improving rating scale measures by detecting and correcting bias components in some response styles. Journal of Marketing Research, 29, 176-188.
Greenleaf, E.A. (1992b). Measuring extreme response style. Public Opinion Quarterly, 56, 176-188.
Gremler, D.D., & Brown, S.W. (1996). Service loyalty, its nature, importance and implications. In B. Edvardsson, S.W. Brown, R. Johnston, & E.E. Scheuing (Eds.), Advancing service quality: A global perspective (pp. 171-180). International Service Quality Association.
Gremler, D.D., & Brown, S.W. (1999). The loyalty ripple effect: Appreciating the full value of customers. International Journal of Service Industry Management, 10, 271-299.
Grönroos, C. (1984). A service quality model and its marketing implications. European Journal of Marketing, 18, 36-44.
Grönroos, C. (1990). Service management and marketing: Managing the moments of truth in service competition. Lexington, MA: Lexington Books.
Groves, R.M. (1989). Survey errors and survey costs. New York: Wiley.
Gruca, T.S., & Rego, L.L. (2005). Customer satisfaction, cash flow and shareholder value. Journal of Marketing, 69, 115-130.
Gustafsson, A., Johnson, M.D., & Roos, I. (2005). The effects of customer satisfaction, relationship commitment dimensions, and triggers on customer retention. Journal of Marketing, 69, 210-218.
Guttman, L. (1954). An outline of some new methodology for social research. Public Opinion Quarterly, 18, 395-404.
Hausknecht, D.R. (1990). Measurement scales in consumer satisfaction/dissatisfaction. Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 3, 1-11.
Hays, W.L. (1988). Statistics (4th ed.). New York: Holt, Rinehart and Winston, Inc.
Heiser, W.J. (2006). Measurement without copper instruments and experiment without complete control. Psychometrika, 71, 457-461.

Hennig-Thurau, T., Gwinner, K.P., & Gremler, D.D. (2002). Understanding relationship marketing outcomes: An integration of relational benefits and relationship quality. Journal of Service Research, 4, 230-247.
Herzberg, F., Mausner, B., & Snyderman, B.B. (1959). The motivation to work. New York: Wiley.
Howard, J.A., & Sheth, J.N. (1969). The theory of buyer behavior. New York: John Wiley and Sons.
Homburg, C., Koschate, N., & Hoyer, W.D. (2005). Do satisfied customers really pay more? A study of the relationship between customer satisfaction and willingness to pay. Journal of Marketing, 69, 84-96.
Hox, J.J. (1997). From theoretical concept to survey question. In L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N. Schwarz, & D. Trewin (Eds.), Survey measurement and process quality (pp. 47-70). New York: Wiley.
Hox, J.J. (1998). Er is nieuws onder de zon: Nieuwe oplossingen voor oude problemen. Kwantitatieve Methoden, 19, 95-118.
Ittner, C.D., & Larcker, D.F. (1998). Are nonfinancial measures leading indicators of financial performance? An analysis of customer satisfaction. Journal of Accounting Research, 36, 1-35.
Jack, A., B. (1967). Sampling from a Pareto distribution. Metroeconomica, 19, 216-223.
Jackson, D.N., & Messick, S. (1958). Content and style in personality assessment. Psychological Bulletin, 55, 243-252.
Jackson, D.N. (1971). The dynamics of structured personality tests: 1971. Psychological Review, 78, 229-248.
Jackson, D.N. (1973). Structural personality assessment. In B.B. Wolman (Ed.), Handbook of general psychology (pp. 775-792). NJ: Prentice Hall.
Jacoby, J. (1976). Consumer research: Telling it like it is. In B.B. Anderson (Ed.), Advances in Consumer Research, 3, 1-11.
Jansen, B.R.J., & Van der Maas, H. (1997). Statistical tests of the rule assessment methodology by latent class analysis. Developmental Review, 17, 321-357.
Johnson, M.D., Gustafsson, A., Andreassen, T.W., Lervik, L., & Cha, J. (2001). The evolution and future of national customer satisfaction index models. Journal of Economic Psychology, 22, 217-245.
Johnston, R. (1995). The determinants of service quality: Satisfiers and dissatisfiers. International Journal of Service Industry Management, 6, 53-71.
Kackar, R.N. (1989). Taguchi's quality philosophy: Analysis and commentary. In K. Dehnad (Ed.), Quality control, robust design, and the Taguchi method (pp. 3-19). Pacific Grove: Wadsworth and Brooks/Cole.
Kane, M. (2006). In praise of pluralism. A comment on Borsboom. Psychometrika, 71, 441-445.
Kelley, T.L. (1927). Interpretation of educational measurements. New York: World Book Company.
Knowles, E.S., & Nathan, K.T. (1997). Acquiescent responding in self reports: Cognitive style or social concern. Journal of Research in Personality, 31, 293-301.
Krosnick, J.A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213-236.
Krosnick, J.A., & Fabrigar, L.R. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg, P. Biemer, M. Collins, E. De Leeuw, C. Dippo, N. Schwarz, & D. Trewin (Eds.), Survey measurement and process quality (pp. 141-164). New York: Wiley.
Krosnick, J.A. (1999). Survey research. Annual Review of Psychology, 50, 537-567.
Lehmann, D.R. (1999). Consumer behaviour and Y2K. Journal of Marketing, 63, 14-18.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 44-53.
Little, R.J.A., & Rubin, D.B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Loevinger, J. (1948). The technique of homogeneous tests compared with some aspects of scale analysis and factor analysis. Psychological Bulletin, 45, 507-530.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading: Addison Wesley.
Luo, X., & Homburg, C. (2007). Neglected outcomes of customer satisfaction. Journal of Marketing, 71, 133-149.
Mahalanobis, P.C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Science of India, 12, 49-55.
Mano, H., & Oliver, R.L. (1993). Assessing the dimensionality and structure of the consumption experience: Evaluation, feeling, and satisfaction. Journal of Consumer Research, 20, 451-466.
Maxwell, S.E., & Delaney, H.D. (1990). Designing experiments and analyzing data: A model comparison perspective. Belmont, CA: Wadsworth Publishing Company.
Medlin, C.J., & Quester, P.G. (2002). Inter-firm trust: Two theoretical dimensions versus a global measure. Paper presented at the IMP conference in Perth, Australia. www.impgroup.org/uploads/papers/4247.pdf.
Mellenbergh, G.J. (1985). Vraagonzuiverheid: Detectie, definitie en onderzoek. Nederlands Tijdschrift voor de Psychologie, 40, 425-435.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum Associates.
Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational measurement (3rd ed.) (pp. 13-103). New York: Macmillan Publishing Co.
Messick, S. (1991). Psychology and methodology of response styles. In R.E. Snow & D.E. Wiley (Eds.), Improving inquiry in social science: A volume in honor of Lee J. Cronbach (pp. 161-200). Hillsdale, NJ: Lawrence Erlbaum Associates.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Mittal, V., & Kamakura, W.A. (2001). Satisfaction, repurchase intent, and repurchase behavior: Investigating the moderating effects of customer characteristics. Journal of Marketing Research, 38, 131-142.
Molenaar, I.W. (1995). Some background for item response theory and the Rasch model. In I.W. Molenaar & G.H. Fischer (Eds.), Rasch models: Foundations, recent developments and applications (pp. 3-14). New York: Springer-Verlag.
Molenaar, I.W., & Sijtsma, K. (2000). MSP5 for Windows: User's manual. Groningen: ProGAMMA.
Mokken, R.J. (1971). A theory and procedure of scale analysis. The Hague: Mouton; Berlin: De Gruyter.
Morgan, R.M., & Hunt, S.D. (1994). The commitment-trust theory of relationship marketing. Journal of Marketing, 58, 20-38.
Mulhern, F.J. (1999). Customer profitability analysis: Measurement, concentration, and research directions. Journal of Interactive Marketing, 13, 25-40.
Murphy, K.R., & Davidshofer, C.O. (1991). Psychological testing: Principles and applications (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Newman, K. (2001). Interrogating SERVQUAL: A critical assessment of service quality measurement in a high street retail bank. International Journal of Bank Marketing, 19, 126-139.
Niraj, R., Gupta, M., & Narasimhan, C. (2001). Customer profitability in a supply chain. Journal of Marketing, 65, 1-16.
Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw Hill.
Oliver, R.L. (1980). A cognitive model of the antecedents and consequences of satisfaction decisions. Journal of Marketing Research, 17, 460-469.
Oliver, R.L., & DeSarbo, W.S. (1988). Response determinants in satisfaction judgments. Journal of Consumer Research, 14, 495-507.
Oliver, R.L., & Swan, J.E. (1989). Consumer perceptions of interpersonal equity and satisfaction in transactions: A field survey approach. Journal of Marketing, 53, 21-35.
Oliver, R.L. (1993). Cognitive, affective, and attribute bases of the satisfaction response. Journal of Consumer Research, 20, 418-430.
Oliver, R.L. (1997). Satisfaction: A behavioral perspective on the consumer. New York: McGraw Hill.
Oliver, R.L. (1999). Whence consumer loyalty? Journal of Marketing, 63, 33-44.
Oliver, R.L., & Burke, R.R. (1999). Expectation processes in satisfaction formation. Journal of Service Research, 1, 196-214.
Oort, F.J. (1996). Using restricted factor analysis in test construction. Doctoral dissertation, University of Amsterdam, Amsterdam.
Oosterveld, P. (1996). Questionnaire design methods. Doctoral dissertation, University of Amsterdam, Amsterdam.
Parasuraman, A., Zeithaml, V.A., & Berry, L.L. (1985). A conceptual model of service quality and its implications for future research. Journal of Marketing, 49, 41-50.
Parasuraman, A., Zeithaml, V.A., & Berry, L.L. (1988). SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality. Journal of Retailing, 64, 12-40.
Parasuraman, A., Zeithaml, V.A., & Berry, L.L. (1994). Reassessment of expectations as a comparison standard in measuring service quality: Implications for future research. Journal of Marketing, 58, 111-124.
Paulhus, D.L. (1991). Measurement and control of response bias. In J.P. Robinson, P.R. Shaver, & L.S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17-59). San Diego, CA: Academic Press Inc.

Peter, J.P. (1981). Construct validity: A review of basic issues and marketing practices. Journal of Marketing Research, 18, 133-145.
Peterson, R.A., & Wilson, W.R. (1992). Measuring customer satisfaction: Fact and artefact. Journal of the Academy of Marketing Science, 20, 61-71.
Pfeifer, P.E., Haskins, M.E., & Conroy, R.M. (2005). Customer lifetime value, customer profitability, and the treatment of acquisition spending. Journal of Managerial Issues, 17, 11-25.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Reichheld, F.F., & Sasser, W.E. (1990). Zero defections: Quality comes to service. Harvard Business Review, 68, 105-111.
Reichheld, F.F. (2006). The ultimate question: Driving good profits and true growth. Cambridge: Harvard Business School Press.
Rotter, J. (1967). A new scale for the measurement of interpersonal trust. Journal of Personality, 35, 651-665.
Rugg, D. (1941). Experiments in wording questions: II. Public Opinion Quarterly, 5, 91-92.
Russell, J.A., & Carroll, J.M. (1999a). On the bipolarity of positive and negative affect. Psychological Bulletin, 125, 3-30.
Russell, J.A., & Carroll, J.M. (1999b). The phoenix of bipolarity: Reply to Watson and Tellegen (1999). Psychological Bulletin, 125, 611-617.
Rust, R.T., & Zahorik, A.J. (1993). Customer satisfaction, customer retention and market share. Journal of Retailing, 69, 193-215.
Saris, W.E., Van Wijk, T., & Scherpenzeel, A. (1998). Validity and reliability of subjective social indicators: The effect of different measures of association. Social Indicators Research, 45, 173-199.
Sartori, G. (1984). Guidelines for concept analysis. In G. Sartori (Ed.), Social science concepts: A systematic analysis (pp. 15-85). Beverly Hills, CA: Sage.
Schafer, J.L., & Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
Scherpenzeel, A.C. (1995). A question of quality: Evaluating survey questions by multitrait-multimethod studies. Doctoral dissertation, University of Amsterdam, Amsterdam.
Schouwstra, S.J. (2000). On testing plausible threats to construct validity. Doctoral dissertation, University of Amsterdam, Amsterdam.

Schuman, H., & Presser, S. (1981). Questions and answers in attitude surveys: Experiments on question form, wording, and context. New York: Academic Press Inc.
Sheatsley, P.B. (1983). Questionnaire construction and item writing. In P.H. Rossi, J.D. Wright, & A.B. Anderson (Eds.), Handbook of survey research (pp. 195-230). New York: Academic Press Inc.
Sijtsma, K., & Molenaar, I.W. (2002). Introduction to nonparametric item response theory. Thousand Oaks: Sage.
Sijtsma, K., & Van der Ark, L.A. (2003). Investigation and treatment of missing item scores in test and questionnaire data. Multivariate Behavioral Research, 38, 503-528.
Sijtsma, K. (2006). Psychometrics in psychological research: Role model or partner in science? Psychometrika, 71, 451-455.
Sijtsma, K., Emons, W.H.M., Bouwmeester, S., Nyklicek, I., & Roorda, L.D. (2008). Nonparametric IRT analysis of quality-of-life scales and its application to the world health organization quality-of-life scale (WHOQOL-Bref). Quality of Life Research, 17, 275-290.
Singh, J., & Sirdeshmukh, D. (2000). Agency and trust mechanisms in consumer satisfaction and loyalty judgments. Journal of the Academy of Marketing Science, 28, 150-167.
Soliman, H.M. (1970). Motivation-hygiene theory of job attitudes: An empirical investigation and an attempt to reconcile both the one- and the two-factor theories of job attitudes. Journal of Applied Psychology, 54, 452-461.
Stouthard, M.E.A., Mellenbergh, G.J., & Hoogstraten, J. (1993). Assessment of dental anxiety: A facet approach. Anxiety, Stress, and Coping, 6, 89-105.
Sudman, S., & Bradburn, N.M. (1982). Asking questions: A practical guide to questionnaire design. San Francisco: Jossey-Bass.
Tabachnick, B.G., & Fidell, L.S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson Education Inc.
Terpstra, M.J., & Van Gastel, W. (2004). Inventory of customer satisfaction surveys. Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2005). Customer satisfaction, customer loyalty and customer profitability. Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2006a). Customer satisfaction, customer loyalty, and recommendation intentions. Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2006b). Business facts for ING Retail Netherlands. Unpublished report, ING Group, Amsterdam.
Terpstra, M.J. (2008). A model for developing customer satisfaction business cases. Unpublished report, ING Group, Amsterdam.
Thomson, G. (1961). The inspiration of science. London: Oxford University Press.
Thorndike, E.L. (1920). A constant error in psychological ratings. Journal of Applied Psychology, 4, 25-29.
Torgerson, W.S. (1958). Theory and methods of scaling. New York: John Wiley and Sons.
Tse, D.K., & Wilton, P.C. (1988). Models of consumer satisfaction: An extension. Journal of Marketing Research, 25, 204-212.
Van der Ark, L.A. (2005). Stochastic ordering of the latent trait by the sum score under various polytomous IRT models. Psychometrika, 70, 283-304.
Van Dolen, W., Lemmink, J., Mattsson, J., & Rhoen, I. (2001). Affective consumer responses in service encounters: The emotional content in narratives of critical incidents. Journal of Economic Psychology, 22, 359-376.
Van Herk, H. (2000). Equivalence in a cross-national context: Methodological & empirical issues in marketing research. Doctoral dissertation, University of Tilburg, Tilburg.
Van Montfort, K., Masurel, E., & Van Rijn, I. (2000). Service satisfaction: An empirical analysis of consumer satisfaction in financial services. The Service Industries Journal, 20, 80-94.
Van Ginkel, J.R. (2007). Multiple imputation for incomplete test, questionnaire, and survey data. Doctoral dissertation, University of Tilburg, Tilburg.
Van Ginkel, J.R., Van der Ark, L.A., & Sijtsma, K. (2007). Multiple imputation of item scores in test and questionnaire data, and influence on psychometric results. Multivariate Behavioral Research, 42, 387-414.
Verhoef, P.C. (2001). Analysing customer relationships: Linking relational constructs and marketing instruments to customer behavior. Doctoral dissertation, Erasmus University, Rotterdam.
Westbrook, R.A., & Oliver, R.L. (1981). Developing better measures of consumer satisfaction: Some preliminary results. In K.B. Monroe (Ed.), Advances in consumer research (8th ed.) (pp. 94-99). MI: Association for Consumer Research.
Westbrook, R.A., & Oliver, R.L. (1991). The dimensionality of consumption emotion patterns and consumer satisfaction. Journal of Consumer Research, 18, 84-91.
Wirtz, J., & Bateson, J.E.G. (1995). An experimental investigation of halo effects in satisfaction measures of service attributes. International Journal of Service Industry Management, 6, 84-102.
Wirtz, J. (2000). An examination of the presence, magnitude and impact of halo on consumer satisfaction measures. Journal of Retailing and Consumer Services, 7, 89-99.
Wirtz, J., & Lee, M.C. (2003). An examination of the quality and context-specific applicability of commonly used customer satisfaction measures. Journal of Service Research, 5, 345-355.
Wittgenstein, L. (1953). Philosophische Untersuchungen/Philosophical investigations. In M. Derksen (2002). Filosofische onderzoekingen. Amsterdam: Boom.
Wittgenstein, L. (1958). The blue and brown books. In W. Oranje (1996). Het blauwe en het bruine boek. Amsterdam: Boom.
Wolf, M.G. (1970). Need gratification theory: A theoretical reformulation of job satisfaction/dissatisfaction and job motivation. Journal of Applied Psychology, 54, 87-94.
Woodall, T. (2001). Six sigma and service quality: Christian Grönroos revisited. Journal of Marketing Management, 17, 595-607.
Yi, Y. (1990). A critical review of consumer satisfaction. In V.A. Zeithaml (Ed.), Review of marketing (pp. 68-123). Chicago: American Marketing Association.
Zeithaml, V.A., & Bitner, M.J. (1996). Services marketing. New York: McGraw Hill.
Zeithaml, V.A., Parasuraman, A., & Berry, L.L. (1990). Delivering quality service. New York: The Free Press.

Summary (Samenvatting)

This thesis is about the measurement of customer satisfaction in the financial services provided by banks. Customer satisfaction is a topic of social and economic importance, which is reflected in the extensive academic literature on the subject. Satisfaction turns out to be difficult to define and to measure (Oliver, 1997, p. 13). This justifies further research into the meaning and the measurement of satisfaction. Psychological properties such as satisfaction are theoretical constructs, and they are inferred from the behaviour of persons. In marketing research, psychological properties are usually measured by means of questionnaires. Marketing research often uses only a single question to measure such properties, but psychometrics has shown that a single question covers the property incompletely (Messick, 1989, p. 14). Furthermore, different marketing studies use different definitions and operationalisations of particular properties. These factors hinder the interpretation and comparability of results across studies.

Chapter 1 discusses the main problems in customer satisfaction research: the lack of a well-elaborated definition of customer satisfaction, the poor validity of measurements of customer satisfaction, and the lack of knowledge about the influence of customer satisfaction on customer profitability (klantrendement). These problems are interrelated, because the lack of a well-elaborated definition of satisfaction hinders the measurement of satisfaction, and the lack of valid measurements of satisfaction hinders the analysis of the influence of satisfaction on customer profitability. This thesis aims to contribute to solving these problems and, by extension, to the scientific theory of customer satisfaction and the methodology of customer satisfaction research.

The first study in this thesis concerns theoretical characteristics of psychological properties and measurement procedures for psychological properties. Psychological properties are theoretical constructs. Psychological constructs such as satisfaction have a particular linguistic and empirical meaning. The linguistic meaning of satisfaction is the use of the term satisfaction in everyday and scientific language, and it can be described in a definition of satisfaction. The empirical meaning of satisfaction concerns the behaviours that are associated with satisfaction, and it forms the basis for measurements of satisfaction. The measurement procedures for psychological properties are procedures for the use of psychological measurement instruments, such as psychological tests and psychological questionnaires, for the construction of scales for measuring properties, and for the scoring of persons on the scales. Chapter 2 discusses the definition of psychological properties, the development of measurement instruments for psychological properties, the process of measuring psychological properties, the construction of scales, and the quality of measurement values. The chapter ends with a discussion of different conceptions of construct validity. Following Messick (1989, 1995), we take construct validity to be the appropriateness of interpretations of scale scores in terms of the construct to be measured. This conception of construct validity was the reason for choosing the deductive design for the validation of the measurements of satisfaction.

The second study concerned the use of the properties satisfaction and dissatisfaction in the literature. Chapter 3 gives an overview of the main definitions and theories of these properties in the marketing literature. It was established that satisfaction and dissatisfaction are used to describe certain feelings and judgements of consumers. These feelings and judgements are a response to the customer's experiences with, for example, a product; the response is directed at that product and expresses an evaluation of it. Satisfaction/dissatisfaction with a bank was defined as the customer's evaluative response that is directed at the bank and that is caused by the whole of the customer's experiences with the bank. A positive response expresses satisfaction, and a negative response expresses dissatisfaction. Finally, it was established that the existing questionnaires for customer satisfaction are hardly suitable for measuring satisfaction with a bank.

Chapter 4 discusses the deductive design (Schouwstra, 2000) for the development of psychological questionnaires. The deductive design was used to develop a psychological questionnaire for customer satisfaction with BANK, to formulate guidelines for administering the questionnaire, to specify the measurement model for the construction of scales, and to formulate hypotheses about properties of the scale scores. The questionnaire consisted of nine closed questions about aspects of satisfaction/dissatisfaction with BANK. The monotone homogeneity model (Mokken, 1971) was used to investigate the scalability of these items. The hypotheses concerned properties of the scale scores, such as their freedom from bias and their relation to measurements of other constructs. The fit of the measurement model and the hypotheses were examined in two empirical studies.

The third study was an empirical investigation of customer satisfaction with BANK. This was the first empirical study. Its aims were to construct a scale for customer satisfaction, to assess the appropriateness of interpreting scale scores as measurement values for customer satisfaction with BANK, and to examine the influence of customer satisfaction on customer profitability. Chapter 5 describes the method of the study. The customer satisfaction questionnaire was administered to a sample of 3600 customers of BANK, which yielded 1689 respondents. The same study also measured the properties trust, quality, and loyalty. The data file was enriched with data on customer profitability at the time of the study, after one year, and after two years.

The results of the first empirical study are reported in Chapter 6. According to the monotone homogeneity model, customer satisfaction is measured on a unidimensional scale. This refuted a view from the literature which holds that satisfaction and dissatisfaction represent two separate dimensions. The tests of the hypotheses about the characteristics of the scale scores confirmed the interpretation of the scale scores as measurement values for customer satisfaction with BANK. The test of the hypothesis about the relation between quality and customer satisfaction showed a strong relation between the absence of problems with BANK and satisfaction with BANK. This result confirms the importance of process quality for customer satisfaction. Finally, positive effects of customer satisfaction on customer profitability were found after one year and after two years. This result indicates that satisfaction influences customer profitability.
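As an illustration of what examining the scalability of a set of items can involve, the sketch below computes Loevinger's scalability coefficient H, a statistic commonly used in Mokken scale analysis. It is not taken from the thesis: the function name `scalability_h` and the data layout are hypothetical, and it assumes that all items are scored in the same direction (negatively worded items recoded beforehand) and that no item is constant.

```python
import numpy as np


def scalability_h(scores):
    """Loevinger's H for a respondents-by-items matrix of item scores.

    Hypothetical helper for illustration; assumes complete data and
    items that all point in the same direction.
    """
    x = np.asarray(scores, dtype=float)
    n_items = x.shape[1]
    observed = 0.0
    maximum = 0.0
    for i in range(n_items):
        for j in range(i + 1, n_items):
            # Observed covariance between the two items.
            observed += np.cov(x[:, i], x[:, j])[0, 1]
            # Largest covariance the two marginal score distributions
            # allow: pair the sorted scores of both items.
            maximum += np.cov(np.sort(x[:, i]), np.sort(x[:, j]))[0, 1]
    return observed / maximum
```

H equals 1 for a perfect Guttman pattern and drops towards 0 as the items become unrelated; which threshold counts as acceptable is a separate decision that this sketch does not make.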
The fourth study was an empirical investigation of customer satisfaction with BANK. This was the second empirical study. Its aim was to determine whether the scale scores for customer satisfaction were influenced by response styles, such as a general preference for the middle answer category of items or for the extreme answer categories. For this study the customer satisfaction questionnaire was administered to a sample of almost 3000 customers of BANK, which yielded 1227 respondents. To measure the response styles, data were also collected on, for example, the customer's expectations about the development of the Dutch economy. Chapter 7 describes the method of the study.
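To make the two response styles concrete, the sketch below counts, for a single respondent, how often the extreme categories and the middle category were chosen on a block of five-category items. It is an illustration under assumptions, not the procedure used in the thesis: the function name `response_style_indices`, the 1 to 5 coding, and the use of `None` for "no answer" are all hypothetical.

```python
def response_style_indices(answers, n_categories=5):
    """Share of extreme and of midpoint answers for one respondent.

    `answers` holds category numbers 1..n_categories; None marks
    "no answer" and is ignored. Hypothetical helper for illustration.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    middle = (n_categories + 1) / 2  # 3 for five categories
    extreme = sum(a in (1, n_categories) for a in valid) / len(valid)
    midpoint = sum(a == middle for a in valid) / len(valid)
    return {"extreme": extreme, "midpoint": midpoint}
```

Computing such shares on items that are unrelated to the bank, such as expectations about the economy, keeps the index from being confounded with satisfaction itself.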

The results of the second empirical study are reported in Chapter 8. The results showed that the scale scores for customer satisfaction were somewhat distorted by response styles. It therefore cannot be ruled out that response styles also slightly distorted the scale scores for customer satisfaction in the first empirical study. For this reason it is advised to take measures to correct for the influence of these response styles when the questionnaire is used in follow-up research.

Chapter 9 contains the general discussion. It was concluded that satisfaction with a bank manifests itself in emotions, regret, expectations, disconfirmation, and rational judgements. This is a useful result for scientific theory formation about customer satisfaction and for customer satisfaction management in financial services. It explains, for example, why customer satisfaction is driven not only by the technical quality of what a company delivers, but also by functional quality, that is, how a company delivers its services, the communication with the customer, and the reputation of the company. Furthermore, the research provides support for the theory about the influence of customer satisfaction on customer revenues (klantbaten). This is a useful result for scientific theory formation and for strategy development in financial services. The use of modern psychometric methods contributed to the development of a measurement instrument for customer satisfaction with banks and to establishing the validity of the measurements of customer satisfaction. This is a useful result for the methodology of scientific and applied customer satisfaction research.

Appendix 1 Questionnaire for study 1

Question 0. Do you regard BANK as your main bank?
Yes
No

Question 1. Which financial products do you currently hold with BANK? More than one answer is possible.
(answer column: Yes)
a. Current account
b. Debit card
c. Credit card
d. Internet banking
e. Savings products
f. Investment products
g. Mortgage
h. Credit, loans (for consumer purposes)
i. Non-life insurance
j. Life insurance

Question 2. Through which channel or channels have you had contact with BANK in the past year? More than one answer is possible.
(answer column: Yes)
a. Branch employee
b. Adviser at home
c. Telephone1
d. Telephone2
e. Correspondence
f. E-mail
g. Internet
h. Internet banking1
i. Internet banking2
j. Other, namely ..
k. None

Question 3. (rotate statements) This block contains a number of statements about BANK. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I feel at home with BANK
B. I am satisfied with BANK
C. Not applicable
D. There are good reasons to leave BANK
E. I have mixed feelings about BANK
F. Not applicable
G. BANK meets all the requirements I have of a bank
H. Not applicable
I. Not applicable

Question 4. (rotate statements) If you think back to the past year, and in particular to your experiences with BANK, please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I had a pleasant relationship with BANK in the past year
B. BANK met my expectations
C. I have regretted my choice of BANK
D. I had problems with BANK in the past year

Question 5. (rotate statements) Below are a number of statements about trust in the service of BANK. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 answer category no answer>
A. I can count on BANK to treat me fairly
B. I can count on BANK to handle my affairs correctly
C. I can trust BANK to keep its promises and agreements
D. I sometimes doubt the qualities of BANK
E. I sometimes doubt the good will of BANK
F. I can trust BANK
G. At BANK I can count on good service

Question 6. Below are a number of statements about problems with BANK. Can you indicate whether you have had such a problem in the past year? More than one answer is possible.
(answer column: Yes)
A. Errors in the handling of your banking affairs
B. Errors in the processing of your orders
C. Insufficient information about your banking affairs
D. Unclear information about your banking affairs
E. Unreasonable costs for the use of services
F. Slow service
G. Slow transfers
H. Poor keeping of agreements by BANK
I. Insufficient accessibility by telephone
J. Insufficient accessibility via internet
K. Insufficient accessibility of branches
L. Poor answering of your questions
M. Problems with cards
N. Problems with card payments and cash withdrawals
O. Problems with internet banking
P. Another problem
Q. No problem

Question 7. (rotate statements) Below are a number of aspects of the service of BANK. On the basis of your personal experience, can you rate the performance of BANK on these aspects?
<4 answer categories from excellent to poor, and 1 answer category don't know>
A. The correct processing of the orders you give
B. The speed with which transfers are carried out
C. The speed of the service provided by BANK
D. The keeping of agreements and promises by BANK
E. The correct handling of your banking affairs
F. The frequency with which you receive account statements from BANK

Question 8. (rotate statements) Below are a number of aspects of the products and services of BANK. On the basis of your personal experience, can you rate the performance of BANK on these aspects?
<4 answer categories from excellent to poor, and 1 answer category don't know>
A. The rates of the payment packages of BANK
B. The convenience of the products and services of BANK
C. The clarity of the information that BANK gives you about your banking affairs
D. The adequacy of the information that BANK gives you about your banking affairs
E. The costs that BANK charges for the use of services
F. The interest rates of the products of BANK

Question 9. (rotate statements) Below are a number of aspects of the service through the various channels of BANK. On the basis of your personal experience, can you rate the performance of BANK on these aspects?
<4 answer categories from excellent to poor, and 1 answer category don't know>
A. The service by telephone
B. The service via internet
C. The service at the branch
D. The service by post/correspondence
E. The ease with which you can reach BANK
F. The facilities for internet banking

Question 10. (rotate statements) Below are a number of aspects of contacts with BANK and with employees of BANK. On the basis of your personal experience, can you rate the performance of BANK on these aspects?
<4 answer categories from excellent to poor, and 1 answer category don't know>
A. The friendliness of the employees of BANK
B. The expertise of the employees of BANK
C. The reliability of the employees of BANK
D. The extent to which BANK listens to your wishes and questions
E. The way in which BANK speaks to you
F. The way in which BANK handles complaints

Question 11. With which other banks do you have a relationship? Can you indicate for each bank whether you have banking affairs there? By banking affairs we mean all kinds of banking affairs, such as payments, saving, investing, borrowing, mortgages, insurance, internet banking, et cetera.
(answer column: Yes)
a. Bank1
b. Bank2
c. Bank3
d. Bank4
e. Bank5
f. Bank6
g. Other bank, namely ..
h. No other bank

Question 12. (only BANK and the banks from question 11) How important is each of the following banks to you?
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. BANK
B. Bank

Question 13. (only BANK and the banks from question 11) How satisfied are you with the following banks?
<10 answer categories, from extremely dissatisfied to extremely satisfied, and 1 category no answer>
A. BANK
B. Bank

Question 14. (rotate statements) Below are a number of statements about your attitude towards BANK in comparison with other banks. Can you indicate for each statement to what extent you agree or disagree with it?
<5 answer categories, from strongly agree to strongly disagree, and 1 answer category no answer>
A. If I need new banking products, BANK is my first choice
B. I have more sympathy for BANK than for other banks
C. For some things I am better off with another bank
D. I am considering switching from BANK to another bank
E. BANK offers me advantages that other banks do not offer
F. BANK has been my main bank for many years

Question 15. Not applicable

Question 16. Not applicable

Question 17. How much interest do you have in banking affairs?
<5 answer categories, from much interest to no interest, and 1 category no answer>

Question 18. How much interest do you have in new financial products and services from banks?
<5 answer categories, from much interest to no interest, and 1 category no answer>

Question 19. Not applicable

Question 20. (rotate questions 20b to 20e) This block contains a number of questions about BANK. Would you please answer each of these questions?

Question 20a. Not applicable

Question 20b. How satisfied are you with BANK? Please express your judgement as a mark, where 1 means extremely dissatisfied and 10 means extremely satisfied.
<10 answer categories, from extremely dissatisfied to extremely satisfied, and 1 category no answer>

Question 20c. How well does BANK match your ideal of a bank? Please express your judgement as a mark, where 1 means far from ideal and 10 means ideal.
<10 answer categories, from far from ideal to ideal, and 1 category no answer>

Question 20d. How well has BANK met your expectations in the past year? Please express your judgement as a mark, where 1 means very poor and 10 means excellent.
<10 answer categories, from very poor to excellent, and 1 category no answer>

Question 20e. Not applicable

Question 20f. Not applicable

Appendix 2 E-mail accompanying study 1

Dear <salutation>,

We would like to invite you to take part in a questionnaire of the BANK-Klantenpanel (the BANK customer panel). This survey is about your satisfaction with BANK. You will be asked for your assessment of various aspects. You may recognise some questions that we also asked last year. We repeat these questions to gain a better insight into how customers think about BANK compared with last year. By taking part you therefore help BANK to match its service better to your wishes. Some questions in this survey closely resemble each other. We ask for your understanding. This is a deliberate choice, because this questionnaire also has a scientific purpose. BANK wants to find out how it can best investigate customer satisfaction. BANK does this in cooperation with the University of Tilburg. If you answer the questions as you are used to doing, you also help to develop our market research.

How does this survey work? If you click the link below, you are taken straight to the questionnaire. Completing it takes about 20 minutes. You can take part in this survey up to and including 12 October. For taking part in this survey you receive 10 points (worth €1). These 10 points are added to your balance within 72 hours of completing the questionnaire.

After at least two surveys you can use your points to order a small gift or donate your points to a charity: Artsen Zonder Grenzen, Natuurmonumenten or SOS Kinderdorpen. With your personal number (UserID) and unique code (password) you can log in to your personal page at www.BANK-klantenpanel.nl. Click the link below to start the questionnaire.

Thank you very much for your participation in the BANK-Klantenpanel.

Kind regards,
helpdesk BANK-Klantenpanel
www.BANK-klantenpanel.nl

Appendix 3 Questionnaire for study 2

Question 1. Which financial products do you currently hold with BANK? More than one answer is possible.
(answer column: Yes)
a. Current account
b. Debit card
c. Credit card
d. Internet banking
e. Savings products
f. Investment products
g. Mortgage
h. Credit, loans (for consumer purposes)
i. Non-life insurance
j. Life insurance

Question 2. Through which channel or channels have you had contact with BANK in the past year? More than one answer is possible.
(answer column: Yes)
a. Branch employee
b. Adviser at home
c. Telephone1
d. Telephone2
e. Correspondence
f. E-mail
g. Internet
h. Internet banking
i. Other ..
j. None

Question 3. (rotate statements) This block contains a number of statements about BANK. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I feel at home with BANK
B. I am satisfied with BANK
D. There are good reasons to leave BANK
E. I have mixed feelings about BANK
G. BANK meets all the requirements I have of a bank

Question 4. (rotate statements) If you think back to the past year, and in particular to your experiences with BANK, please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I had a pleasant relationship with BANK in the past year
B. BANK met my expectations
C. I have regretted my choice of BANK
D. I had problems with BANK in the past year

Question 5. Not applicable

Question 6. (rotate statements) Below are a number of statements about your expectations regarding the development of your purchasing power. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I expect my purchasing power to improve in the coming year
B. I expect my purchasing power to deteriorate in the coming year
C. I expect my purchasing power to be better in 5 years than it is now
D. I expect my purchasing power to be worse in 5 years than it is now

Question 7. (rotate statements) Below are a number of statements about your expectations regarding the economic development of the Netherlands. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I expect the Dutch economy to improve in the coming year
B. I expect the Dutch economy to deteriorate in the coming year
C. I expect the Dutch economy to be better in 5 years than it is now
D. I expect the Dutch economy to be worse in 5 years than it is now

Question 8. (rotate statements) This block contains six statements about your attitude towards banking affairs, such as payments, saving, borrowing, mortgages, insurance, investing, et cetera. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I never worry about banking affairs
B. I find banking affairs very important
C. Arranging banking affairs well makes life easier
D. I find banking affairs annoying
E. Banking affairs leave me cold
F. Arranging banking affairs well can yield a lot of money

Question 9. (rotate statements) This block contains four statements about the transparency of the financial market. Please indicate for each statement to what extent you agree or disagree with it.
<5 answer categories, from strongly agree to strongly disagree, and 1 category no answer>
A. I know the advantages and disadvantages of the banks in the Dutch market
B. I find it hard to judge the quality of BANK
C. I find it hard to compare the quality of different banks
D. I know exactly what I can expect from BANK

Question 10. Not applicable
