Another concern in the context of this research is the applicability of the new scale to different forms of eWOM sources. Hence, the following research question was proposed:

RQ 8: (a) Can the developed measurement of trust in eWOM reliably and validly measure

trust in specific eWOM platforms? (b) Does trust in eWOM vary across different platforms? (c) On which platform is eWOM most trusted?

Empirical data for answering this question was gathered with a paper-and-pencil questionnaire comprising an adapted version of the eWOM trust scale. First, respondents were furnished with descriptions and examples of the two major types of eWOM sources, namely consumer- and marketer-developed sites (CDS vs. MDS). The questionnaire then contained two sections measuring trust in these alternative sources of eWOM with an adapted version of the new eWOM trust scale. For instance, to fit the research context, the item “The information given in online customer reviews is credible” was transformed into “The information given in consumer-developed eWOM sites is credible” and “The information given in marketer-developed sites is credible”, respectively. As before, these questions were measured on a 7-point Likert scale ranging from 0 (“I strongly disagree”) to 6 (“I strongly agree”). Two versions of the questionnaire with different orderings of these two sections were used in order to control for ordering effects. In addition, nine items adopted from the literature (Choi & Rifon, 2002; Flanagin & Metzger, 2000; Johnson & Kaye, 2009) measured the credibility of the two platforms on a 7-point Likert scale, and 11 items adopted from Dou et al. (2012), originating from Ohanian (1990), measured perceived reviewer credibility on a 7-point semantic differential. The questionnaire also covered eWOM/Internet usage patterns and demographic variables. Every fifth passerby at five heavily frequented places in Vienna was interviewed after verifying that the prospective respondent was an Austrian Internet user, had some experience with online customer reviews and was familiar with eWOM on the platforms, and fit the quotas (gender, age group) on Internet users based on figures from Statistik Austria (2012).
In total, interviews with 176 consumers were conducted; 50.6% were female and the average age was 31.2 years (range: 16 to 58 years).

The reliability and validity of the scale were assessed independently for the two types of review sources. In both cases, the 22-item, second-order confirmatory factor model was estimated using LISREL 8.5 and resulted in a significant chi-square test for both scales (χ²Cons = 406.80, dfCons = 204, pCons < .001; χ²Mar = 504.43, dfMar = 204, pMar < .001), but acceptable measurement fit (RMSEA = .06 (.07); NFI = .90 (.88); NNFI = .95 (.91); CFI = .95 (.91); SRMR = .05 (.07); results for the MDS in parentheses). Table 45 reports the measures used to assess the reliability and validity of the scale. On the item level, all factor loadings were significant, and all of them (except Be2 for marketer sites) exceeded the .70 threshold suggested by Nunnally (1978). Additionally, the corrected item-to-total correlations were acceptable, and the individual items’ squared multiple correlations with their respective factors met or surpassed the cut-off value of .50 proposed by Fornell and Larcker (1981).

Concerning the reliability of the five first-order constructs, construct reliability was above the minimum threshold of .60 recommended by Fornell and Larcker (1981) and, for both types of sites, exceeded the preferred level of .70 (Churchill, 1979). Specifically, it ranged from .87 to .96 for consumer sites and from .74 to .95 for marketer sites. Other estimates of internal consistency were also computed for the first-order constructs. The Cronbach alphas were considerable (ability: .94 (.93), integrity/honesty: .92 (.93), benevolence: .87 (.74), willingness to rely: .96 (.95), and willingness to depend: .90 (.88)) and were all above Nunnally’s (1978) recommended level (.70) and, in most cases, also above the stricter .80 cut-off (Netemeyer et al., 2003). It is agreed that values of the average variance extracted (which assesses the amount of variance captured by a construct’s measure relative to random measurement error) of .50 or above provide further evidence of the internal consistency of a construct’s measure (Fornell & Larcker, 1981). For both types of sites, all average variance extracted estimates achieved this criterion on the first-order construct level. In addition, the average inter-item correlations ranged from .58 (.65) to .87 (.83) across the two sources of eWOM. On the second-order level, evidence for internal consistency was provided by adequate levels of construct reliability and AVE: construct reliability exceeded the .80 threshold (Netemeyer et al., 2003) with values of .91 in both cases, and the AVE values (.68/.66) exceeded the .50 cut-off (Fornell & Larcker, 1981). Taken collectively, item loadings, composite reliability, variance extracted, and item-to-total correlations provided support for the internal consistency of the eWOM trust measure on its diverse construct levels.
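The internal-consistency statistics used throughout this section (Cronbach’s alpha, construct/composite reliability, and AVE) can be computed directly from raw item scores and completely standardized loadings. The following is a minimal Python sketch for illustration only; the function names are the author’s of this sketch, not part of the original LISREL analysis.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def composite_reliability(loadings):
    """Construct (composite) reliability from completely standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    error_var = (1 - lam ** 2).sum()  # error variances of standardized items
    return lam.sum() ** 2 / (lam.sum() ** 2 + error_var)

def average_variance_extracted(loadings):
    """AVE: mean squared completely standardized loading."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))
```

For example, applying the last two functions to two hypothetical standardized loadings of .91 and .80 yields a composite reliability of about .85 and an AVE of about .73; small deviations from published values typically arise from rounding of the reported loadings.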

Convergent validity was assessed by following Bagozzi and Yi’s (1988) recommendation that all items should load considerably on their hypothesized dimensions and that the estimates are positive and significant. Table 45 presents the factor loadings of all items, which turned out to be greater than .70 (.51), and a review of the t-tests for the factor loadings (greater than twice their standard error) suggested that they were all significant. A similar pattern was observed for the relationship between the first- and second-order constructs, where the completely standardized loadings ranged from .67 (in both samples) to .97 (.99), were significant at the .01 level, and the parameter estimates were 10 to 20 times as large as their standard errors (Anderson & Gerbing, 1988). Inspection of the squared factor loadings on the individual-item as well as the first-order level revealed that the majority of the relationships were equal to or better than .50. In addition, correlations among the five sub-dimensions of eWOM trust were considerable, indicating high levels of convergence among the dimensions measuring overall eWOM trust. These results led to the conclusion that the new eWOM trust scale was able to reliably and validly measure consumer trust in diverse forms of eWOM sources. As the second part of the research question emphasized the level of trust in customer reviews on CDS compared to MDS, a paired-samples t-test was deemed appropriate to evaluate potential differences. The results indicated that the trust level was not significantly higher for CDS (M = 2.81, SD = 1.09) than for MDS (M = 2.74, SD = 1.05), t(175) = 1.38, p > .05. These results suggested that while consumer-developed sites were found to possess a slightly higher level of consumer trust, the difference in means between the two types of eWOM sources is unremarkable, implying that eWOM on both types of sites is trusted to a similar degree.
A further investigation comparing individual review sites and commercial websites (e.g., Epinions) could provide more detailed insight.
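The paired-samples comparison reported above can be replicated in a few lines of Python. The data below are simulated for illustration only and do not reproduce the study’s actual responses; only the descriptive statistics (M, SD, n) are taken from the text.

```python
import numpy as np
from scipy import stats

# Simulated per-respondent mean trust scores (hypothetical data, n = 176)
rng = np.random.default_rng(42)
trust_cds = rng.normal(2.81, 1.09, 176)              # consumer-developed sites
trust_mds = trust_cds - rng.normal(0.07, 0.60, 176)  # marketer-developed sites

# Paired-samples t-test on the within-respondent differences
t_stat, p_value = stats.ttest_rel(trust_cds, trust_mds)
print(f"t(175) = {t_stat:.2f}, p = {p_value:.3f}")
```

The paired test is appropriate here because each respondent rated both source types, so the two score vectors are dependent.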

RQ 9: Are consumer perceptions of (a) eWOM platforms’ credibility and (b) the credibility of reviewers on these platforms associated with trust in eWOM on that platform?

Two multiple regression analyses were used to test whether platform credibility and reviewer credibility significantly predicted respondents’ level of eWOM trust. The results of the first regression, concerning trust in online reviews on CDS, indicated that the two predictors explained 63.60% of the variance (R²Adj = 63.20%, F(2,173) = 151.24, p < .001). Perceived platform credibility significantly impacted eWOM trust (βs = .42, p < .001), as did perceived reviewer credibility on these sites (βs = .44, p < .001). A similar result was observed in the context of MDS. Here, platform credibility (βs = .58, p < .001) and reviewer credibility (βs = .24, p < .05) likewise predicted eWOM trust on such sites. The two variables explained a significant proportion of the variance in eWOM trust scores, R²Adj = 62.30%, F(2,173) = 145.63, p < .001.
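The regression statistics reported here (R², adjusted R², and the overall F-test) follow directly from an ordinary least squares fit. A minimal NumPy sketch, with hypothetical variable names, is:

```python
import numpy as np

def ols_summary(X, y):
    """OLS with intercept: returns coefficients, R^2, adjusted R^2, and F."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares coefficients
    ss_res = ((y - Xd @ beta) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return beta, r2, r2_adj, f_stat
```

For the CDS model, X would hold the 176 respondents’ platform-credibility and reviewer-credibility scores and y their eWOM trust scores; standardizing all variables first would yield the βs coefficients reported above.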

Table 45: Psychometric Properties of the eWT-S (Alternative Online Sources)

Notes: Table shows results for consumer-developed sites (CDS); results for marketer-developed sites (MDS) in parentheses; γ = Completely standardized second-order loading; λ = Completely standardized first-order loading; AVE = Average variance extracted.

This research does not assume that eWOM trust, as a consumer’s generalized tendency to rely on online customer reviews, is able to perfectly predict his/her response to every individual review. For the impact of individual reviews, contextual variables such as perceptions of the review (e.g., wording), of the reviewer (e.g., perceived competence, credibility), and situation-specific personal conditions of the reader (e.g., mood) become pivotal. But, in principle, generalized trust in eWOM should at least determine tendencies. Therefore, if the eWOM trust scale is able to forecast reactions to a series of customer reviews, additional evidence for the scale’s criterion validity and function is provided. Accordingly, the following two research questions were asked:

RQ 10: Is eWOM trust in general significantly and positively related to eWOM trust in individual customer reviews?

RQ 11: Is eWOM trust in general significantly and positively related to attitude towards individual customer reviews?

A convenience sample of students from two universities was asked to complete a paper-based questionnaire, including the eWOM trust scale, in class for course credit. Three weeks later, they received an email asking them to visit a website containing a series of consumer reviews and to respond to several questions targeting their attitude towards, and trust in, the reviews just read. Respondents’ attitude towards the individual reviews was assessed with six 7-point semantic differentials adopted from earlier research (Bezjian-Avery et al., 1998; Mitchell & Olson, 1981; Muehling, 1987; Olney et al., 1991) (e.g., “The reviews just read are ... good - bad”, “convincing - not convincing”). In contrast, situational trust was measured with a single Likert-type item (“I trust the information given in these reviews.”) ranging from 0 (strongly disagree) to 6 (strongly agree). It was expected that these items would represent responses to online customer reviews which should be related to eWOM trust. Additionally, a four-item Likert scale for product involvement (Zaichkowsky, 1994) and a single question evaluating product category knowledge (Lee et al., 2008) were included. In order to control for response bias, subjects were requested to indicate prior knowledge of the reviews and familiarity with the brands discussed. Each respondent was exposed to a series of four review sets, each consisting of four reviews. The number of reviews (4), as well as their length (average review length: 50 words), corresponded to regular review consumption behaviors (BrightLocal, 2013). The review sets discussed four fictitious products/brands (i.e., a digital camera, a notebook, a hotel, an MP3 player). These products were selected on the basis of three criteria: first, the products should have at least some relevance to the respondents; second, respondents should also use customer reviews at least sometimes when making purchases in the investigated product category; third, both experience and search goods should be included. In a pretest, it was verified that the first two of these criteria were fulfilled for the selected products. The review sets were displayed sequentially, and the order of exposure was randomized to control for order effects. Participants were instructed to read the reviews for each product with the intensity and for the duration they were regularly used to. After that, they were asked to respond to the scales described above before the next review set was presented.

If eWOM trust is a suitable predictor of responses to individual customer reviews, then the correlation between the eWOM trust scale and the response measures should be significant and positive, given that higher ratings on each of these scales indicate more generalized trust and more favourable responses, respectively. Table 46 reports the relationships. The correlation between eWOM trust and average trust in the reviews was considerable and significant (r = .66, p < .001). The positive relationship manifested itself in all four review sets. Correlations were large (Cohen, 1992), ranging from .53 to .60, and were all significant at the .001 level. A similar relationship was identified between eWOM trust and average attitude towards the reviews (r = .58, p < .001), as well as attitude towards the individual reviews (correlations ranging from .48 to .53; all p < .001). The results suggest that respondents who are willing to trust customer reviews in general found the reviews, on average, more favourable and more trustworthy. This provides further empirical evidence for the predictive power of the eWT-S. That is, situational influences (e.g., cues) can predict consumers’ perceptions of, attitudes towards, and reactions to OCR only to some extent.
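The reported relationships are plain Pearson correlations between the generalized scale score and the averaged situational responses. A short sketch with simulated data (the effect size loosely mimics the reported r = .66, but the values are entirely hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data: generalized eWOM trust vs. averaged situational trust
rng = np.random.default_rng(7)
general_trust = rng.normal(3.0, 1.0, 120)                          # eWT-S score
avg_review_trust = 0.66 * general_trust + rng.normal(0, 0.8, 120)  # situational item

r, p = stats.pearsonr(general_trust, avg_review_trust)
print(f"r = {r:.2f}, p = {p:.4g}")
```

With a true standardized effect of this size and a sample of this order, the correlation is reliably large and significant, mirroring the pattern in Table 46.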

Table 46: Influence of Generalized eWOM Trust on Perceptions of Individual Reviews

The eWOM trust scale was developed in German, with samples from both Austria and Germany. To test the extent to which the proposed measurement approach is stable and generalizable across alternative languages and cultures, the following research question was advanced:

RQ 12: Is the eWOM trust scale applicable to different languages and cultures?

In order to provide first insights into this research topic, the thesis at hand investigates the applicability of an English version of the new scale, using a sample of US consumers. While it is recognized that insights from this research can only be regarded as first evidence for a cross-cultural and cross-lingual application of the scale, testing it in additional contexts was beyond the scope of this research and remains open for future investigation. However, testing the robustness of the eWOM trust scale in the United States was appealing for several reasons: first, the United States offers an interesting study setting, as US consumers are increasingly turning to online customer reviews (BrightLocal, 2013); second, the United States is to some degree culturally different from Austria and Germany (i.e., according to Kogut and Singh’s (1988) Cultural Distance Index (CDI)); third, a valid English version of the scale would enable a relatively large research community to apply the new scale in their research. This said, the data discussed in this section stems from a representative sample of US online consumers (sample 6) and was collected using the procedures and research instruments described in Chapter 4. The final sample was composed of 517 consumers using the Internet, of which 53.2% were female. The average age was 43.9 years (SD = 15.0) and ranged from 18 to 74 years (for details see Chapter 4).

The collected data on the 22 scale items was first subjected to an exploratory factor analysis using principal components analysis with a Promax oblique rotation. In order to test for similarities in data structure, the number of factors to extract was set to five. This extraction attempt resulted in a solution that deviated from the earlier findings in that a single item (Be2, “social”) loaded on an individual factor, perhaps owing to variability in the term’s meaning in the two cultural contexts. Accordingly, the item was regarded as biased and was deleted from the item set, and the exploratory factor model was re-estimated for the remaining 21 items. The results of this procedure are provided in Table 47.

The same structure of the scale’s first-order constructs emerged, mirroring the earlier findings. The five factors explained 77.73% of the total variance in the data (with the integrity/honesty dimension explaining almost 60%), and an a posteriori parallel analysis showed that a five-factor solution was most appropriate. All 21 items loaded considerably and significantly (with loadings of .46 and above) on their hypothesized factors and showed negligible cross-loadings with the other dimensions. The majority of items had loadings in the .80s with acceptable communality levels (> .71), providing first evidence for the robustness of the scale in alternative languages/cultures (at least on the first-order construct level). Before progressing to confirmatory factor analysis, the item set was also subjected to a review of the individual items’ properties (e.g., item means and variances, theoretical range vs. actual range) in accordance with scale development guidelines (e.g., DeVellis, 2012). This analysis did not identify any problematic indicators, and the investigation into the generalizability of the new scale was continued.
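The parallel analysis mentioned above retains only those components whose eigenvalues exceed the average eigenvalues obtained from random data of the same dimensions (Horn's criterion). The following self-contained NumPy sketch illustrates the logic; it is not the software actually used in the thesis.

```python
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    """Horn's parallel analysis: number of components whose eigenvalue
    exceeds the mean eigenvalue from random data of the same shape."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    corr = np.corrcoef(data, rowvar=False)
    real_eig = np.sort(np.linalg.eigvalsh(corr))[::-1]
    rng = np.random.default_rng(seed)
    rand_eig = np.zeros(k)
    for _ in range(n_iter):
        r = rng.standard_normal((n, k))
        rand_eig += np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    rand_eig /= n_iter
    return int(np.sum(real_eig > rand_eig))
```

Applied to data with a clear latent structure, the function returns the number of factors to retain; eigenvalues below those of comparable random data are treated as noise.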

Table 47: Results of the EFA (Sample 6)

| Sub-dimension | Variance Explained / Eigenvalue | Item | Factor Loading | Communality | MSA |
| --- | --- | --- | --- | --- | --- |
| Ability | 4.00% / .84 | Ab7 | .76 | .81 | .96 |
| | | Ab8 | .73 | .71 | .98 |
| | | Ab9 | .80 | .83 | .96 |
| | | Ab10 | .69 | .74 | .98 |
| | | Ab11 | .54 | .76 | .98 |
| Integrity/Honesty | 59.06% / 12.40 | In2 | .75 | .77 | .98 |
| | | In3 | .59 | .68 | .98 |
| | | In4 | .85 | .82 | .97 |
| | | In5 | .87 | .78 | .98 |
| | | In6 | .80 | .78 | .97 |
| | | In9 | .60 | .76 | .98 |
| | | In10 | .88 | .79 | .98 |
| Benevolence | 3.52% / .74 | Be1 | .81 | .84 | .93 |
| | | Be3 | .86 | .81 | .94 |
| Willingness to rely | 7.26% / 1.73 | Wi1 | .92 | .80 | .96 |
| | | Wi4 | .84 | .80 | .94 |
| | | Wi5 | .80 | .75 | .95 |
| | | Wi8 | .91 | .78 | .97 |
| Willingness to depend | 2.89% / .61 | Wi2 | .71 | .73 | .98 |
| | | Wi6 | .46 | .78 | .96 |
| | | Wi7 | .58 | .80 | .96 |

Notes: Total variance explained: 77.73%; Extraction method: Principal Component Analysis; Rotation method: Promax with Kaiser Normalization; Rotation converged in 9 iterations; Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) .97 and Bartlett’s test of Sphericity: sig. .001.

To test the reliability and validity of the extracted dimensions, as well as the theorized structure of the proposed scale, a confirmatory factor analysis (CFA) with LISREL 8.5 was used. A second-order CFA measurement model with the five first-order trust factors was specified, similar to the earlier conceptualizations. The scale exhibited satisfactory psychometric characteristics similar to the earlier findings. While the chi-square test was again significant (χ² = 701.11, df = 184, p < .001), the absolute, incremental and parsimonious fit measures revealed an adequate representation of the data by the model (GFI = .94, AGFI = .91, RMSEA = .07, SRMR = .05; CFI = .95, NNFI = .95, NFI = .93; normed chi-square: 3.8, PFI = .70). This suggested that the data fitted a higher-order model well.
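Two of the reported statistics can be recomputed directly from the chi-square value: the normed chi-square is simply χ²/df, and the RMSEA point estimate follows the common Steiger-Lind formula. A small sketch (the textbook formula is assumed here, not LISREL's exact implementation):

```python
import math

def normed_chi_square(chi2, df):
    """Normed chi-square: chi-square divided by degrees of freedom."""
    return chi2 / df

def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max((chi2 - df) / (df * (n - 1)), 0))."""
    return math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))

# Reported model: chi2 = 701.11, df = 184, n = 517
print(round(normed_chi_square(701.11, 184), 1))  # 3.8, as reported
print(round(rmsea(701.11, 184, 517), 2))         # 0.07, as reported
```

That both derived values reproduce the published figures is a useful sanity check on the reported model statistics.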

As shown in Table 48, the results indicated good internal consistency of the measure on its multiple levels. To assess the reliability of the observable items as indicators of the first-order constructs, all path coefficients turned out to be substantial (ranging from .60 to .91), positive, and significant, with t-values exceeding the critical value for significance at the .01 level. Additional support for the hypothesized relationships on the first-order construct level was provided by an examination of the squared multiple correlations (R²) (Bollen, 1989). Fornell and Larcker (1981) suggest that an R² equal to or above .50 indicates reliability of the measure, as a majority of the variance in the (observable) measure can then be attributed to its intended latent variable. The majority of the items showed an R² well above this threshold, with most values in the .70s and .80s. However, a single indicator (Ab8) fell short on this measure (.40) and also showed a weaker (but nevertheless considerable and significant) factor loading (.60).

On the first-order level, all coefficient alphas of the individual sub-dimensions surpassed the .80 level, ranging from .85 to .94, and satisfactory levels of construct reliability were achieved. The construct reliability of the benevolence dimension was the lowest, at .84, which was still considerably above the recommended .70 threshold. The AVE of the five sub-dimensions also met the .50 minimum level, ranging from .65 to .73. The review of path coefficients also indicated positive and significant loadings between the first- and second-order constructs. Additionally, the corresponding squared multiple correlations met or surpassed the recommended .50 threshold (range: .50 to .86). The reliability of the first-order constructs as indicators of the higher-order eWOM trust construct was evaluated, in line with the previous discussion, by reviewing the construct reliability (.94) and AVE (.74), which indicated a reasonable internal consistency across the five aspects of trust. Taken together, the results provided reasonable evidence for the scale’s reliability.

According to Bagozzi and Yi (1991) and others (Anderson & Gerbing, 1988; Fornell & Larcker, 1981; MacKenzie et al., 2011), weak evidence for convergent validity is given by a significant correlation between an item and its hypothesized (higher-order) latent construct. As previously discussed, all factor loadings (λ) were equal to or greater than .60 and showed t-values greater than 2.57 (ranging from 15.21 to 27.26), implying that all items loaded significantly on their sub-dimensions at the .01 level. The relationships between the second-order construct and the first-order constructs showed a similar pattern, supporting the hypothesized second-order structure. A more demanding indicator of convergent validity is the squared factor loading, for which a threshold of .50 should be exceeded. 20 of the 21 items had squared factor loadings (λ²) above this threshold; only Ab8 fell short on this criterion. All five first-order constructs had squared factor loadings (γ²) ranging from .50 to .86. Convergent validity can also be assessed in terms of the extent to which the first-order constructs are correlated (Bagozzi & Yi, 1991). All correlations turned out to be considerable (ranging from .58 to .93) and significant (p < .01), suggesting that the constructs converge into a common underlying construct. All these analyses (including the satisfactory AVEs on all levels) yield evidence that the English version of the eWOM trust scale possessed satisfactory convergent validity for measuring consumer trust in online customer reviews.

So far, the results argued for appropriate (convergent) validity, as well as reliability, of the adapted scale. However, it is also advisable to test its robustness. In order to ensure that the reported results were not biased by the idiosyncrasies of the development samples, and that the measure is applicable across different languages, the scale’s invariance (i.e., equivalence) was subsequently tested. This research’s approach follows the suggestions put forward by Doll et al. (1998), who advocate that testing for invariance across samples is a particularly demanding but insightful test of an instrument’s robustness. While the assessment of a measure’s invariance regularly provides important implications for its construct validity, and hence is increasingly demanded by scale development guidelines (e.g., Netemeyer et al., 2003), its application in the literature is sparse (see Delgado-Ballester, 2004 for a notable exception; Netemeyer et al., 1996).

The basic idea behind this test is that the relationships between the observed scores and the latent variables should be identical across independent groups. If the psychometric properties are similar across the two groups (i.e., Austrian/German consumers (sample 4; n = 526) vs. US consumers (sample 6; n = 517)), the eWOM trust scale is said to have equivalent meaning, which in turn enhances the generalizability of the scale (Bollen, 1989; Marsh, 1995; Steenkamp & Baumgartner, 1998). For this test, the same measurement model was applied to the data from the two language groups in order to simultaneously assess the model’s equivalence across these groups by constraining different sets of model parameters to be equal in both groups (Worthington & Whittaker, 2006). Specifically, measurement invariance was evaluated hierarchically. A baseline model (Model 1), for which parameters were estimated freely across both groups, was used for comparison with the subsequent, increasingly constrained models (Models 2-4) in the hierarchy. Table 49 presents the fit estimates for these models.

Configural invariance for the 21-item scale was established, as the baseline model showed acceptable fit (χ² = 1,334.42 (df = 368, p < .001), RMSEA = .07, CFI = .95, NNFI = .95, NFI = .93). Subsequently, a model in which the factor loadings were constrained to be equal across both groups was estimated (Model 2). This model also yielded acceptable fit (χ² = 1,355.81 (df = 384, p < .001), RMSEA = .07, CFI = .95, NNFI = .95, NFI = .93). A chi-square difference test between this constrained model and the unconstrained baseline model was nonsignificant (χ²Diff = 21.40, dfDiff = 16, p > .05), implying full metric invariance across the samples.
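The chi-square difference test used here compares nested models: the difference between the two chi-square values is itself chi-square distributed, with degrees of freedom equal to the difference in model degrees of freedom. A sketch using SciPy and the values reported in Table 49 (small rounding differences against the published χ²Diff are to be expected):

```python
from scipy import stats

def chi_square_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    """Likelihood-ratio (chi-square difference) test for nested models."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    p = stats.chi2.sf(d_chi2, d_df)  # upper-tail probability
    return d_chi2, d_df, p

# Model 2 (factor loadings invariant) vs. Model 1 (baseline)
d_chi2, d_df, p = chi_square_difference(1355.81, 384, 1334.42, 368)
print(f"chi2_diff = {d_chi2:.2f}, df_diff = {d_df}, p = {p:.3f}")
```

A nonsignificant result (p > .05), as obtained here, means the equality constraints on the loadings do not significantly worsen model fit, which is the statistical basis for the metric-invariance conclusion above.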


| Sub-dimension | γ | Item | λ | Squared Multiple Correlation | Corrected Item-to-Total Correlation | Average Inter-Item Correlation | Cronbach’s Alpha | Construct Reliability | AVE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ability | .93 | Ab7 | .87 | .76 | .81 | .64 | .90 | .90 | .65 |
| | | Ab8 | .60 | .40 | .56 | | | | |
| | | Ab9 | .86 | .74 | .81 | | | | |
| | | Ab10 | .82 | .66 | .78 | | | | |
| | | Ab11 | .85 | .73 | .78 | | | | |
| Integrity/Honesty | .92 | In2 | .87 | .75 | .83 | .71 | .94 | .95 | .71 |
| | | In3 | .78 | .61 | .76 | | | | |
| | | In4 | .90 | .81 | .87 | | | | |
| | | In5 | .84 | .71 | .81 | | | | |
| | | In6 | .87 | .75 | .81 | | | | |
| | | In9 | .81 | .66 | .79 | | | | |
| | | In10 | .85 | .71 | .83 | | | | |
| Benevolence | .71 | Be1 | .91 | .82 | .73 | .73 | .85 | .84 | .73 |
| | | Be3 | .80 | .64 | .73 | | | | |
| Willingness to rely | .83 | Wi1 | .87 | .76 | .81 | .72 | .91 | .91 | .73 |
| | | Wi4 | .88 | .77 | .82 | | | | |
| | | Wi5 | .82 | .67 | .77 | | | | |
| | | Wi8 | .85 | .71 | .79 | | | | |
| Willingness to depend | .92 | Wi2 | .70 | .50 | .64 | .66 | .85 | .87 | .69 |
| | | Wi6 | .88 | .78 | .76 | | | | |
| | | Wi7 | .88 | .78 | .77 | | | | |
| eWOM trust (second-order) | | | | | | | | .94 | .75 |

Table 48: Psychometric Properties of the eWOM Scale (Sample 6)

Notes: γ = Completely standardized second-order loading; λ = Completely standardized first-order loading; AVE = Average variance extracted.

For Model 3, factor loadings and factor variances were constrained to be equal. Here, the chi-square difference test against the baseline model was significant (χ²Diff = 81.94, dfDiff = 21, p < .001), suggesting some non-chance lack of invariance. However, the same limitations apply to these results as to any other statistical test of confirmatory models: “invariance constraints are a-priori false when applied to real data with a sufficiently large sample size” (Marsh, 1995, p. 12). Therefore, invariance should also be investigated by considering alternative fit indices. If, according to these indices, the model fits the data satisfactorily, sufficient evidence for model invariance exists (Marsh, 1995; Marsh & Hocevar, 1985). In line with this argumentation, it was concluded that, given the adequate fit of Model 3, the factor loadings as well as the factor variances could be considered equal in both samples.

The last model estimated (Model 4) assumed the equivalence of factor loadings, factor variances, and error terms. Again, this model was compared with the baseline model. The chi-square difference (χ²Diff = 263.01, dfDiff = 42) was significant (p < .001). However, the fit levels for this (considerably) constrained model were acceptable (χ² = 1,597.42 (df = 410, p < .001), RMSEA = .07, CFI = .94, NNFI = .95, NFI = .92), suggesting invariance across the two samples. At the same time, the CAIC did not decrease considerably across the four models, indicating comparable model adequacy. In summary, some evidence for measurement invariance across the two languages/cultures existed, which built further confidence in the generalizability of the new scale.

Table 49: Measurement Invariance (Sample 4 and 6)

| Model | Description | Chi-Square | df | p | χ² Difference vs. Baseline | df Difference | Sign. | RMSEA | CAIC | NFI | NNFI | CFI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Baseline | 1,334.42 | 368 | .001 | n.a. | n.a. | n.a. | .07 | 2,082.78 | .93 | .95 | .95 |
| 2 | Factor loadings invariant | 1,355.81 | 384 | .001 | 21.40 | 16 | n.s. | .07 | 2,010.51 | .93 | .95 | .95 |
| 3 | Factor loadings and factor variances invariant | 1,416.36 | 389 | .001 | 81.94 | 21 | *** | .07 | 1,997.53 | .93 | .94 | .95 |
| 4 | Factor loadings, factor variances, and variance of error terms invariant | 1,597.42 | 410 | .001 | 263.01 | 42 | *** | .07 | 2,011.41 | .92 | .94 | .94 |

Notes: *** = p <.001; n.s. = not significant; n.a. = not available.