A Case Study of Market Segmentation using Cluster Analysis

To illustrate how market segments are identified and mapped by coupling cluster analysis with GIS, a case study of Dollarama is presented. Dollarama is the largest dollar store chain in Canada. As of January 2017, it operated 1,095 stores across the country. In the retail industry, dollar stores are known as “extreme value retailers”. Unlike convenience stores, which are typically small, modern dollar stores are much bigger and carry a large assortment of merchandise. They aim to meet basic, daily household needs, focusing on frequently used and replenished goods (such as consumables, packaged food/perishables/snacks, and health and beauty products) and, to a limited extent, apparel. They also aim to provide a fun, exciting treasure hunt experience. All dollar stores target low- and fixed-income households, including ethnic minorities and new immigrants. Capitalizing on the 2008 recession, Dollarama stores now also target middle-income consumers. Given their low price points ($1-$4), they are exclusively bricks-and-mortar stores with no online sales and no home delivery (except for bulk purchases of party supplies). In this case study, the statistical method of cluster analysis is used to identify the market segments in the Toronto CMA that fit Dollarama’s consumer profiles (Ware, 2017; LeBlanc, 2018).

Cluster analysis is a method of data partitioning and classification that groups similar cases into a class that is different from other classes. It is also a data reduction method. However, unlike principal component analysis and factor analysis, which group similar variables (i.e., columns in a data set) into factors, cluster analysis groups similar cases (i.e., rows in a data set) into a smaller number of classes, known as clusters. Each resulting cluster is a group of relatively homogeneous cases (or observations) in the data set. On the basis of a combination of classification variables, cluster analysis maximizes the similarity of cases within each cluster while maximizing the dissimilarity between clusters.

There are two types of cluster analysis: hierarchical cluster analysis and K-means cluster analysis. The former is used when the researcher has no “idea” of how many clusters may exist in the data set. The latter is used when the researcher has a hypothesis about the number of clusters in the data set. In both types of cluster analysis, the result depends on the classification variables being used. It is advised that the data set should contain at least four classification variables, because fewer than four variables result in meaningless clusters. Depending on the data set, the biggest challenge is to identify the optimum number of clusters: too large a number results in many small clusters that are not well differentiated from each other, while too small a number results in large clusters that may hide important spatial variations due to a high level of generalization. One of the important characteristics of K-means cluster analysis is that the analyst can choose the number of clusters desired. Still, the decision is often made with a degree of arbitrariness and after several rounds of experiment, or trial runs, using different values of K. (For a thorough understanding of cluster analysis, students are advised to consult a multivariate statistics textbook.)
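The trade-off in choosing K can be seen in a minimal sketch of K-means (Lloyd's algorithm) written in plain Python. Everything here is illustrative: the nine two-dimensional points are made up, the function names are hypothetical, and the deterministic farthest-point start is an assumption chosen so that the example is repeatable; it is not the procedure of any particular statistics package.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Deterministic start (an illustrative choice): take the first point, then
    # repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def within_ss(points, centroids, labels):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Made-up cases forming three visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]

results = {}
for k in (2, 3, 4):
    centroids, labels = kmeans(points, k)
    results[k] = within_ss(points, centroids, labels)
    print(f"K={k}: within-cluster sum of squares = {results[k]:.3f}")

labels_k3 = kmeans(points, 3)[1]
```

With these points, moving from K=2 to K=3 reduces the within-cluster sum of squares sharply, while K=4 only splits an already tight group, which is the pattern an analyst looks for when comparing trial runs.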

The target markets of Dollarama in the Toronto CMA are profiled using the K-means cluster analysis method. Eight classification variables are selected from the 2011 Canadian census at the census tract level of geography. These are:

  • median household income (in Canadian dollars)
  • number of households with household income under $50,000
  • percentage of households living in subsidized housing
  • prevalence of low income (i.e., percentage of households living below the official poverty line)
  • number of unemployed persons
  • average dwelling value
  • number of home renters
  • number of recent immigrants (living in Canada for less than five years at the time of the census)
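The eight variables are measured in very different units (dollars, counts and percentages). The source does not say whether they were rescaled before clustering. A common precaution, shown here purely as an assumption, is to convert each variable to z-scores so that no single variable dominates the distance calculation. The tract values below are made up for illustration and are not census data.

```python
from statistics import mean, pstdev

def zscores(values):
    """Rescale a variable to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Made-up values for five hypothetical census tracts (not census data).
median_income = [52_000, 64_000, 71_000, 98_000, 150_000]  # dollars
low_income_pct = [22.0, 18.0, 15.0, 9.0, 6.0]              # percent

income_z = zscores(median_income)
low_income_z = zscores(low_income_pct)

# After rescaling, both variables are expressed in standard deviations
# from their own mean, so they can be compared on the same footing.
for raw, z in zip(median_income, income_z):
    print(raw, round(z, 2))
```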

Median household income and average dwelling value are the two most obvious indicators that separate the “less fortunate” neighborhoods from the wealthy neighborhoods. Households with total income under $50,000 (compared with the median household income of $70,365 for the entire CMA), especially those below the official poverty line, and households that live in subsidized housing are all likely to buy merchandise from a Dollarama store. Unemployed persons who have lost their steady income also fit the profile of Dollarama consumers, though their unemployment status is not necessarily permanent, and many of them re-enter the labor market at some point after the census. Recent immigrants are included on the assumption that their income is low and that they tend to be price conscious about daily-use articles and consumables. It is well documented in the literature that new immigrants are in a period of transition in life and employment, and that they struggle to find a job that matches their education credentials and occupational skills in their first five years in Canada. Although some of them are able to purchase a million-dollar house and live in a tony neighborhood, they could be real estate rich but income poor, because the money used to purchase the house was brought from their home country, not earned in Canada from regular wages or salaries.

The cluster analysis is performed using SPSS (Statistical Package for the Social Sciences). With the eight classification variables, a K-means cluster analysis is trial-run four times, using K=3, K=4, K=5, and K=6, respectively. It is found that K=5 gives the most distinguishable clusters, with maximum distances between the cluster centroids, which are the means of the cluster scores for the individual census tracts of each cluster.

A new column, Cluster_ID, is added to the census dataset, in which each of the 1,074 census tracts is assigned the ID number of its cluster, linking target market groups to specific areas in the Toronto CMA. Summary and spatial statistics are then calculated, including the number of census tracts in each cluster. Demographic, social and economic data for each cluster are also derived. The final cluster centers for each classification variable are presented in Table 3.2. The five clusters, or market segments, are labeled, respectively, as Price Conscious, Middle-Class Thrifty, Middle-Class Well-Off, Rich and Wealthy, and Ultra Rich (LeBlanc, 2018).
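How far apart the cluster centroids sit can be checked with a short calculation. The sketch below computes Euclidean distances between the five final cluster centers, using the values reported in Table 3.2 in the table's own units. Whether the original SPSS run used these units or rescaled the variables is not stated, so the distances are illustrative only; in these units they are driven mainly by average dwelling value.

```python
import math
from itertools import combinations

# Final cluster centers from Table 3.2, in the table's own units. Order:
# median HH income ($1,000s), average dwelling value ($1,000s),
# households under $50k, prevalence of low income (%),
# % in subsidized housing, renters, unemployed persons, recent immigrants.
centers = {
    1: [64.1, 345.8, 743, 18, 14.1, 663, 269, 436],
    2: [79.5, 507.2, 596, 14, 11.1, 530, 240, 326],
    3: [92.6, 735.7, 511, 12, 6.96, 541, 195, 228],
    4: [114.6, 1092.2, 398, 10, 3.05, 508, 152, 182],
    5: [185.1, 1789.8, 140, 7, 3.88, 177, 101, 73],
}

# Euclidean distance between every pair of centroids.
distances = {(a, b): math.dist(centers[a], centers[b])
             for a, b in combinations(centers, 2)}

for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 1))

closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
print("closest pair:", closest, "farthest pair:", farthest)
```

In these units, Clusters 1 and 5 are the farthest apart, and the two middle-class segments (Clusters 2 and 3) are the closest pair.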

Cluster 1, or the Price Conscious segment, which consists of 468 census tracts (out of a total of 1,074 in the entire CMA), should be the primary target market for Dollarama. Clearly, this cluster has:

  • the lowest median household income ($64,100)
  • the lowest average dwelling value ($345,800)
  • the largest number of households with income less than $50,000 (743)
  • the highest percentage of persons living below the official poverty line (18 percent)
  • the highest percentage of households living in subsidized housing (14.1 percent)
  • the largest number of home renters (663)
  • the largest number of unemployed persons (269)
  • the largest number of recent immigrants (436)

Table 3.2 Final Cluster Centers for Dollarama Market Segments

| Classification variable | 1 Price Conscious | 2 Middle-Class Thrifty | 3 Middle-Class Well-Off | 4 Rich and Wealthy | 5 Ultra Wealthy |
|---|---|---|---|---|---|
| # Census tracts | 468 | 429 | | | |
| Median HH income ($1,000s) | 64.1 (17.8) | 79.5 (22.6) | 92.6 (30.1) | 114.6 (36.6) | 185.1 (34.8) |
| Average dwelling value ($1,000s) | 345.8 (69.9) | 507.2 (52.4) | 735.7 (80.3) | 1,092.2 (138.6) | 1,789.8 (253.3) |
| # HH with income under $50k | 743 (467.6) | 596 (440.5) | 511 (378.5) | 398 (262.0) | 140 (152.4) |
| Prevalence of low income (%) | 18 (9.4) | 14 (7.2) | 12 (6.9) | 10 (5.4) | 7 (2.2) |
| % HH in subsidized housing | 14.1 (17.2) | 11.1 (16.9) | 6.96 (14.2) | 3.05 (6.7) | 3.88 (9.5) |
| # Renters | 663 (655.7) | 530 (644.8) | 541 (626.8) | 508 (423.0) | 177 (129.2) |
| # Unemployed persons | 269 (113.0) | 240 (116.3) | 195 (88.5) | 152 (67.9) | 101 (62.8) |
| # Recent immigrants | 436 (372.8) | 326 (304.5) | 228 (196.2) | 182 (156.1) | 73 (29.8) |

HH = household.

Numbers in brackets are standard deviations.

The secondary target market for Dollarama should be Cluster 2—the Middle-Class Thrifty segment. This cluster of 429 census tracts has:

  • the second lowest median household income ($79,500)
  • the second lowest average dwelling value ($507,200)
  • the second largest number of households with income less than $50,000 (596)
  • the second highest percentage of persons living below the official poverty line (14 percent)
  • the second highest percentage of households living in subsidized housing (11.1 percent)
  • the third largest number of home renters (530; fewer than Cluster 3)
  • the second largest number of unemployed persons (240)
  • the second largest number of recent immigrants (326)

The other three clusters, which account for 16 percent of all the census tracts in the CMA, contain fewer residents and households that fit Dollarama’s consumer profile.

Figure 3.1 maps the spatial distribution of the five clusters, along with the existing 156 Dollarama stores (as of 2016). It is evident that Dollarama has indeed focused heavily on the Cluster 1 “neighborhoods” in store deployment; it has also been extending its market reach to the Cluster 2 neighborhoods, as the retailer stated in its annual reports (Dollarama, 2018). Specifically, 87 (or 56 percent) of the 156 stores are located in (or on the border of) the Cluster 1 census tracts, which are home to 44 percent of the CMA’s households; another 46 stores (29 percent) are in (or on the border of) the Cluster 2 census tracts, which contain 41 percent of the CMA’s households; only 23 stores are in the census tracts of the other three clusters.
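One way to read these figures is to compare each segment's share of stores with its share of households. The short calculation below does this with the numbers quoted above. The ratio itself is an illustrative measure, not one used in the source, and the 15 percent household share for Clusters 3 to 5 is derived here as the remainder after Clusters 1 and 2.

```python
# Store counts and household shares quoted in the text.
stores = {"Cluster 1": 87, "Cluster 2": 46, "Clusters 3-5": 23}
household_share = {"Cluster 1": 0.44, "Cluster 2": 0.41}

# Assumption: the remaining clusters hold the rest of the households.
household_share["Clusters 3-5"] = round(1 - sum(household_share.values()), 2)

total = sum(stores.values())  # 156 stores in the CMA

# Ratio above 1: more stores than the household share alone would suggest.
ratio = {name: (count / total) / household_share[name]
         for name, count in stores.items()}

for name in stores:
    print(f"{name}: store share {stores[name] / total:.0%}, "
          f"household share {household_share[name]:.0%}, "
          f"ratio {ratio[name]:.2f}")
```

A ratio above 1 for Cluster 1 and below 1 for Cluster 2 is consistent with the reading that the chain has concentrated on its primary segment and is still extending into the secondary one.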

Of the eight classification variables, median household income and average dwelling value contribute the most to separating the clusters. Percentage of households living in subsidized housing and number of renters seem to contribute the least, because their standard deviations are close to or larger than the cluster means, which suggests that the values of these variables range widely within the five clusters.
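The comparison of standard deviations with cluster means can be made explicit with the coefficient of variation (standard deviation divided by mean). The sketch below applies it to the values reported in Table 3.2. Averaging the coefficient across the five clusters is an illustrative summary chosen here, not a statistic reported in the source.

```python
# (mean, standard deviation) for Clusters 1 to 5, as reported in Table 3.2.
table = {
    "Median HH income": [(64.1, 17.8), (79.5, 22.6), (92.6, 30.1), (114.6, 36.6), (185.1, 34.8)],
    "Average dwelling value": [(345.8, 69.9), (507.2, 52.4), (735.7, 80.3), (1092.2, 138.6), (1789.8, 253.3)],
    "HH under $50k": [(743, 467.6), (596, 440.5), (511, 378.5), (398, 262.0), (140, 152.4)],
    "Prevalence of low income": [(18, 9.4), (14, 7.2), (12, 6.9), (10, 5.4), (7, 2.2)],
    "% HH in subsidized housing": [(14.1, 17.2), (11.1, 16.9), (6.96, 14.2), (3.05, 6.7), (3.88, 9.5)],
    "Renters": [(663, 655.7), (530, 644.8), (541, 626.8), (508, 423.0), (177, 129.2)],
    "Unemployed persons": [(269, 113.0), (240, 116.3), (195, 88.5), (152, 67.9), (101, 62.8)],
    "Recent immigrants": [(436, 372.8), (326, 304.5), (228, 196.2), (182, 156.1), (73, 29.8)],
}

# Coefficient of variation per cluster, then its average per variable.
cv = {name: [sd / m for m, sd in cells] for name, cells in table.items()}
avg_cv = {name: sum(values) / len(values) for name, values in cv.items()}

# Lowest average first: tighter clusters relative to their centers.
ranked = sorted(avg_cv, key=avg_cv.get)
for name in ranked:
    print(f"{name}: average CV = {avg_cv[name]:.2f}")
```

On this measure, average dwelling value and median household income vary least within clusters, while renters and subsidized housing vary most, which agrees with the interpretation above.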

Figure 3.1 Dollarama Market Segments and Store Locations in the Toronto CMA, 2017.

The cluster analysis-based geodemographics is a useful tool for the study of the geography of demand. The same method can be used to identify target markets for other types of retailers and commercial service providers, but with a different set of classification variables. The choice of classification variables depends on the type of business, but also on the knowledge of the market analyst or the retail geographer who performs the market segmentation.
