Geo-Marketing, a New Approach Using Fuzzy Clustering

Lorenzo Mori and Duccio Stefano Gazzei

Introduction

Constant economic evolution, the discovery of new technologies and the increasing importance of the customer in recent decades have led to rapid developments in marketing. Marketing is not a new science. As Jones and Shaw (2002) wrote, it has existed since ancient times but has become a great global challenge for companies that spend millions every year on advertising. The paradigm has also changed, as the priority today is not selling but, rather, understanding what clients want and creating custom products for everyone. From this perspective, a lot of work has recently been done.

Marketing techniques encompass many subjects. Neuroscience yielded neuro-marketing (see Lee, Broderick and Chamberlain, 2007), which focuses on channels for emotional communication. Out of digitisation grew digital marketing, which played a decisive role in the development of e-commerce. Others include relationship marketing, buzz marketing and guerrilla marketing.

In this chapter, we are mainly interested in geo-marketing, which uses geo-location to plan and implement marketing activities (see Cliquet, 2006). Interest has grown in techniques to determine the particular geographic zone where potential consumers live, work or spend their time. Among the most important is the Voronoi cell, also known as a Thiessen polygon, which contains all the points closer (in terms of distance) to a particular place than to any other. Recently, Azri et al. (2016) and Azri, Ujang and Rahman (2020) have addressed the use of 3D geo-marketing segmentation with 3D Voronoi cells to deal with the locations of vertical components in high-rise and multi-storey buildings, specifically to cluster urban areas from a new multi-dimensional perspective, for example, the placement of a shop within a multi-level marketplace. Another well-known way to determine the catchment area of a location is an isochrone map, which is generally used to define accessibility to a certain location in terms of travel time; for an example, see Baltyzhakova and Bryzhataya (2019). This is a critical point for highly competitive shops.

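Table 18.1

| Cluster number | Education | Wealth | Age | ID |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | Tycoon |
| 2 | 1 | 1 | 0 | Cultural elite |
| 3 | 1 | 0 | 1 | Disillusioned |
| 4 | 1 | 0 | 0 | Young innovator |
| 5 | 0 | 1 | 1 | Affluent |
| 6 | 0 | 1 | 0 | Functional |
| 7 | 0 | 0 | 1 | Distant |
| 8 | 0 | 0 | 0 | Disengaged |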

It is also possible to find fuzzy applications for geo-marketing. Hsu, Chu and Chan (2000), trying to solve the problem of traditional segmentation, which assigns a single membership, presented an application of c-means fuzzy clustering to determine market segments. In India, Jain and Krishnapuram (2001) used fuzzy sets for the personalisation of e-commerce and demonstrated the advantages of employing a fuzzy approach to marketing, especially in determining targets. Grekousis and Hatzichristos (2013) applied unsupervised fuzzy clustering with a Gustafson-Kessel algorithm (1978) for the delineation of geo-marketing regions in the metropolitan region of Athens, Greece.

In this chapter, we use consumer data from a large Italian bank with a simple but efficient technique to divide customers into different clusters. Eurisko segmentation logic (see note 1) is a method that enables us to cluster the population using socio-demographic information. Many different subdivisions can be used, but with respect to banks we are interested in clustering the population based on only three dimensions: wealth (IW), education (IE) and age (IA). For example, tycoons are rich, older and highly educated. In this way, it is possible to determine what consumers want and how they want to be approached. This logic defines each dimension for each consumer as a binary value that equals 1 if the dimension is high and 0 if it is low. We obtain eight groups, as seen in Table 18.1.
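The dichotomous logic can be sketched as a simple lookup, shown here only to make the mapping in Table 18.1 concrete. The names `EURISKO_GROUPS` and `eurisko_group` are illustrative and do not come from the Eurisko method itself.

```python
# Segment names follow Table 18.1; the key order is (education, wealth, age).
# The dictionary and function names are hypothetical, chosen for illustration.
EURISKO_GROUPS = {
    (1, 1, 1): "Tycoon",
    (1, 1, 0): "Cultural elite",
    (1, 0, 1): "Disillusioned",
    (1, 0, 0): "Young innovator",
    (0, 1, 1): "Affluent",
    (0, 1, 0): "Functional",
    (0, 0, 1): "Distant",
    (0, 0, 0): "Disengaged",
}


def eurisko_group(education: int, wealth: int, age: int) -> str:
    """Return the segment name for three 0/1 flags (1 = high, 0 = low)."""
    return EURISKO_GROUPS[(education, wealth, age)]


print(eurisko_group(education=1, wealth=1, age=1))
```

Because each flag is forced to be 0 or 1, a consumer just below a cut-off and one far below it fall into the same group, which is the rigidity the fuzzy approach is meant to relax.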

In the following discussion, we suggest an alternative approach to geo-marketing. We try to design a method which does not dichotomise the population and has more flexible definitions of wealth, education and age. In particular, our goal is a predictive model for an area about which we do not know very much or in which we do not already have many customers.

A Fuzzy Approach to Geo-Marketing

Several steps are necessary to obtain the three dimensions presented in the Eurisko segmentation using a fuzzy method (Betti and Verma, 2008; Betti and Lemmi, 2013; Betti et al., 2015; Betti et al., 2016; Betti, 2017; Bettio et al., 2020).

  1. identification of the variables to include in the analysis;
  2. transformation of the variables into the interval [0,1];
  3. exploratory and confirmatory factor analysis;
  4. calculation of weights within each dimension (each group);
  5. construction of the three fuzzy measures.

Identification of the Variables

To define the three dimensions, we begin with data from the census division of Rome. We restrict our study to the census area within the square built around the Grande Raccordo Anulare, the A90 (literally, the Great Ring Junction).

We selected 15 variables for the dimension of wealth, 11 for the dimension of education and 5 for the dimension of age. Some of the variables are continuous and others are discrete. All these variables are expressed as the household average in the census area (see note 2).

Transformation of the Variables into the Interval [0,1]

Calculating a fuzzy index requires the determination of a membership function to transform each variable into the interval [0,1]. For continuous variables, the membership function is the cumulative distribution function,

$$s_{j,i} = F_j(x_{j,i}),$$

in which $x_{j,i}$ is the value of variable $j$ for individual $i$, and $F_j(x_{j,i})$ is the value of the cumulative distribution function of $j$ for $i$. In order to transform the positive score into a deprivation score, we consider the complement to one, that is,

$$d_{j,i} = 1 - s_{j,i} = 1 - F_j(x_{j,i}).$$

Discrete variables are divided into classes (categorical values), and the membership function is as follows:

$$s_{j,i} = \frac{1 - F(c_{j,i})}{1 - F(1)},$$

where $c_{j,i}$ is the category of variable $j$ for $i$, $F(c_{j,i})$ is the value of the cumulative function of $j$ for $i$, and $F(1)$ is the value assumed for the highest category. To obtain the deprivation score from the positive one, we use the following equation:

$$d_{j,i} = 1 - s_{j,i}.$$

Commonly, if the variable is dichotomous, the deprivation index d is 1 for deprivation and 0 otherwise, whereas a positive score s is 0 for deprivation and 1 otherwise.
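For a continuous variable, the transformation can be illustrated with an empirical cumulative distribution function. This is a minimal sketch under the assumption that the empirical CDF is used as the membership function; the function names and the `income` values are made up for illustration.

```python
import numpy as np


def positive_score(x):
    """Empirical CDF value of each observation: the share of values <= x_i."""
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)
    # side="right" counts how many sorted values are <= each x_i (ties included)
    return np.searchsorted(sorted_x, x, side="right") / x.size


def deprivation_score(x):
    """Complement to one of the positive score."""
    return 1.0 - positive_score(x)


income = np.array([18.0, 25.0, 25.0, 40.0, 90.0])  # illustrative values only
s = positive_score(income)
d = deprivation_score(income)
print(s, d)
```

Both scores lie in [0,1] by construction, and the unit with the largest value of the variable always receives a positive score of 1.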

Table 18.2

| Dimension | No. | Variable |
|---|---|---|
| Wealth | 1 | Square metres |
| | 2 | Income |
| | 3 | Consumption |
| | 4 | Postal savings |
| | 5 | Bank accounts |
| | 6 | Bank deposits |
| | 7 | Securities accounts |
| | 8 | Funds |
| | 9 | Asset administration |
| | 10 | Insurance policies |
| | 11 | Asset management |
| | 12 | Deposit and savings accounts |
| | 13 | Current bank accounts |
| | 14 | Debit card |
| | 15 | Credit card |
| Education | 1 | Degree* |
| | 2 | High school |
| | 3 | Junior high school |
| | 4 | Elementary school |
| | 5 | EET (employed-educated-trained) |
| | 6 | Housewife/Husband |
| | 7 | Looking for work |
| | 8 | Looking for a first job |
| | 9 | NEET (not-employed-educated-trained) |
| | 10 | Employed |
| | 11 | Students |
| Age | 1 | Number of members |
| | 2 | Unmarried |
| | 3 | Widowed |
| | 4 | Divorced |
| | 5 | Average age of members |

* Discrete variables. ** Variables in italics have a negative score.

Some variables that should be discrete (e.g. debit and credit cards) are considered continuous here because we are using an average value, so the choice as to whether to group them back into classes is determined by their distribution. It is also important to stress that the negative or positive score was first assumed and then confirmed using factor analysis.

In Table 18.2, we report all the variables to clarify which are assumed to have a positive or negative score.

Exploratory and Confirmatory Factor Analysis

To construct a summary index, we first have to identify and investigate the latent group of geographical variables. To achieve this objective, we use exploratory and confirmatory analysis. Figure 18.1 shows how many groups were ‘suggested’ by the exploratory analysis.

Figure 18.1 Scree plot.

The acceleration factor method, which determines mathematically where the slope of the curve changes most, suggests that the number of groups might be only one.

It is also possible to use other criteria to choose the number of groups.

  • Parallel analysis: also known as Horn’s parallel analysis (1965), it compares the eigenvalues generated from the data matrix to the eigenvalues generated from a Monte-Carlo simulated matrix of the same size. It suggests six groups;
  • Optimal coordinates: this method determines the location of the scree by measuring the gradients associated with eigenvalues and their preceding coordinates. It suggests six groups;
  • Eigenvalues: the number of groups, if this is the method followed, equals the number of eigenvalues greater than their mean. It suggests seven groups.
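Two of these criteria are easy to sketch with NumPy. The block below is an illustration on synthetic data, not the analysis behind Figure 18.1: the function names, the use of the mean simulated eigenvalue as the reference in the parallel analysis, and the `demo` data are all assumptions.

```python
import numpy as np


def eigenvalues_of_correlation(data):
    """Eigenvalues of the correlation matrix, sorted from largest to smallest."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def n_factors_mean_eigenvalue(data):
    """Number of eigenvalues greater than their mean."""
    eig = eigenvalues_of_correlation(data)
    return int(np.sum(eig > eig.mean()))


def n_factors_parallel(data, n_sim=100, seed=0):
    """Horn-style parallel analysis against random normal data of the same size."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = eigenvalues_of_correlation(data)
    simulated = np.array(
        [eigenvalues_of_correlation(rng.standard_normal((n, p))) for _ in range(n_sim)]
    )
    reference = simulated.mean(axis=0)  # assumption: mean rather than a percentile
    exceeds = observed > reference
    # Count the leading eigenvalues that exceed the simulated reference.
    return p if exceeds.all() else int(np.argmin(exceeds))


# Synthetic data with two latent factors, three observed variables each.
rng = np.random.default_rng(42)
factors = rng.standard_normal((500, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
demo = factors @ loadings + 0.3 * rng.standard_normal((500, 6))
print(n_factors_mean_eigenvalue(demo), n_factors_parallel(demo))
```

On clean synthetic data the criteria tend to agree; on real data, as reported above, they can point to quite different numbers of groups.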

The solution could be very different if we consider one method instead of another. Nevertheless, the standardised loadings (pattern matrix) based upon the correlation matrix show that many variables do not clearly belong to a unique group.

In order to achieve our goal of reproducing the three Eurisko indicators in a fuzzy way, we estimate a confirmatory factor analysis (CFA) in which the number of groups is fixed at three. We summarise the results of this procedure below.

Table 18.3 reports the following indices: [1]

Table 18.3

| Quantity | Value |
|---|---|
| Estimator | ML |
| Optimisation method | NLMINB |
| Number of free parameters | 65 |
| Number of observations | 7156 |
| User model versus baseline: | |
| Comparative fit index (CFI) | 0.616 |
| Tucker-Lewis index (TLI) | 0.586 |
| Root mean square error of approximation: | |
| RMSEA | 0.242 |
| P-value RMSEA <= 0.05 | 0.000 |
| Standardised root mean square residual: | |
| SRMR | 0.095 |

  • The Tucker-Lewis index (TLI), also known as a non-normed fit index, analyses the discrepancy between the chi-square value of the hypothesised model and the chi-square value of the null model; it usually ranges between 0 and 1. A value of k% indicates that the model of interest improves the fit by k% relative to the null model;
  • Root mean square error of approximation (RMSEA) is based on an analysis of the residuals, and a good fit is indicated by a low value;
  • Standardised root mean square residual (SRMR) is the square root of the difference between the residuals of the sample covariance matrix and the hypothesised model, with a low value indicating a good fit.

Obtaining a good fit enables us to continue our analysis with a hypothesis on three groups, as defined in the first paragraph.

Calculation of Weights Within Each Group

To obtain the final three indicators, we have to calculate weights. To do this, we adopted the approach proposed by Betti and Verma (2008), in which the weights are given by the product of two factors: the dispersion of variables and their correlation with other variables in each index.

In particular, for the dimension of wealth, we use the inverse of variable coefficients to give greater importance to rare things (Table 18.4).

Construction of the Three Fuzzy Measures

Step 5 uses the weights found in step 4, so we use the following equation to obtain the three indicators:

$$I_{h,i} = \frac{\sum_{j} w_{h,j}\, v_{j,i}}{\sum_{j} w_{h,j}},$$

where $I_{h,i}$ is the estimation of the index in one of the $h = 1, 2, 3$ dimensions (in our case wealth, education and age) for zone $i$, $v_{j,i}$ is the value of variable $j$ for zone $i$, $w_{h,j}$ is the weight of variable $j$ in dimension $h$, and the sums run over the variables of dimension $h$.

Table 18.4 Weight of variables

| Dimension | N | Variable | w_a | w_b | w |
|---|---|---|---|---|---|
| Wealth | 1 | Square metre | 1.486 | 0.120 | 1.493 |
| | 2 | Income | 1.510 | 0.077 | 0.969 |
| | 3 | Consumption | 1.512 | 0.077 | 0.973 |
| | 4 | Postal savings | 1.533 | 0.078 | 1.000 |
| | 5 | Bank accounts | 1.499 | 0.074 | 0.926 |
| | 6 | Bank deposits | 0.763 | 0.183 | 1.166 |
| | 7 | Securities | 1.472 | 0.076 | 0.938 |
| | 8 | Cash | 1.466 | 0.076 | 0.925 |
| | 9 | Asset administration | 1.484 | 0.075 | 0.928 |
| | 10 | Insurance policies | 1.480 | 0.075 | 0.922 |
| | 11 | Asset management | 1.481 | 0.074 | 0.920 |
| | 12 | Deposits and savings accounts | 1.526 | 0.080 | 1.022 |
| | 13 | Bank accounts | 1.502 | 0.074 | 0.928 |
| | 14 | Debit card | 1.509 | 0.077 | 0.964 |
| | 15 | Credit card | 1.484 | 0.075 | 0.926 |
| Education | 1 | Degree | 0.206 | 0.243 | 0.134 |
| | 2 | High school | 0.386 | 0.289 | 0.301 |
| | 3 | Junior high school | 0.210 | 0.226 | 0.127 |
| | 4 | Elementary school | 0.077 | 0.222 | 0.046 |
| | 5 | EET (employed-educated-trained) | 1.063 | 0.279 | 0.799 |
| | 6 | Housewife/Husband | 2.117 | 0.276 | 1.573 |
| | 7 | Looking for employment | 2.829 | 0.286 | 2.175 |
| | 8 | Looking for a first job | 2.963 | 0.349 | 2.783 |
| | 9 | NEET (not-employed-educated-trained) | 3.001 | 0.305 | 2.646 |
| | 10 | Employed | 0.710 | 0.290 | 0.555 |
| | 11 | Students | 1.233 | 0.315 | 1.043 |
| Age | 1 | Number of members | 4.848 | 0.712 | 1.811 |
| | 2 | Unmarried | 1.334 | 0.731 | 0.511 |
| | 3 | Widowers | 4.258 | 0.756 | 0.168 |
| | 4 | Divorced | 0.374 | 0.781 | 0.153 |
| | 5 | Average age of members | 2.141 | 0.744 | 0.835 |
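As an illustration of this step, the sketch below computes an indicator as the weighted mean of the transformed variables, assuming that this is the form of the aggregation. The weights are the final age weights from Table 18.4; the `values` matrix and the function name are invented for the example.

```python
import numpy as np


def fuzzy_indicator(values, weights):
    """Weighted mean of the [0,1]-transformed variables, one result per zone.

    values: array of shape (zones, variables); weights: array of shape (variables,).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return values @ weights / weights.sum()


# Final weights of the five age variables, copied from Table 18.4.
age_weights = np.array([1.811, 0.511, 0.168, 0.153, 0.835])

# Hypothetical transformed values for two census zones (illustrative only).
values = np.array([
    [0.20, 0.70, 0.10, 0.30, 0.25],
    [0.80, 0.30, 0.60, 0.40, 0.90],
])
print(fuzzy_indicator(values, age_weights))
```

Dividing by the sum of the weights keeps the indicator in [0,1] whenever the transformed variables are in [0,1].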

A graphical analysis (see note 3) of the indicators could be helpful in understanding what we have done. In Figure 18.2, we plot three different maps of Rome, in which every dot is the centre of a census zone, and a darker colour (as detailed in the legends) means the index is closer to one. In the wealth dimension in particular, the red dots mark the richest districts and yellow dots indicate the poorest. For the other two dimensions, obtaining similar confirmation is more complex because there is less variability, and it is also more difficult to know which districts have a high value.

Figure 18.2 A graphical illustration of the dimensions (scale 1:500,000).

Estimation of Parameters

In order to estimate the three fuzzy dimensions for very small areas (we subdivided the area considered before into 90,000 sub-areas), we selected a sample of 10,000 geo-referenced bank customers, for whom we have the value of the three dichotomous dimensions. We use the membership function described above to transform the variables into the interval [0,1]. The consumers extracted were added to the census zone data obtained earlier.

We stress two points in estimating the value. The first law of geography is: ‘Everything is related to everything else, but near things are more related than distant things’ (Tobler, 1970). Taking this into account and applying it to the economy, we assume that the closer people live to one another, the more similar they are. Each of the 90,000 cells is delimited with geo-referenced coordinates, and, as the cells are rectangular, we use their centre as the point from which we perform the estimation.

Borrowing the inverse distance weighting (IDW) interpolation from geostatistics (see note 4), we obtained the estimation. To apply this model, we formed a hypothesis: the maximum distance (d) beyond which no similarities are found between the points is 2 kilometres. The weights ($\lambda_i$) we use are inversely proportional to the distance and scaled so that they add up to one.
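A minimal sketch of this interpolation is given below. It assumes planar coordinates expressed in kilometres, a distance exponent of one (weights proportional to 1/distance) and the 2-kilometre cut-off stated above; the function name and the handling of coincident points are illustrative choices.

```python
import numpy as np


def idw_estimate(target, points, values, max_dist_km=2.0):
    """Inverse distance weighted estimate at `target`.

    target: (2,) planar coordinates in km; points: (n, 2); values: (n,).
    Points farther than max_dist_km are ignored. Returns NaN if none is in range.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    dist = np.linalg.norm(points - np.asarray(target, dtype=float), axis=1)
    near = dist <= max_dist_km
    if not near.any():
        return float("nan")
    dist, values = dist[near], values[near]
    if np.any(dist == 0):
        # A point coincides with the target: use the observed value(s) directly.
        return float(values[dist == 0].mean())
    raw = 1.0 / dist          # weights inversely proportional to the distance
    lam = raw / raw.sum()     # scaled so that they add up to one
    return float(lam @ values)


# Three hypothetical observations; the third lies beyond the 2 km cut-off.
points = [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
values = [1.0, 0.0, 1.0]
print(idw_estimate((0.0, 0.0), points, values))
```

With longitude and latitude, the coordinates would first have to be projected, or a great-circle distance used, before applying the cut-off in kilometres.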

This method enables us to obtain a good estimation of the three dimensions. To confirm the estimation, we sample 10,000 other consumers in the bank dataset ten times, and to compare the data, we assign a value of 1 if the estimation is equal to or higher than the third quartile of the distribution of the estimates, and 0 otherwise. In Table 18.5, we summarise the results.

It is also interesting to focus on one area (e.g. the area around the Vatican City) and see the data illustrated in a map (see Figure 18.3).

These maps contain 5,400 of the 90,000 cells that we have created and for which we have estimated the value. We can see from them how the value changes from one part of the city to another even in a small zone.

Although the individual estimations are not far from the real value, we are also interested in obtaining a good degree of accuracy in the composition of the three dimensions, in such a way as to reproduce the Eurisko model. If we try to do that with the same approach as described above, we obtain the results reported in Table 18.6.
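The two accuracy measures can be sketched as follows: the share of matches for each single dimension, and the share of units for which all three dimensions match at once. The third-quartile threshold follows the description above; the function names and the small arrays are invented for illustration.

```python
import numpy as np


def dichotomise_at_q3(estimates):
    """1 where an estimate is at or above the third quartile of its column, else 0."""
    estimates = np.asarray(estimates, dtype=float)
    q3 = np.quantile(estimates, 0.75, axis=0)
    return (estimates >= q3).astype(int)


def single_accuracy(estimates, flags):
    """Share of matching units, computed separately for each dimension."""
    return (dichotomise_at_q3(estimates) == np.asarray(flags)).mean(axis=0)


def triad_accuracy(estimates, flags):
    """Share of units for which all three dimensions match simultaneously."""
    return float((dichotomise_at_q3(estimates) == np.asarray(flags)).all(axis=1).mean())


# Hypothetical estimates (columns: IW, IE, IA) and observed 0/1 flags.
estimates = np.array([
    [0.1, 0.9, 0.2],
    [0.2, 0.1, 0.3],
    [0.3, 0.2, 0.9],
    [0.9, 0.3, 0.1],
])
flags = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
])
print(single_accuracy(estimates, flags), triad_accuracy(estimates, flags))
```

Because a triad counts as a match only when all three dimensions agree, its accuracy can never exceed the lowest of the three single accuracies.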

Clearly, this result is not very good. However, a better one could hardly be expected, since a match requires not only each single estimation but the whole triad of indicators to be accurate at the same time. To solve this problem, we seek an alternative method for achieving Eurisko segmentation and use c-means clustering (Dunn, 1973; Bezdek, 1981). Also known as fuzzy k-means clustering, it is similar to classical k-means clustering except that fuzzy clustering does not assign each unit to a unique group but, rather, gives the degree of its membership in all groups. In line with the Eurisko segmentation, we use eight clusters; however, several indices can be used to determine the optimal number of clusters (the Xie-Beni index, the fuzzy silhouette index, etc.).
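For readers unfamiliar with the algorithm, the following is a compact NumPy sketch of standard fuzzy c-means. It is not the implementation used for the results in this chapter: the fuzzifier m = 2, the random initialisation, the stopping rule and the synthetic `demo` data are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centres, memberships).

    memberships has shape (n_units, n_clusters) and each row sums to one.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)  # random initial membership degrees
    for _ in range(n_iter):
        um = u ** m
        # Centres are means of the units weighted by membership**m.
        centres = um.T @ x / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centres, u


# Two synthetic groups of units in the (IW, IE, IA) space, for illustration.
rng = np.random.default_rng(1)
demo = np.vstack([
    rng.normal(0.2, 0.02, size=(50, 3)),
    rng.normal(0.8, 0.02, size=(50, 3)),
])
centres, memberships = fuzzy_c_means(demo, n_clusters=2)
print(centres.round(2))
```

Each row of `memberships` gives the degree to which a unit belongs to every cluster, which is the quantity interpreted in the rest of the section.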

This procedure yields the following centres for the clusters (see Table 18.7), where N is the number of cells whose highest membership degree is in that cluster.

It is obviously impossible to link these centres to the Eurisko ones, but since each of the 90,000 cells contains not a single person but, rather, a set of them, it is unlikely that all of them belong to a single segmentation group. In this sense, c-means is more flexible and, in our opinion, more appropriate when we are talking about a set of subjects.

Table 18.5 Accuracy of the estimations

| Sample | IW | IE | IA |
|---|---|---|---|
| 1 | 0.6755 | 0.5632 | 0.5286 |
| 2 | 0.6823 | 0.5523 | 0.5387 |
| 3 | 0.6774 | 0.5488 | 0.5337 |
| 4 | 0.6748 | 0.5558 | 0.5323 |
| 5 | 0.6784 | 0.5610 | 0.5305 |
| 6 | 0.6862 | 0.5572 | 0.5334 |
| 7 | 0.6752 | 0.5546 | 0.5347 |
| 8 | 0.6826 | 0.5571 | 0.5328 |
| 9 | 0.6764 | 0.5607 | 0.5327 |
| 10 | 0.6819 | 0.5589 | 0.5342 |
| Mean | 0.6791 | 0.5570 | 0.5332 |

Figure 18.3 Map of Rome with cells.

Figure 18.4 shows an example of how the results can be interpreted. In particular, we sample 3 of the 90,000 cells and obtain the results in Table 18.8.

Table 18.6 Matches of the triad

| Sample | Triad |
|---|---|
| 1 | 0.2141 |
| 2 | 0.2139 |
| 3 | 0.2081 |
| 4 | 0.2096 |
| 5 | 0.2094 |
| 6 | 0.2220 |
| 7 | 0.2150 |
| 8 | 0.2188 |
| 9 | 0.2167 |
| 10 | 0.2167 |
| Mean | 0.2144 |

Table 18.7 Centres of clusters

| Cluster | IW | IE | IA | N |
|---|---|---|---|---|
| 1 | 0.302 | 0.494 | 0.249 | 11,185 |
| 2 | 0.269 | 0.471 | 0.276 | 12,908 |
| 3 | 0.235 | 0.479 | 0.237 | 14,998 |
| 4 | 0.530 | 0.565 | 0.174 | 3,388 |
| 5 | 0.202 | 0.425 | 0.265 | 11,731 |
| 6 | 0.336 | 0.496 | 0.190 | 7,595 |
| 7 | 0.173 | 0.380 | 0.334 | 13,788 |
| 8 | 0.227 | 0.437 | 0.308 | 14,407 |

Figure 18.4 Membership function.

In Figure 18.4, the higher the value of membership, the larger and darker the dot is. It is easy to conclude that cell 3 is highly linked to cluster 4, which (looking at the centre of the cluster) can be summarised as composed of young people with a high education and wealth. Cell 2 seems to be mainly part of cluster 6, in which people are still young and have a high education but are less rich than in the previous one. Lastly, cell 1 does not have one membership value clearly greater than the others but, rather, has similar values for clusters 5 and 8, so we conclude that it belongs to both these groups, which have low wealth. Now, it is clear that if we want to promote a new type of bank card for young and rich people, we should first go to cell 3, where the right customers for our product are most likely to be found.

Table 18.8 Estimated value of variables

| No. | Cell | Long. | Lat. | IW | IE | IA |
|---|---|---|---|---|---|---|
| 1 | 33505 | 12.53995 | 41.86196 | 0.207 | 0.418 | 0.304 |
| 2 | 15210 | 12.5435 | 41.82267 | 0.391 | 0.538 | 0.204 |
| 3 | 11022 | 12.55295 | 41.81325 | 0.507 | 0.560 | 0.246 |

Note: The coordinates refer to the centre of the cell.
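This reading can be checked roughly with the published numbers. The sketch below applies the usual fuzzy c-means membership formula to the three cells of Table 18.8 and the centres of Table 18.7, assuming Euclidean distance and a fuzzifier m = 2; the memberships plotted in Figure 18.4 may have been obtained with different settings, so only the ranking of the clusters is meaningful here.

```python
import numpy as np

# Cluster centres (IW, IE, IA) copied from Table 18.7, clusters 1 to 8.
centres = np.array([
    [0.302, 0.494, 0.249],
    [0.269, 0.471, 0.276],
    [0.235, 0.479, 0.237],
    [0.530, 0.565, 0.174],
    [0.202, 0.425, 0.265],
    [0.336, 0.496, 0.190],
    [0.173, 0.380, 0.334],
    [0.227, 0.437, 0.308],
])

# Estimated (IW, IE, IA) of cells 1, 2 and 3, copied from Table 18.8.
cells = np.array([
    [0.207, 0.418, 0.304],
    [0.391, 0.538, 0.204],
    [0.507, 0.560, 0.246],
])


def membership_degrees(points, centres, m=2.0):
    """Fuzzy c-means membership of each point in each cluster (rows sum to one)."""
    dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    inv = np.fmax(dist, 1e-12) ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


u = membership_degrees(cells, centres)
for number, row in enumerate(u, start=1):
    top_two = (np.argsort(row)[::-1][:2] + 1).tolist()
    print(f"cell {number}: two highest memberships in clusters {top_two}")
```

Under these assumptions the highest membership of cell 3 is in cluster 4 and that of cell 2 is in cluster 6, while the two highest memberships of cell 1 are in clusters 8 and 5, in line with the interpretation given above.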

Conclusion

In this chapter, we have tried to identify an alternative method that does not substitute for the Eurisko segmentation of customers but improves on it, using fuzzy clustering to cluster new potential consumers located in a specific cell.

First, defining fuzzy indicators means we do not have to find a point where the population is dichotomised. This might be a minor detail, but it is easy to understand the importance if we think about the variations, especially of wealth, in a country. If a company wishes to open a new branch in a city where it does not yet have any customers, it could be very difficult to define the point from which it should dichotomise hypothetical new customers, so it could be an error to use the same point as is used in another city that is far away.

Furthermore, IDW interpolation enables us to follow the law that people with similar characteristics live nearby (e.g. neighbourhoods), and we obtained good results with low estimated error for the 90,000 small areas into which Rome was subdivided. A graphical analysis of the map confirmed and demonstrated the accuracy of our estimates.

Finally, using c-means clustering, for each cell we assign a membership degree to a cluster, which can help us to understand the characteristics of potential customers within this zone and therefore guide decisions in our business activities.

Notes

1 Eurisko is an Italian research institute founded by G. Calvi in 1972, which began to develop Eurisko segmentation in 1976.

2 To clarify: if a specific area has 20 households with a total income of 1 million euros, we use the average household value for the zone, which here is 50,000 euros per household.

3 For more on this issue, see Tyner (2010), Iliffe (2008) and Lovelace, Nowosad and Muenchow (2019).

4 For an overview of this method, see Babak and Deutsch (2009).

References

Azri, S.A., Ujang, U., and Rahman, A.A. (2020), ‘Voronoi Classified and Clustered Data Constellation: A New 3D Data Structure for Geomarketing Strategies’. ISPRS Journal of Photogrammetry and Remote Sensing 162: 1-16.

Azri, S.A., Ujang, U., Rahman, A., Anton, F., and Mioc, D. (2016), ‘3D Geomarketing Segmentation: A Higher Spatial Dimension Planning Perspective’. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4/W1: 1-7. doi:10.5194/isprs-archives-XLII-4-W1-1-2016.

Babak, O., and Deutsch, C.V. (2009), ‘Statistical Approach to Inverse Distance Interpolation’. Stochastic Environmental Research and Risk Assessment 23, no. 5: 543-53.

Baltyzhakova, T.I., and Bryzhataya, E.S. (2019), ‘Analysis of Urban Territory in Terms of Accessibility to Social Objects’. Journal of Physics: Conference Series 1333: 032005. https://doi.org/10.1088/1742-6596/1333/3/032005.

Betti, G. (2017). ‘What Impact Has the Economic Crisis Had on Quality of Life in Europe? A Multidimensional and Fuzzy Approach’. Quality & Quantity 51(1): 351-364.

Betti, G., Gagliardi, F., Lemmi, A., and Verma, V. (2015), ‘Comparative Measures of Multidimensional Deprivation in the European Union’. Empirical Economics 49, no. 3: 1071-1100.

Betti, G., and Lemmi, A. eds. (2013), Poverty and Social Exclusion: New Methods of Analysis. London: Routledge.

Betti, G., Soldi, R., and Talev, I. (2016), ‘Fuzzy Multidimensional Indicators of Quality of Life: The Empirical Case of Macedonia’. Social Indicators Research 127, no. 1: 39-53.

Betti, G., and Verma, V. (2008), ‘Fuzzy Measures of the Incidence of Relative Poverty and Deprivation: A Multi-Dimensional Perspective’. Statistical Methods and Applications 17, no. 2: 225-250.

Bettio, F., Ticci, E., and Betti, G. (2020), ‘A Fuzzy Index and Severity Scale to Measure Violence against Women’. Social Indicators Research, 148, no. 1: 225-249.

Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms. Boston: Springer.

Cliquet, G. (2006), Geomarketing: Methods and Strategies in Spatial Marketing. Geographical Information Systems Series. Newport Beach, CA: ISTE USA.

Dunn, J.C. (1973), ‘A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters’. Journal of Cybernetics 3, no. 3: 32-57.

Grekousis, G., and Hatzichristos, T. (2013), ‘Fuzzy Clustering Analysis in Geomarketing Research’. Environment and Planning B: Planning and Design 40, no. 1: 95-116.

Gustafson, D., and Kessel, W. (1978), ‘Fuzzy Clustering with a Fuzzy Covariance Matrix’. In 1978 IEEE Conference on Decision and Control Including the 17th Symposium on Adaptive Processes, 761-766. San Diego: IEEE.

Horn, J.L. (1965), ‘A Rationale and Test for the Number of Factors in Factor Analysis’. Psychometrika 30, no. 2: 179-185.

Hsu, T.H., Chu, K.M., and Chan, H.C. (2000), ‘The Fuzzy Clustering on Market Segment’. In Ninth IEEE International Conference on Fuzzy Systems. FUZZ-IEEE 2000 (Cat. No. 00CH37063), 2, 621-626. San Antonio, TX: IEEE.

Iliffe, J. (2008), Datums and Map Projections for Remote Sensing, GIS, and Surveying. Boca Raton: CRC Press.

Jain, V., and Krishnapuram, R. (2001), ‘Applications of Fuzzy Sets in Personalization for E-Commerce’. In Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569), 1, 263-268. Vancouver, BC: IEEE.

Jones, B.D.G., and Shaw, E.H. (2002), ‘A History of Marketing Thought’. In Handbook of Marketing, edited by Weitz, B.A., and Wensley, R., 39-64. London: SAGE.

Lee, N., Broderick, A.J., and Chamberlain, L. (2007), ‘What Is “Neuromarketing”? A Discussion and Agenda for Future Research’. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology 63, no. 2: 199-204.

Lovelace, R., Nowosad, J., and Muenchow, J. (2019), Geocomputation with R.

Tobler, W.R. (1970), ‘A Computer Movie Simulating Urban Growth in the Detroit Region’. Economic Geography 46: 234.

Tyner, J. A. (2010), Principles of Map Design. New York: Guilford Press.

  • [1] The comparative fit index (CFI) analyses the model fit by comparing the discrepancy between the data and the hypothesised model, which has a range between 0 and 1, with a larger value indicating a better fit;
 