A spatially integrated data computational framework

Defining activity trajectories of social media users

The starting point for our analysis is to extract and build the space-time trajectories of social media users in order to identify individuals’ footprints in geographic space. Assume that there is a country space containing M cities available for individuals’ mobility, and that a set of N individuals post their daily social activities (e.g. traveling) on a location-based social media platform. The measurement of the “space-time trajectory” for capturing human mobility follows Hägerstrand’s (1970) implicit function, which has been widely used in geographical analysis (Zheng and Zhou, 2011; Gao and Liu, 2013; Cao et al., 2015).

We define a social media user $u_i$ ($i \in [1, N]$) as having a space-time trajectory $W_i$ within the country. This real-life trajectory is approximated by $Tr_i$, the set of geographically tagged footprints posted on social media, each consisting of a location ($s_j$), a timestamp ($t_j$) and message content ($c_j$).

For each user $u_i$, the space-time trajectory can be written as:

$$Tr_i = \{(s_j, t_j, c_j), (s_{j+1}, t_{j+1}, c_{j+1}), \ldots, (s_{j+k}, t_{j+k}, c_{j+k}), \ldots\} \quad (1)$$

where $j > 0$, $k > 0$ and $t_{j+k} > t_{j+k-1} > \cdots > t_j$.

One fundamental issue in building human trajectories is identifying the user’s origin city. Existing studies have often used the most frequently visited city as the origin city and applied a spatial radius to track individuals’ footprints (Gonzalez et al., 2008; Cao et al., 2015). However, this identification method may introduce errors, because individuals differ in their initial motivations. For a more rigorous assessment, we apply text mining methods (Rao et al., 2010; Burger et al., 2011; Wang et al., 2013) to analyse historical tweets and to derive and validate social media users’ current residence information. Specifically, we define a social media user’s current residence as his or her origin city, and the other cities in the space-time trajectory as visited destination cities.
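As a minimal Python sketch of Eq. (1) and of origin-city assignment, the code below orders one user's footprints by timestamp and then picks an origin city. The `Footprint` record, the numeric timestamps and the `mined_residence` argument are illustrative assumptions: the text-mining step itself is not shown, so its output is simply passed in, and the most-frequent-city heuristic of earlier studies serves only as a fallback.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Footprint:
    """One geo-tagged post: location s_j, timestamp t_j and message content c_j."""
    city: str
    timestamp: float  # assumed numeric time, e.g. Unix seconds
    content: str


def build_trajectory(posts: List[Footprint]) -> List[Footprint]:
    """Return the footprints ordered by timestamp, as required by Eq. (1)."""
    return sorted(posts, key=lambda p: p.timestamp)


def most_frequent_city(trajectory: List[Footprint]) -> str:
    """Baseline from earlier studies: the most frequently visited city."""
    return Counter(p.city for p in trajectory).most_common(1)[0][0]


def origin_city(trajectory: List[Footprint], mined_residence: Optional[str] = None) -> str:
    """Prefer a residence derived by text mining (hypothetical input); else fall back."""
    if mined_residence is not None:
        return mined_residence
    return most_frequent_city(trajectory)


posts = [
    Footprint("Beijing", 3.0, "meeting"),
    Footprint("Shanghai", 1.0, "home"),
    Footprint("Shanghai", 2.0, "lunch"),
]
traj = build_trajectory(posts)
print([p.city for p in traj])        # ['Shanghai', 'Shanghai', 'Beijing']
print(origin_city(traj))             # Shanghai (fallback heuristic)
print(origin_city(traj, "Beijing"))  # Beijing (mined residence takes precedence)
```

Sorting preserves the strict ordering of Eq. (1) whenever timestamps are distinct; duplicate timestamps would need an explicit tie-breaking rule.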

Dimensional mobility algorithm

Adopted from Leonardi et al. (2014), a graphical framework for data warehousing and data cuboids is employed in this study to represent the dimensional mobility algorithm of social media users across cities. In the social media data cuboid, we stratify three dimensions. The first is the user dimension. To avoid data privacy concerns, we restrict our focus to extracting users’ origin city information and geo-tagged footprint information from their registration location, the places from which their historical posts were sent, and text-based content. The second is the spatial dimension, which identifies the number of users in each city as the basic spatial cuboid scenario. We define the cuboid (C)-based geometric measures as follows:

$C(u_i)$: the number of social media users in the origin location $O$;

$OutC(u_i)$: the number of mobility visits made by user $i$ from the origin location $O$ to other destinations;

$InC(u_i)$: the number of mobility visits made by user $i$ from other locations into the origin location $O$.

The third dimension is temporal. We use days as the measurement interval of our baseline temporal cuboid. By combining the temporal dimension with the spatial and user dimensions, we can quantify human mobility flow patterns between city pairs over time and space.
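The three dimensions can be illustrated with a small in-memory cuboid in Python. This is a sketch, not the warehouse implementation of Leonardi et al. (2014): each record is assumed to be a hypothetical `(user, origin city, posting city, day)` tuple produced after origin identification, and the functions compute the base cuboid, $C$ per origin city, $OutC$ per user, and intercity visits per city pair and day. $InC$ is left out because it would also require the time order of each user's records.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record layout: (user, origin city, posting city, day).
Record = Tuple[str, str, str, str]


def base_cuboid(records: List[Record]) -> Counter:
    """Finest cuboid: number of records in each (user, origin, city, day) cell."""
    return Counter(records)


def users_per_origin(records: List[Record]) -> Counter:
    """C: number of distinct users whose origin is each city."""
    pairs = {(user, origin) for user, origin, _, _ in records}
    return Counter(origin for _, origin in pairs)


def out_visits(records: List[Record]) -> Counter:
    """OutC: visits each user makes from the origin city to other cities."""
    return Counter(user for user, origin, city, _ in records if city != origin)


def daily_city_pair_flows(records: List[Record]) -> Counter:
    """Spatial x temporal interaction: intercity visits per (origin, city, day)."""
    return Counter((origin, city, day) for _, origin, city, day in records if city != origin)


records = [
    ("u1", "A", "A", "2015-01-01"),
    ("u1", "A", "B", "2015-01-01"),
    ("u1", "A", "B", "2015-01-02"),
    ("u2", "A", "C", "2015-01-01"),
    ("u3", "B", "A", "2015-01-02"),
]
print(users_per_origin(records))  # Counter({'A': 2, 'B': 1})
```

Coarser cuboids (e.g. per month, or per city only) follow by dropping or re-keying a dimension before counting.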

Aggregation function

Incorporating individuals’ space-time trajectories into the dimensional mobility measures requires appropriate aggregation functions for efficient data query operations (Gray et al., 1997).

Assume that $U$ is an aggregated spatiotemporal hierarchy corresponding to the set of human mobility patterns of social media users $u_i$ between a pair of cities, $p_1$ and $p_2$. Specifically, $p_1$ and $p_2$ represent higher levels of the cuboid hierarchy (e.g. a month-and-city cuboid), each aggregated from a series of basic individual measures: $p_1 = \{p_{1i} \mid i = 1, 2, \ldots, k\}$ and $p_2 = \{p_{2j} \mid j = 1, 2, \ldots, k\}$. The aggregation function for measuring mobility flows between $p_1$ and $p_2$ can thus be written as:

$$U(p_1, p_2) = \sum_{i=1}^{k} \sum_{j=1}^{k} U(p_{1i}, p_{2j}) \quad (2)$$
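Assuming the aggregation is a SUM (a distributive aggregate in the sense of Gray et al., 1997), the roll-up from basic cuboid cells to the higher-level cuboids $p_1$ and $p_2$ might look like the Python sketch below. Basic cells are assumed to be hypothetical `(city, day)` pairs, `basic_flows` maps an ordered pair of cells to a visit count, and `to_month` is an illustrative level mapping.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Tuple

Cell = Hashable  # a basic cuboid cell, assumed here to be (city, day)
Flows = Dict[Tuple[Cell, Cell], int]


def aggregate_flow(basic_flows: Flows, p1: Iterable[Cell], p2: Iterable[Cell]) -> int:
    """Flow between higher-level cuboids p1 and p2: the sum, over every member
    p1_i of p1 and every member p2_j of p2, of the basic flow p1_i -> p2_j."""
    members2 = list(p2)
    return sum(basic_flows.get((a, b), 0) for a in p1 for b in members2)


def roll_up(basic_flows: Flows, to_level: Callable[[Cell], Cell]) -> Counter:
    """Re-key each basic flow by its higher-level cells and sum (a SUM roll-up)."""
    rolled = Counter()
    for (a, b), n in basic_flows.items():
        rolled[(to_level(a), to_level(b))] += n
    return rolled


def to_month(cell):
    """Illustrative level mapping: (city, 'YYYY-MM-DD') -> (city, 'YYYY-MM')."""
    city, day = cell
    return (city, day[:7])


basic = {
    (("A", "2015-01-01"), ("B", "2015-01-01")): 3,
    (("A", "2015-01-02"), ("B", "2015-01-02")): 2,
    (("B", "2015-01-02"), ("A", "2015-01-02")): 1,
}
jan_a = [("A", "2015-01-01"), ("A", "2015-01-02")]
jan_b = [("B", "2015-01-01"), ("B", "2015-01-02")]
print(aggregate_flow(basic, jan_a, jan_b))                             # 5
print(roll_up(basic, to_month)[(("A", "2015-01"), ("B", "2015-01"))])  # 5
```

Because SUM is distributive, rolling up day-level counts to months gives the same totals as summing over all member pairs directly.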

In addition to the mobility flows between city pairs, it is also of interest to know the total out-flow and in-flow volumes of social media users for each city. Recall that we assume there are M cities ($p_m$, $m = 1, 2, \ldots, M$) available for individuals’ mobility in China. By excluding the space-time trajectories that occurred within the city boundaries of $p_m$, we obtain the aggregation functions for measuring the total out-flow volume ($OutC(p_m)$) and in-flow volume ($InC(p_m)$) of social media users of $p_m$ as:

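$$OutC(p_m) = \sum_{n=1,\, n \neq m}^{M} U(p_m, p_n), \qquad InC(p_m) = \sum_{n=1,\, n \neq m}^{M} U(p_n, p_m) \quad (3)$$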
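Under the same assumption that flows are additive, the per-city totals can be sketched as follows: every intercity origin-destination count contributes to the origin's out-flow and the destination's in-flow, while intracity pairs are skipped. The dictionary layout of `od` is a hypothetical choice for illustration.

```python
from collections import Counter
from typing import Dict, Tuple


def city_totals(od: Dict[Tuple[str, str], int]) -> Tuple[Counter, Counter]:
    """Return (OutC, InC) per city from origin-destination counts,
    skipping pairs whose origin and destination are the same city."""
    out_c, in_c = Counter(), Counter()
    for (origin, dest), n in od.items():
        if origin == dest:
            continue  # intracity movement is excluded from both totals
        out_c[origin] += n
        in_c[dest] += n
    return out_c, in_c


od = {("A", "B"): 5, ("A", "C"): 2, ("B", "A"): 1, ("A", "A"): 9}
out_c, in_c = city_totals(od)
print(out_c["A"], in_c["A"])  # 7 1
```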

Mapping network topology

Hypothetical urban social interaction footprints are defined as individuals’ intercity mobility behaviors based on aggregated social media data. The underlying logic is that a ‘geo-tagged’ intercity linkage is created if an individual’s Twitter account is registered in city A (the individual’s current residence city) but sends a tweet from city B (the individual’s geo-tagged city). This assumption is critical because it allows social media users’ geo-tagged records to be transformed into the topology of intercity social networks, as detailed below.

We adopt a directed star network topology to construct the matrix between each social media user’s origin city and destination cities. In line with common practice, we calculate the origin-destination matrix using a two-step procedure.

In the first step, we use web crawlers to retrieve all users’ geo-tagged records throughout China. This is computationally intensive because of the large volume of location-based social media (LBSM) data. A typical example of a user’s trajectory derived from geo-tagged records is {<origin city = City-O>, <City-D1, $t_1$>, <City-D2, $t_2$>, <City-D3, $t_3$>, <City-D2, $t_4$>, <City-D4, $t_5$>, ...}. This simple example illustrates the trajectory of a user from the origin city (City-O) to four geo-tagged destination cities (City-D1, City-D2, City-D3 and City-D4) at five times, from $t_1$ to $t_5$.
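Applied to the example trajectory, the linkage rule (registered in one city, posting from another) yields one directed core-to-leaf linkage per geo-tagged record outside the origin city. The minimal Python sketch below encodes the example with integers standing in for $t_1, \ldots, t_5$; the function name `star_edges` and the input layout are illustrative assumptions, and the crawling itself is not shown.

```python
from collections import Counter
from typing import List, Tuple


def star_edges(origin: str, visits: List[Tuple[str, int]]) -> Counter:
    """Directed star for one user: the origin city is the core node, and every
    geo-tagged record in another city adds weight 1 to the edge origin -> city."""
    return Counter((origin, city) for city, _time in visits if city != origin)


# The example trajectory, with integers 1..5 standing in for t1..t5.
example = [("City-D1", 1), ("City-D2", 2), ("City-D3", 3), ("City-D2", 4), ("City-D4", 5)]
edges = star_edges("City-O", example)
print(len(edges))                    # 4
print(edges[("City-O", "City-D2")])  # 2
```

The star has four leaves, and the edge to City-D2 carries weight 2 because that city is geo-tagged twice.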

The second step reads and calculates the intensity of linkages between city pairs within the directed star network. First, origin cities are defined as the core nodes of the network, whereas geo-tagged destination cities are defined as leaf nodes connected to the origin city. Second, we characterize the direction of city-pair linkages as running from the core node to the leaf nodes; that is, directed intercity travel flows are obtained by assigning an outward direction from the core node to each leaf node in the network. Third, we use the frequencies of geo-tagged destination cities to weight the intercity linkages in the network. We also calculate the accumulated travel flows out of and into each city. In this way, the origin-destination human mobility flows along the network are adjusted to reflect the intensity of intercity linkages.
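The second step can be sketched by merging every user's star into one weighted origin-destination matrix, with edge weights equal to the frequency of geo-tagged destination records, and row and column sums giving the accumulated out-flows and in-flows. This is a minimal Python sketch under assumed inputs: `users` maps a hypothetical user id to an origin city and the time-ordered list of geo-tagged cities, and records posted in the origin city itself are treated as intracity and ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple


def od_matrix(users: Dict[str, Tuple[str, List[str]]]):
    """Merge every user's directed star into one weighted origin-destination matrix.

    users maps a (hypothetical) user id to (origin city, geo-tagged cities).
    Returns the node list, the matrix, and accumulated out-flows and in-flows.
    """
    weights = Counter()
    for origin, cities in users.values():
        for city in cities:
            if city != origin:  # core (origin) -> leaf (destination); intracity skipped
                weights[(origin, city)] += 1  # weight = frequency of geo-tagged destination
    nodes = sorted({city for pair in weights for city in pair})
    index = {city: k for k, city in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (origin, dest), w in weights.items():
        matrix[index[origin]][index[dest]] = w
    out_flow = {c: sum(matrix[index[c]]) for c in nodes}                # row sums
    in_flow = {c: sum(row[index[c]] for row in matrix) for c in nodes}  # column sums
    return nodes, matrix, out_flow, in_flow


users = {
    "u1": ("A", ["B", "C", "B"]),
    "u2": ("B", ["A", "B"]),
    "u3": ("A", ["A"]),
}
nodes, matrix, out_flow, in_flow = od_matrix(users)
print(nodes)   # ['A', 'B', 'C']
print(matrix)  # [[0, 2, 1], [1, 0, 0], [0, 0, 0]]
```

In this sketch a city becomes a node only if it takes part in at least one intercity linkage, so user u3, who posts only from the origin city, adds nothing to the matrix.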
