# Comparing citation patterns for two time intervals

We consider an alternative way of comparing citation frequencies for two time intervals of the same length. The two intervals used here are 1976-1995 and 1987-2006.[1] Again, attention was restricted to patents receiving at least 20 citations: there were 6,774 patents for the first interval and 12,291 for the second. We use a general linear model (with possible random effects). The variables used for the comparison are the numerical citation frequency, time t (years from the grant year), and a binary variable i identifying the time interval. Frequency is the response (dependent) variable, and the other two are predictors. A unit of observation is one year of citations for one patent; operationally, each patent was split into 20 units, one for each year of the interval. Before stating and estimating a model, it is useful to examine the mean response profiles to discern the shape of the curves.
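A minimal sketch of the data layout described above, in plain Python: each patent's 20 yearly citation counts are expanded into patent-year units, and the mean response profile is the mean frequency in each (interval, year) cell. The input format, the 0/1 coding of i (0 for 1976-1995, 1 for 1987-2006), counting t from 0 in the grant year, and the helper names `to_long_format` and `mean_profiles` are all assumptions for illustration, not the author's code.

```python
from collections import defaultdict
from statistics import mean

YEARS = 20  # one unit of observation per year of the 20-year interval


def to_long_format(patents):
    """Expand (patent_id, i, yearly_counts) tuples into patent-year rows.

    Assumed input: i is 0 for 1976-1995 and 1 for 1987-2006 (hypothetical
    coding), and yearly_counts[t] is the citation count t years after grant.
    """
    rows = []
    for patent_id, i, counts in patents:
        if len(counts) != YEARS:
            raise ValueError(f"{patent_id}: expected {YEARS} yearly counts, got {len(counts)}")
        for t, freq in enumerate(counts):  # t = 0, 1, ..., 19
            rows.append({"patent": patent_id, "t": t, "i": i, "freq": freq})
    return rows


def mean_profiles(rows):
    """Mean citation frequency for every (i, t) cell: the mean response profiles."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["i"], row["t"])].append(row["freq"])
    return {cell: mean(freqs) for cell, freqs in sorted(cells.items())}
```

Plotting `mean_profiles(rows)` against t, one curve per value of i, gives profiles of the kind examined next.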

Figure 5.26 Plots of citation levels and cumulative generality levels.

The start year is 1987. Left panel: citation distributions for the four clusters and overall (black solid line). Right panel: empirical cumulative distribution functions (ecdf) of generality for the statistically significant clusters only and overall (black solid line).

The mean response profiles are shown in Figure 5.27. They suggest two break points in the number of citations: one about four years after the patents were granted and a second 15 years after the grant year. Accordingly, the following model for the mean response profiles was specified, where μ is the yearly mean citation frequency:

Figure 5.27 Observed mean response profiles. Solid line: 1976-1995; dashed line: 1987-2006.
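The model equation itself did not survive in this text. Purely as an illustration, and as an assumption rather than the author's exact specification, a continuous piecewise-linear mean with break points at t = 4 and t = 15, parameterized so that β1, β2 and β3 are the slopes within the three segments (matching how they are interpreted in the discussion of Table 5.6), would be:

$$
\mu_t = \beta_0 + \beta_1 \min(t, 4) + \beta_2 \left(\min(t, 15) - 4\right)_+ + \beta_3 \,(t - 15)_+ + \beta_4\, i, \qquad (x)_+ = \max(x, 0),
$$

where i = 0 for 1976-1995 and i = 1 for 1987-2006 (assumed coding). For t ≤ 4 the mean changes by β1 per year, for 4 < t ≤ 15 by β2, and for t > 15 by β3; the curve is continuous at both break points. A version with time × time-interval interactions would add the products of i with each of the three segment terms.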

Table 5.6 Estimated model parameters for comparing two periods.

Left: no interaction terms. Right: with time × time-interval interaction terms.

Two models were estimated, one without interaction terms and one with them. The results are shown in Table 5.6. The model without interaction terms is statistically significant, and all β coefficients are also significant. The variance explained is low (about 7%), but for our purposes here (comparing the patterns in two different time intervals) these two variables suffice. The model suggests that: 1) citations rise steeply during the first 4 years to a peak (β1 is positive); 2) a slightly declining plateau follows from 4 to 15 years after the patents were granted (the estimate of β2 is small and negative); and 3) after 15 years, the number of citations declines more steeply (the estimate of β3 is more strongly negative).
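A minimal, dependency-free sketch of fitting such a piecewise-linear mean model by ordinary least squares. It assumes the segment-slope parameterization with knots at t = 4 and t = 15 (so each β is the slope within one segment, as interpreted above), t counted 0..19 from the grant year, and i coded 0/1; `segment_basis`, `design_row`, `solve` and `fit_ols` are hypothetical names. It ignores the possible random effects, and `interactions=True` adds the time × interval products used in the second model. The returned R² plays the role of the "variance explained".

```python
KNOTS = (4, 15)  # assumed break points, in years after the grant year


def segment_basis(t, knots=KNOTS):
    """Segment-slope basis: each coefficient is the slope within one segment."""
    k1, k2 = knots
    return [min(t, k1), max(min(t, k2) - k1, 0), max(t - k2, 0)]


def design_row(t, i, interactions=False):
    """Row [1, s1, s2, s3, i], plus [i*s1, i*s2, i*s3] when interactions=True."""
    s = segment_basis(t)
    row = [1.0, *s, float(i)]
    if interactions:
        row += [i * v for v in s]
    return row


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """OLS via the normal equations; returns (coefficients, RSS, R^2)."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yv for row, yv in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    rss = sum((yv - fv) ** 2 for yv, fv in zip(y, fitted))
    ybar = sum(y) / len(y)
    tss = sum((yv - ybar) ** 2 for yv in y)
    return beta, rss, 1.0 - rss / tss
```

To use it, build `X` with `design_row(t, i)` for every patent-year unit and `y` from the corresponding citation frequencies. The normal equations are adequate for a handful of predictors; a QR-based solver would be the more robust choice in production.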

Adding interactions between time and time interval (1976 vs. 1987) gives a more complex model that fits significantly better than the simpler no-interaction model (ANOVA, p-value < 0.001), although the increase in explained variance is minimal. The interactions between time (years from the grant year) and time interval are all statistically significant. They suggest that the number of citations rises more steeply and declines more rapidly for the later time period (1987-2006). These conclusions can be seen more clearly in the predicted mean responses from the model, shown in Figure 5.28.

Figure 5.28 Observed mean response profiles and predictions from the regression model with interactions.

The solid line is for 1976-1995 and the dashed line is for 1987-2006. The predictions from the regression model are shown with thick lines.
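The ANOVA comparison of the two models can be illustrated with the standard F statistic for nested least-squares models. This is a hedged sketch assuming both are ordinary fixed-effects linear models (with random effects, a likelihood-ratio test would be the usual analogue); `nested_f_test` is a hypothetical helper, not the author's code. It also shows why a minimal gain in explained variance can still be highly significant here: with (6,774 + 12,291) × 20 = 381,300 patent-year units, the denominator degrees of freedom are very large, so even a small drop in the residual sum of squares produces a large F.

```python
def nested_f_test(rss_reduced, rss_full, n, p_reduced, p_full):
    """F statistic for comparing two nested least-squares models.

    rss_*: residual sums of squares; p_*: number of estimated coefficients
    (intercept included); n: number of observations (patent-year units).
    Returns (F, numerator df, denominator df).
    """
    if not p_reduced < p_full < n:
        raise ValueError("need p_reduced < p_full < n")
    df_num = p_full - p_reduced
    df_den = n - p_full
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den


# Number of units in this comparison: every patent contributes 20 yearly units.
N_UNITS = (6774 + 12291) * 20  # 381,300
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df_num, df_den)` gives the corresponding p-value. Because both models share the same total sum of squares, the same F can equivalently be computed from their R² values.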

[1] This provides some continuity with the results shown in the previous two sections, which use start points of 1976 and 1987.