# Data availability

There are two cases of data availability: no data are available and some data are available.

*Case I: no data available*

New-to-the-world products are the only type of new product in this category. Since they are completely new, historical sales data, whether internal or external (i.e., competitive), are unavailable and there are no *analog products.* An analog product is one that already exists and has characteristics, functions, form, and features similar to the new product, but differs from it in at least one significant way. The analog would have a sales history that could be used, albeit with caveats because it is only an analog. The lack of an analog, however, makes forecasting the *NTW* product difficult since there is literally nothing to work with. Another way to forecast, perhaps using judgement only, is needed.

*Case II: some data available*

A complete lack of data of any form and quantity for *NNTW* forecasting is unrealistic. Some data should be available for these products, although they may not be what most analysts consider useful or sufficient. For example, if a market test was conducted to determine selling potential and identify product issues (e.g., usability), then sales data collected during the test could be used for forecasting when the product is ready for launch. If a clinic was conducted, then sales data would not be available since no sales were made, but a discrete choice experiment could be part of the clinic and the estimated take rates from the experiment could be used as I described above. In fact, a clinic is not necessary since discrete choice experiments can be conducted independently of clinics. Several choice studies could be conducted and their results averaged to yield a more robust set of take rates.

Data may be available for competitive analog products. These data may be difficult to obtain, but most large companies have competitive assessment and tracking groups that can develop estimates of sales and market share as well as collect price data. *Web crawlers,* also known as *spiders* and *spiderbots,* crawl the World Wide Web looking for information. Crawlers could be used to gather publicly available data, such as prices and marketing messages, but crawling for sales data may be more difficult, if not impossible, because sales data are generally not publicly available. See Lemahieu et al. [2018] for some discussion about web crawlers. Also see the Wikipedia article on web crawlers at https://en.wikipedia.org/wiki/Web_crawler.

Data for internal products in the same or a similar line are definitely available, and data for internal analog products can be combined in some fashion. Baardman et al. [2017] note that one method used by practitioners, if the product line is sufficiently large, involves convening a team of experts, perhaps SMEs and *KOLs,* to identify which products should be combined. A simple average of the data for the selected products yields the needed series. They refer to this as a “cluster-then-estimate” approach: subjectively form a cluster of products that seem appropriate for the new product and then use the cluster's average sales data to estimate the forecast model for the new product. This is a simplistic approach, but an easily explained and intuitive one.
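As a rough illustration of the “cluster-then-estimate” idea, the sketch below averages the sales histories of analog products that an expert panel has already picked. The product names and sales figures are hypothetical, and aligning each history by periods since launch (then truncating to the shortest one) is an assumption of this sketch, not a step prescribed by Baardman et al. [2017].

```python
def cluster_then_estimate_series(analog_sales, selected):
    """Average the launch-aligned sales histories of expert-selected analogs.

    analog_sales: dict mapping a (hypothetical) product name to a list of sales,
                  where index 0 is the first period after that product's launch.
    selected:     the product names the expert panel put in the cluster.
    Returns one averaged series, truncated to the shortest selected history.
    """
    if not selected:
        raise ValueError("the cluster must contain at least one product")
    histories = [analog_sales[name] for name in selected]
    length = min(len(h) for h in histories)
    return [sum(h[t] for h in histories) / len(histories) for t in range(length)]


# Hypothetical analog sales in units per month since launch.
analogs = {
    "analog_A": [100, 140, 170, 190],
    "analog_B": [80, 120, 160],
    "analog_C": [300, 280, 260, 250],  # judged dissimilar, so left out
}
series = cluster_then_estimate_series(analogs, ["analog_A", "analog_B"])
print(series)  # [90.0, 130.0, 165.0]
```

The averaged series would then serve as the estimation data for whatever forecasting model is chosen for the new product.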

Baardman et al. [2017] proposed a different form of clustering of analog products that simultaneously builds a cluster and fits a forecasting model to it. They call their method “cluster-while-estimate.” The models are standard regression models. These models, and other modeling approaches that can be used in new product forecasting, will be discussed in Section 6.5. Baardman et al. [2017] note an issue with their approach: its complexity. A potentially large number of parameters must be estimated since one model is needed to cluster the products and another is needed to produce a forecast. They note that this problem is “NP-hard and practically intractable.”^{2} However, they developed an algorithm that makes the problem more tractable and thus practical to use. See Baardman et al. [2017] for details.

# Training and testing data sets

A best practice to follow, when possible, is to split historical time series data into two parts for model estimation, training and testing data sets, as I mentioned in Chapter 2. The training data set contains the historical data used to “train” a forecasting model; training means estimation. Because the series is ordered in time, the training data set is typically the first 2/3 or 3/4 of it. Other terminology for this data set includes “within-sample”, “initialization period”, and “calibration period” data set.

The testing data set is the remaining historical data, reserved to “test” the results of the forecasting model using forecast accuracy measures such as the ones I describe below. I also discuss forecast accuracy testing in Chapter 7; the same statistics are used in both places, but the focus differs. In Chapter 7, the focus is actual market performance relative to the forecasted performance. Here, the focus is on how well a forecast model predicts the future before that future is known. The only way to judge this is with a surrogate for the future: the testing data set. Other terminology for the testing data set includes “test period”, “validation period”, and “holdout-sample period” data set.

Notice that the division into two data sets is not based on random sampling. It cannot be; random sampling would produce nonsequential data in both data sets. In other words, random sampling would produce gaps in each time series because some of the time periods would be (randomly) assigned to the other data set. We prefer “gapless” time series. The method for splitting that preserves the time sequence involves picking a period and declaring all data up to and including it to be training data and all later data to be testing data.
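The chronological split can be sketched in a few lines of Python. The function name and the sales figures are hypothetical; the only point is that the series is cut at a period $T'$ rather than sampled at random, so both parts stay gapless.

```python
def train_test_split_time(series, cutoff):
    """Split an ordered time series at period T' without shuffling.

    series: observations ordered from t0 to T.
    cutoff: number of observations in the training set (periods t0..T').
    """
    if not 0 < cutoff < len(series):
        raise ValueError("each data set needs at least one observation")
    return series[:cutoff], series[cutoff:]


# Hypothetical monthly sales, oldest first.
sales = [12, 15, 14, 18, 21, 20, 24, 26, 25, 29, 31, 30]
cutoff = round(len(sales) * 3 / 4)  # roughly the first 3/4 for training
train, test = train_test_split_time(sales, cutoff)
print(train)  # [12, 15, 14, 18, 21, 20, 24, 26, 25]
print(test)   # [29, 31, 30]
```

Holding back only the last two observations, for a 2-step ahead forecast, would be `train_test_split_time(sales, len(sales) - 2)`.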

Also common is dividing the data so that only a few observations at the end of the series are held back for testing. How many depends on how far out you want to forecast: if you want to forecast only 1 or 2 steps ahead, then hold back only the last 1 or 2 observations. If your data set is small, then it is not practical to hold out many observations because the fit may be jeopardized if too many are withheld from the training data set. Hold out only a few of the precious observations. Some people use all the data for training a model. Then what is used for testing accuracy? The only possibility is future actuals, which may take too long to arrive; more importantly, until they arrive there is no way to develop confidence in the forecast.

Figure 6.4 shows how a time series is divided.

I discuss forecast accuracy below and again in Chapter 7.

FIGURE 6.4 Period $t_0$ is the starting time for the data set and $T$ is the ending time. The original data set spans this entire time interval. $T'$ is an arbitrarily set period for dividing the original data into two parts. The first part, from $t_0$ to $T'$, is the training data set and the remainder, from $T' + 1$ to $T$, is the testing data set.

# Forecasting methods based on data availability

Without historical or analog data, non-modeling methods have to be used to develop a forecast. Judgement can certainly be used, and often is. This involves management contributing their opinions, as well as experts (SMEs and *KOLs*) providing their input, sometimes in a panel setting. The Delphi method is an alternative way to get their input. This involves using a questionnaire rather than talking to the experts in person. As noted by Levenbach and Cleary [2006], criticisms of this approach include the true level of expertise of the panel, the clarity of the questionnaire, and the reliability of the forecasts based on the survey results. Also see Hyndman and Athanasopoulos [2018] for other discussions of judgement-based forecasts.

There are more opportunities for forecast modeling when some data are available. The possibilities depend on the amount. I will divide them into two classes: naive and sophisticated.

## Naive methods

There are two naive methods. One is actually called a *naive forecasting model* and the other is a *constant mean model.* Both are naive because they rely on simplistic assumptions. Yet both have proven useful in applications.

**Naive forecasting model**

A naive forecasting model uses the current period’s actual value as the 1-step ahead forecast:

$$Y_T(1) = Y_T.$$

This is sometimes called the *Naive Forecast 1,* or *NF1.* See Levenbach and Cleary [2006]. A naive h-step ahead forecast is a repetition of the one-step ahead forecast since nothing else is known beyond period $T$, so

$$Y_T(h) = Y_T, \quad h > 1.$$

For this model, the variance of the h-step ahead forecast error is

$$V[Y_{T+h} - Y_T(h)] = h\sigma^2,$$

where $\sigma^2$ is the variance of the white-noise disturbance. This is a *random walk* model. See the Appendix for details.
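A minimal sketch of *NF1* and of the random walk forecast-error variance $h\sigma^2$. Estimating $\sigma^2$ by the mean square of the first differences assumes a random walk without drift, so that the differences are the zero-mean disturbances; the function names and data are hypothetical.

```python
def nf1_forecast(series, h):
    """NF1: every h-step ahead forecast Y_T(h) equals the last actual, Y_T."""
    return [series[-1]] * h


def random_walk_error_variance(series, h):
    """Estimate h * sigma^2, the h-step forecast-error variance of a random walk.

    Assumes no drift, so the first differences Y_t - Y_{t-1} are the
    zero-mean disturbances; sigma^2 is estimated by their mean square.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    if not diffs:
        raise ValueError("need at least two observations")
    sigma2 = sum(d * d for d in diffs) / len(diffs)
    return h * sigma2


sales = [10, 12, 11, 13]  # hypothetical actuals
print(nf1_forecast(sales, 3))                # [13, 13, 13]
print(random_walk_error_variance(sales, 1))  # 3.0
print(random_walk_error_variance(sales, 4))  # 12.0
```

The point forecast stays flat while its uncertainty grows linearly with the horizon.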

A problem with a naive forecast is that each time a new actual value becomes available, a new forecast must be generated. This is not so onerous because modern software can easily make the adjustments.

Another problem occurs when there is a change from the penultimate to the last observation in a series. If a drop brought you to the last actual, then it is reasonable to believe the drop will repeat, so the first future value should be lower than the last actual value. *NF1* ignores this change because only the last value is used, so the forecast will be wrong from the start. The change in the actuals is handled by modifying the naive procedure to be

$$Y_T(1) = Y_T + p\,(Y_T - Y_{T-1}),$$

where $p$ is the proportion of the change from period $T-1$ to $T$ you wish to include. This is *Naive Forecast 2* (*NF2*). There is also a *Naive Forecast 3,* or *NF3,* that accounts for seasonality. Seasonality may be a problem for a new product forecast, but it depends on the nature of the product and the length of the available time series. See Levenbach and Cleary [2006] for some discussion.
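*NF2* is a one-line change to *NF1*. In the sketch below, restricting $p$ to $[0, 1]$ follows from reading $p$ as a proportion; the function name and data are hypothetical.

```python
def nf2_forecast(series, p):
    """NF2: Y_T(1) = Y_T + p * (Y_T - Y_{T-1}); p = 0 gives NF1."""
    if len(series) < 2:
        raise ValueError("NF2 needs at least two observations")
    if not 0 <= p <= 1:
        raise ValueError("p is a proportion and should lie in [0, 1]")
    last, previous = series[-1], series[-2]
    return last + p * (last - previous)


sales = [120, 118, 110]  # hypothetical actuals ending in a drop of 8
print(nf2_forecast(sales, 0.5))  # 106.0
print(nf2_forecast(sales, 0.0))  # 110.0, the NF1 forecast
```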

New actuals will become available for a new product once it is launched, so *NF1* or *NF2* can only be used for a short period of time. The length of time depends on the sales-generating process. Once a sufficiently long set of actuals is developed, a different, more sophisticated forecasting method should be used and *NF1* and *NF2* dropped.

**Constant mean method**

Another naive procedure, if more historical data are available, is to average the most recent values and use the average for the forecast. This is a *constant mean model.* See Gilchrist [1976]. This model assumes that

$$Y_t = \mu + \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ and $\epsilon_t$ is *white noise.* The actual value in period $T + h$ is

$$Y_{T+h} = \mu + \epsilon_{T+h}.$$

The h-step ahead forecast based on data up to time $T$ would then be $Y_T(h) = \mu + \epsilon_{T+h}$, but since you do not know the future disturbance term when the forecast is made, you have to use its expected value, which is zero. Therefore, the h-step ahead forecast is merely

$$Y_T(h) = \mu.$$

Since you also do not know the mean $\mu$, you substitute its unbiased estimate, the sample mean,

$$\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t.$$

An estimate of the h-step ahead forecast is then

$$\hat{Y}_T(h) = \bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t.$$

## Sophisticated forecasting methods

The *NF1* and constant mean models are useful when small amounts of data are available, which is the typical case. If only one data point is available, then the *NF1* and constant mean models are identical; they diverge otherwise. In situations where more data are available, perhaps from analogs, more advanced methods can be used. These include *ARIMA* modeling, which is actually a family of methods, trend analysis, econometric analysis with key driver variables (e.g., real GDP), and smoothing techniques such as exponential smoothing. Exponential smoothing is a member of the *ARIMA* family. The *ARIMA* family is summarized in the Appendix to this chapter. Modeling possibilities based on data availability are summarized in Figure 6.5.
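The claim that *NF1* and the constant mean model coincide with a single observation, and diverge otherwise, is easy to check. The sketch below uses hypothetical data and implements the constant mean forecast as the sample mean of all $T$ observations.

```python
def nf1_forecast(series, h):
    """NF1: repeat the last actual Y_T for every horizon."""
    return [series[-1]] * h


def constant_mean_forecast(series, h):
    """Constant mean model: repeat the sample mean of Y_1..Y_T for every horizon."""
    y_bar = sum(series) / len(series)
    return [y_bar] * h


one_point = [50]
print(nf1_forecast(one_point, 2) == constant_mean_forecast(one_point, 2))  # True

several = [50, 54, 61]  # hypothetical actuals
print(constant_mean_forecast(several, 2))  # [55.0, 55.0]
print(nf1_forecast(several, 2))            # [61, 61]
```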

**Smoothing methods**

The basic structure for a constant mean model was developed above. This model, however, is only good for short periods. It is unlikely that the mean will be constant from one *locality in time* to another. A locality is a period of time, say the first 5 months, the second 5 months, etc. This is also sometimes referred to as a *window.* The mean, $\mu$, is often viewed as a slowly varying quantity. I will consider situations in which the mean changes later.

FIGURE 6.5 This decision tree will help you select a forecasting methodology based on data availability.

In the constant mean model, averaging over all the data has the effect of reducing random variation, leaving an estimate of $\mu$. If an estimate of $\mu$ is required in only one locality of the data, then you can average the data for that locality and ignore the rest. But this is unlikely; you will typically be interested in many localities. A *moving average* is the most popular technique for handling many localities because of its simplicity. It averages a sliding window of values over time, resulting in a *smoothing* of historical data in which the effects of seasonality and randomness are eliminated or reduced. This method is good for short-term forecasts of one or two steps ahead. As long as no trend is expected in the immediate future and no seasonality is present, it is an effective, readily understood, and practical method.

Like any average, a moving average is a weighted sum of the data; the weights for a simple average are all constant at $1/n$, where $n$ is the number of values being averaged in a window. Clearly, these weights are positive and sum to 1.0. You could use any window size; the larger the $n$, the greater the smoothing. The n-term moving average at time $t$ is

$$\bar{Y}_{t,n} = \frac{1}{n}\sum_{i=0}^{n-1} Y_{t-i} = \frac{Y_t + Y_{t-1} + \cdots + Y_{t-n+1}}{n}.$$

A 1-step ahead forecast made at $t = T$ based on a simple moving average is found by setting the forecast equal to the value of the moving average at $t = T$:

$$Y_T(1) = \bar{Y}_{T,n}.$$

It is the last smoothed value based on the last $n$ actuals; you cannot do any more calculations because there are no more data. The h-step ahead forecast, $Y_T(h)$, based on a simple moving average is a repetition of the one-step ahead forecast:

$$Y_T(h) = Y_T(1), \quad h > 1.$$
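A minimal sketch of the simple moving average and its flat h-step ahead forecast. In the comments, `Ybar_{t,n}` denotes the n-term moving average ending at period $t$; the function names and sales figures are hypothetical. Python lists are 0-based, so the window ending at period $t$ (counted from 1) is `series[t - n:t]`.

```python
def moving_average(series, n):
    """Return Ybar_{t,n} for t = n, ..., T: the mean of each window of n values."""
    if not 1 <= n <= len(series):
        raise ValueError("window size n must be between 1 and len(series)")
    return [sum(series[t - n:t]) / n for t in range(n, len(series) + 1)]


def sma_forecast(series, n, h):
    """Y_T(h) = Y_T(1) = Ybar_{T,n}: repeat the last smoothed value h times."""
    return [moving_average(series, n)[-1]] * h


sales = [20, 22, 24, 26, 25, 27]  # hypothetical actuals
print(moving_average(sales, 3))   # [22.0, 24.0, 25.0, 26.0]
print(sma_forecast(sales, 3, 2))  # [26.0, 26.0]
```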

For forecasting short-term demand based on weekly or daily data, the most recent historical period is usually the most informative, so you want to weight the most recent values in a window more heavily. The weighted n-term moving average is

$$\bar{Y}^{w}_{t,n} = \sum_{j=0}^{n-1} w_j Y_{t-j}.$$

The $w_j$ weights are positive and must sum to 1.0.^{3} The 1-step ahead forecast, $Y_T(1)$, based on a weighted moving average is $\bar{Y}^{w}_{T,n}$, as above. The h-step ahead forecast, $Y_T(h)$, is given by a repetition of the one-step ahead forecast:

$$Y_T(h) = Y_T(1), \quad h > 1.$$
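The weighted version only changes how the last $n$ values are combined. In this sketch the first weight is applied to the most recent value; that ordering convention, the function name, and the weights $0.5, 0.25, 0.25$ are illustrative assumptions.

```python
import math


def weighted_ma_forecast(series, weights, h):
    """Weighted n-term moving average forecast, repeated for h horizons.

    weights[0] applies to Y_T, weights[1] to Y_{T-1}, and so on; the
    weights must be positive and sum to 1.
    """
    n = len(weights)
    if not 1 <= n <= len(series):
        raise ValueError("need between 1 and len(series) weights")
    if any(w <= 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    recent = series[::-1][:n]  # Y_T, Y_{T-1}, ..., Y_{T-n+1}
    one_step = sum(w * y for w, y in zip(weights, recent))
    return [one_step] * h


sales = [20, 22, 24, 26, 25, 27]  # hypothetical actuals
print(weighted_ma_forecast(sales, [0.5, 0.25, 0.25], 2))  # [26.25, 26.25]
```

With equal weights of $1/n$ this reduces to the simple moving average.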

The one-step ahead forecast based on the simple moving average can be written recursively as

$$Y_T(1) = Y_{T-1}(1) + \frac{Y_T - Y_{T-n}}{n},$$

since moving the window forward one period adds $Y_T$ and drops $Y_{T-n}$.

Suppose you only have the most recent observed value, $Y_T$, and the one-step ahead forecast for that same period, $Y_{T-1}(1)$, made in the previous period, $T-1$. The value $Y_{T-n}$ is unavailable because it is outside the window, but you could use an approximate value, the most likely being the one-step ahead forecast from the preceding period, $Y_{T-1}(1)$. The updated formula is now

$$Y_T(1) = Y_{T-1}(1) + \frac{Y_T - Y_{T-1}(1)}{n} = \frac{1}{n}Y_T + \left(1 - \frac{1}{n}\right)Y_{T-1}(1).$$

Let $\alpha = 1/n$. Then you have a general form of an equation for forecasting by the method of *exponential smoothing* or *exponential averaging*:

$$Y_T(1) = \alpha Y_T + (1 - \alpha)Y_{T-1}(1).$$

You only need the most recent observation, $Y_T$, the most recent forecast for $T$, $Y_{T-1}(1)$, and a value for $\alpha$, the weight placed on “today.”

You need $0 < \alpha < 1$. The value can be specified or estimated, but $\alpha$ is typically specified by the user rather than estimated from data. Experience has shown that good values for $\alpha$ are between 0.05 and 0.3. As a general rule, smaller smoothing weights are appropriate for series with a slowly changing trend, while larger weights are appropriate for volatile series with a rapidly changing trend. You can “estimate” $\alpha$ by repeatedly trying different values (typically 0.1, 0.2, ..., 0.9), computing an error statistic such as the *mean square error* (*MSE*) for each, and choosing the value of $\alpha$ that gives the best value of the statistic (e.g., minimum *MSE*). This is a *grid search.*
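The smoothing recursion $Y_t(1) = \alpha Y_t + (1-\alpha)Y_{t-1}(1)$ and the grid search over $\alpha$ can be sketched together. Initializing the first forecast at the first actual is an assumption (other start-up choices are common), as are the function names; the *MSE* is computed over periods 2 through $T$ because the first forecast is not a genuine prediction.

```python
def ses_forecasts(series, alpha):
    """Simple exponential smoothing: Y_t(1) = alpha * Y_t + (1 - alpha) * Y_{t-1}(1).

    Returns (forecasts, next_forecast), where forecasts[i] is the forecast of
    series[i] made one period earlier and next_forecast is Y_T(1).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    forecast = series[0]  # assumed start-up value
    forecasts = []
    for y in series:
        forecasts.append(forecast)
        forecast = alpha * y + (1 - alpha) * forecast
    return forecasts, forecast


def ses_mse(series, alpha):
    """Mean square one-step error over periods 2..T."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    forecasts, _ = ses_forecasts(series, alpha)
    errors = [y - f for y, f in zip(series[1:], forecasts[1:])]
    return sum(e * e for e in errors) / len(errors)


def grid_search_alpha(series, grid=None):
    """Return the alpha in the grid (default 0.1, ..., 0.9) with minimum MSE."""
    if grid is None:
        grid = [i / 10 for i in range(1, 10)]
    return min(grid, key=lambda a: ses_mse(series, a))


print(ses_forecasts([10, 20], 0.5)[1])  # 15.0
trending = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical steadily rising series
print(grid_search_alpha(trending))  # 0.9
```

For this steadily rising series the grid search picks the largest $\alpha$, because heavier weight on “today” lets the smoothed value keep up with the rise.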

This discussion of moving averages and exponential smoothing follows Wheelwright and Makridakis [1980]. Also see Levenbach and Cleary [2006].

**Linear trend method**

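Consider the model

$$Y_t = \beta_0 + \beta_1 t + \epsilon_t.$$

Without the $\beta_1 t$ term, you have the constant mean model. With the $\beta_1 t$ term, you have a simple linear regression model with a time trend variable capturing an underlying trend in the data. The variable is $t = 1, 2, \ldots, T$. This variable is interval scaled, meaning you can redefine it by adding a constant and the results are invariant to the change. For example, you can add 1900 to $t$: only the intercept estimate shifts, while the slope estimate, fitted values, and forecasts are unaffected.

The 1-step ahead forecast, $Y_T(1)$, is obtained by substituting $t = T + 1$ into the fitted model:

$$Y_T(1) = \hat{\beta}_0 + \hat{\beta}_1 (T + 1).$$

The forecast error for $Y_T(1)$ is

$$e_{T+1} = Y_{T+1} - Y_T(1) = (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1)(T + 1) + \epsilon_{T+1}.$$

Observe that $E(e_{T+1}) = 0$ since $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$. Therefore, $E[Y_T(1)] = E(Y_{T+1})$, so the forecast is unbiased. The *MSE* is

$$MSE = E(e_{T+1}^2) = \sigma^2\left[1 + \frac{1}{T} + \frac{(T + 1 - \bar{t})^2}{\sum_{t=1}^{T}(t - \bar{t})^2}\right],$$

where $\bar{t} = (T + 1)/2$ is the mean of the time variable.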
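A sketch of the linear trend fit by ordinary least squares, with the point forecast and its estimated *MSE* from the standard regression prediction variance $\sigma^2[1 + 1/T + (T+h-\bar{t})^2/\sum_t (t-\bar{t})^2]$, where $\bar{t} = (T+1)/2$. It substitutes the residual variance $s^2$ (divisor $T-2$) for the unknown $\sigma^2$, which is a common choice but not spelled out in the text; the horizon argument `h` generalizes the 1-step ahead case to $T+h$, and the function names are hypothetical.

```python
def fit_linear_trend(series):
    """OLS fit of Y_t = b0 + b1 * t + e_t with t = 1, ..., T.

    Returns (b0, b1, s2), where s2 is the residual variance with divisor T - 2.
    """
    T = len(series)
    if T < 3:
        raise ValueError("need at least 3 observations")
    t_bar = (T + 1) / 2
    y_bar = sum(series) / T
    s_tt = sum((t - t_bar) ** 2 for t in range(1, T + 1))
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series, start=1))
    b1 = s_ty / s_tt
    b0 = y_bar - b1 * t_bar
    resid = [y - (b0 + b1 * t) for t, y in enumerate(series, start=1)]
    s2 = sum(r * r for r in resid) / (T - 2)
    return b0, b1, s2


def trend_forecast(series, h=1):
    """Point forecast b0 + b1 * (T + h) and its estimated MSE."""
    b0, b1, s2 = fit_linear_trend(series)
    T = len(series)
    t_bar = (T + 1) / 2
    s_tt = sum((t - t_bar) ** 2 for t in range(1, T + 1))
    point = b0 + b1 * (T + h)
    mse = s2 * (1 + 1 / T + (T + h - t_bar) ** 2 / s_tt)
    return point, mse


exact = [5, 7, 9, 11, 13]  # hypothetical series lying on Y_t = 3 + 2t
print(trend_forecast(exact))  # (15.0, 0.0)
```

With noisy data the estimated *MSE* is always larger than $s^2$ and grows as the forecast period moves away from $\bar{t}$.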

**Econometric methods**

The linear trend method is a special case in the econometric family of models. The larger, more general family includes explanatory variables such as prices, competitive factors, real GDP, and so forth. Forecasting is more complicated because separate forecasts of these other explanatory variables are needed. The linear trend is simpler for forecasting because you just have to extend the time variable to however far into the future you want to go. You cannot simply do this with a larger econometric model.

**ARIMA methods**

*ARIMA* is an acronym for *AutoRegressive Integrated Moving Average.* This is a family of models that has as special cases some of the models I discussed above. In fact, an econometric model is also a special case of an even wider *ARIMA* model called a *transfer function model.* The basics of an *ARIMA* are discussed in the Appendix to this chapter.

## Data requirements

The actual method used depends on data availability. There are no hard and fast rules for how much data you need for any method, just rules-of-thumb.

To quote Hyndman and Athanasopoulos [2018]:

*We often get asked how few data points can be used to fit a time series model. As with almost all sample size questions, there is no easy answer. It depends on the number of model parameters to be estimated and the amount of randomness in the data. The sample size required increases with the number of parameters to be estimated, and the amount of noise in the data.*

*Some textbooks provide rules-of-thumb giving minimum sample sizes for various time series models. These are misleading and unsubstantiated in theory or practice. Further, they ignore the underlying variability of the data and often overlook the number of parameters to be estimated as well. There is, for example, no justification whatever for the magic number of 30 often given as a minimum for ARIMA modelling. The only theoretical limit is that we need more observations than there are parameters in our forecasting model. However, in practice, we usually need substantially more observations than that.*