
Time Series and Forecasting
August 01, 2024
Time series are especially important in business applications since they are involved in forecasting (revenues, expenses, stock prices, inventory levels, etc.).
A time series is a sequence of measurements (of a variable) taken every hour, day, week, month, quarter, year, or any other regular time interval.
The pattern of the data is crucial for understanding how the time series has behaved over time.
If such behavior can be expected to continue in the future, we can use it to select an appropriate forecasting method.
Forecasting methods can be classified as qualitative or quantitative.
Qualitative methods generally involve expert judgment to develop forecasts.
We will focus on quantitative forecasting methods.
Quantitative forecasting methods can be used when:
In such cases, a forecast can be developed using a time series or a causal method.
Quantitative methods are based on an analysis of historical data concerning one or more time series.
A time series is a set of observations measured at successive points in time.
Time Series Method: Restricts data to past values of the series.
Causal Method: Uses other time series believed to be related to the time series to be forecasted.
Time Series Analysis
Causal Methods
Regression Analysis: Time is treated as the independent variable, and the time series as the dependent variable.
Time-Series Regression: The sole independent variable is time.
Cross-Sectional Regression: The independent variables are something other than time.
The first step in selecting an appropriate forecasting method is to construct a time series plot to identify patterns.
A time series plot is a graphical presentation of the relationship between time and the time series variable.
It is a scatterplot where time is on the horizontal axis, and the time series values are on the vertical axis.
Gasoline sale time series
The number of gallons of gasoline sold by a gasoline distributor over a period of 12 weeks is given in the table below. The distributor would like to identify the underlying pattern in the data to guide it in selecting an appropriate forecasting method.
| Week | Sales |
|---|---|
| 1 | 17 |
| 2 | 21 |
| 3 | 19 |
| 4 | 23 |
| 5 | 18 |
| 6 | 16 |
| 7 | 20 |
| 8 | 18 |
| 9 | 22 |
| 10 | 20 |
| 11 | 15 |
| 12 | 22 |
Gasoline sales time series
The common types of data patterns that can be identified when examining a time series plot include:
A horizontal pattern exists when the data fluctuate around a constant mean.
Changes in business conditions can often result in a time series that shifts to a new level.
It is more difficult to choose an appropriate forecasting method to identify a change in the level of the time series. (The Change-time problem.)


Stationary Time Series
The term stationary time series is used to denote a time series whose statistical properties are independent of time. In particular, this means that:
The process generating the data has a constant mean.
The variability of the time series is constant over time.
A time series plot for a stationary time series will always exhibit a horizontal pattern. But simply observing a horizontal pattern is not sufficient evidence to conclude that the time series is stationary.
More advanced texts on forecasting discuss procedures for determining if a time series is stationary and provide methods for transforming a time series that is not stationary into a stationary series.
A time series may show gradual drifts or movements to relatively higher or lower values over a longer period of time.
Trend is usually the result of long-term factors such as changes in the population, demographics, technology, or consumer preferences.
A trend pattern can be identified by analyzing multiperiod movements in historical data.
Seasonal patterns are recognized by seeing the same repeating pattern of highs and lows over successive periods of time (within a “short” period of time).
A seasonal pattern might occur within a day, week, month, quarter, or year.
A seasonal pattern does not necessarily refer to the four seasons of the year.
Some time series include a combination of a trend and seasonal pattern.
Time series decomposition can be used to separate or decompose a time series into trend and seasonal components.
A cyclical pattern exists if the time series plot shows an alternating sequence of points below and above the trend line lasting more than one year.
Often, the cyclical component of a time series is due to multiyear business cycles.
Business cycles are difficult, if not impossible, to forecast.
A time series plot is simply a scatter diagram with lines connecting the points.
Steps
Select cells A2:B13 in the Gasoline.xlsx file.
Click the Insert tab on the Ribbon
In the Charts group, click the Insert Scatter (X, Y) or Bubble Chart button
When the list of scatter diagram subtypes appears:
The time series plot produced by Excel will appear in the same worksheet.

The underlying pattern in the time series is an important factor in selecting a forecasting method.
A time series plot should be one of the first tools developed to determine what forecasting method to use.
If we see a horizontal pattern, then we need to select a method appropriate for this type of pattern.
If we observe a trend in the data, then we need to use a method that can handle the trend effectively.
Measures of forecast accuracy are used to determine how well a particular forecasting method can reproduce the time series data that we already have.
Forecast accuracy measures are important factors in comparing different forecasting methods.
By selecting the method that has the best accuracy, we will obtain better forecasts for future time periods.
\[ \text{Forecast Error} = \text{Actual Observed Value} - \text{Forecast} \]
\[ \text{Residual} = \text{Observed} - \text{Predicted} \]
A positive forecast error indicates the forecasting method underestimated the actual value.
A negative forecast error indicates the forecasting method overestimated the actual value.
A forecast error is basically a residual.
Mean Error (ME): ME is the mean or average of the forecast errors. Because positive and negative forecast errors tend to offset one another, the mean error is likely to be small. Thus, the mean error is not a very useful measure.
Mean Absolute Error (MAE): MAE avoids the problem of positive and negative errors offsetting one another. It is the mean of the absolute values of the forecast errors.
Mean Squared Error (MSE): MSE also avoids the problem of positive and negative errors offsetting one another. It is the average of the squared forecast errors.
Mean Absolute Percentage Error (MAPE): The size of MAE and MSE depend upon the scale of the data, so it is difficult to make comparisons for different time periods. To make such comparisons, we need to work with relative or percentage error measures. Percentage Error is the error divided by the observed value of the time series. The MAPE is the average of the absolute percentage errors of the forecasts.
To demonstrate the computation of these measures of forecast accuracy, let’s introduce the simplest of forecasting methods: the naïve forecasting method
The naïve forecasting method uses only the most recent observation in the time series as the forecast for the next time period.
\[ F_{t+1} = Y_t, \text{ the actual observed value in period } t \]
Gasoline sale time series
| Week | Observed Time series | Naïve Forecast | Forecast error | Absolute value of forecast error | Squared forecast error | Percentage error | Absolute value of percentage error |
|---|---|---|---|---|---|---|---|
| 1 | 17 | ||||||
| 2 | 21 | 17 | 4 | 4 | 16 | 19.05 = \(\frac{4}{21}\) | 19.05 |
| 3 | 19 | 21 | -2 | 2 | 4 | -10.53 | 10.53 |
| 4 | 23 | 19 | 4 | 4 | 16 | 17.39 | 17.39 |
| 5 | 18 | 23 | -5 | 5 | 25 | -27.78 | 27.78 |
| 6 | 16 | 18 | -2 | 2 | 4 | -12.5 | 12.5 |
| 7 | 20 | 16 | 4 | 4 | 16 | 20.00 | 20.00 |
| 8 | 18 | 20 | -2 | 2 | 4 | -11.11 | 11.11 |
| 9 | 22 | 18 | 4 | 4 | 16 | 18.18 | 18.18 |
| 10 | 20 | 22 | -2 | 2 | 4 | -10.00 | 10.00 |
| 11 | 15 | 20 | -5 | 5 | 25 | -33.33 | 33.33 |
| 12 | 22 | 15 | 7 | 7 | 49 | 31.82 | 31.82 |
| Totals | 5 | 41 | 179 | 1.19 | 211.69 |
Week 2: the postive forecast error indicates that the forecasting method underestimated the actual value of sales.
Week 3: the negative forecast error indicates that the forecasting method overestimated the actual value of sales.
Gasoline sale time series: Naive Forecast Accuracy
\[ \text{MAE} = \frac{41}{11} = 3.73 \]
\[ \text{MSE} = \frac{179}{11} = 16.27 \]
The size of MAE and MSE depends upon the scale of the data. So, it is not recommended to make comparisons for different time intervals (e.g. months vs weeks).
To make comparisons like these we need relative or percentage error measues like MAPE.
Mean Absolute Percentage Error (MAPE): It is the average of the absolute percentage errors of the forecasts.
\[ \text{MAPE} = \frac{211.69}{11} = 19.24\% \]
Let’s use the average of all the historical data available as the forecast for the next period.
We begin by developing a forecast for Week 2. Since there is only one historical value available prior to Week 2, the forecast for Week 2 is just the time series value in Week 1. Thus, the forecast for Week 2 is 17 thousand gallons of gasoline.
To compute the forecast for Week 3, we take the average of the sales values in Weeks 1 and 2:
\[ \text{Forecast for week 3} = \frac{17 + 21}{2} = 19 \]
Similarly, the forecast for Week 4 is:
\[ \text{Forecast for week 4} = \frac{17 + 21 + 19}{3} = 19 \]
The forecasts obtained using this method for the gasoline time series are:
| Week | Time Series Value | Forecast | Forecast Error | Absolute Value of Forecast Error | Squared Forecast Error | Percentage Error | Absolute Value of Percentage Error |
|---|---|---|---|---|---|---|---|
| 1 | 17 | ||||||
| 2 | 21 | 17.00 | 4.00 | 4.00 | 16.00 | 19.05 | 19.05 |
| 3 | 19 | 19.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 4 | 23 | 19.00 | 4.00 | 4.00 | 16.00 | 17.39 | 17.39 |
| 5 | 18 | 20.00 | -2.00 | 2.00 | 4.00 | -11.11 | 11.11 |
| 6 | 16 | 19.60 | -3.60 | 3.60 | 12.96 | -22.50 | 22.50 |
| 7 | 20 | 19.00 | 1.00 | 1.00 | 1.00 | 5.00 | 5.00 |
| 8 | 18 | 19.14 | -1.14 | 1.14 | 1.31 | -6.35 | 6.35 |
| 9 | 22 | 19.00 | 3.00 | 3.00 | 9.00 | 13.64 | 13.64 |
| 10 | 20 | 19.33 | 0.67 | 0.67 | 0.44 | 3.33 | 3.33 |
| 11 | 15 | 19.40 | -4.40 | 4.40 | 19.36 | -29.33 | 29.33 |
| 12 | 22 | 19.00 | 3.00 | 3.00 | 9.00 | 13.64 | 13.64 |
| Totals | 4.53 | 26.81 | 89.07 | 2.76 | 141.34 |
Using the results shown, we obtained the following values of MAE, MSE, and MAPE:
\[ \text{MAE} = \frac{26.81}{11} = 2.44 \] \[ \text{MSE} = \frac{89.07}{11} = 8.10 \] \[ \text{MAPE} = \frac{141.34}{11} = 12.85\% \]
We can now compare the accuracy of the two forecasting methods we have considered by comparing the values of MAE, MSE, and MAPE for each method.
| Naive Method | Average of Past Values | |
|---|---|---|
| MAE | 3.73 | 2.44 |
| MSE | 16.27 | 8.10 |
| MAPE | 19.24% | 12.85% |
For every measure, the average of past values provides more accurate forecasts than using the most recent observation as the forecast for the next period.
If the underlying time series is stationary, the average of all the historical data will generally provide the best results.
If the time series is not stationary, adjustments are needed.
In cases where the time series shifts to a new level (e.g., due to contract changes), the naive method might adapt faster than the averaging method.
Measures of forecast accuracy are crucial but should not be the sole basis for choosing a forecasting method.
Consider the business context and the likelihood of changes in the time series level.
Historical forecast accuracy should be weighed alongside the ability of the method to adapt to shifts.
Moving Averages (MAs)
Weighted Moving Averages
Exponentially Weighted Moving Averages (EWMAs)
They are called smoothing methods because their objective is to smooth out the random fluctuations (due to random errors or noises) in the time series.
They are most appropriate for short-range forecasts.
The moving averages method uses the average of the most recent k data values in the time series as the forecast for the next period. Mathematically, a moving average forecast of order k is as follows:
Moving Average Forecast of Order \(k\)
\[ F_{t+1} = \frac{\sum (\text{most recent } k \text{ data values})}{k} = \frac{Y_t + Y_{t-1} + \cdots + Y_{t-k+1}}{k} \]
where:
Each observation in the moving average calculation receives the same weight \((1/k)\).
The term moving is used because every time a new observation becomes available for the time series, it replaces the oldest observation in the equation.
As a result, the average will change, or move, as new observations become available.
To use moving averages to forecast, we must first select the span (the order), which is the number of observed time series values to be included in the moving average.
Example: Gasoline sale time series: three-week moving average (\(k = 3\))
| Week | Time series value | Forecast | Forecast error | Absolute value of forecast error | Squared forecast error | Percentage error | Absolute value of percentage error |
|---|---|---|---|---|---|---|---|
| 1 | 17 | ||||||
| 2 | 21 | ||||||
| 3 | 19 | ||||||
| 4 | 23 | \(\frac{(17 + 21 + 19)}{3} = 19\) | 4 | 4 | 16 | 17.39 | 17.39 |
| 5 | 18 | \(\frac{21 + 19 + 23}{3} = 21\) | -3 | 3 | 9 | -16.67 | 16.67 |
| 6 | 16 | 20 | -4 | 4 | 16 | -25.00 | 25.00 |
| 7 | 20 | 19 | 1 | 1 | 1 | 5.00 | 5.00 |
| 8 | 18 | 18 | 0 | 0 | 0 | 0.00 | 0.00 |
| 9 | 22 | 18 | 4 | 4 | 16 | 18.18 | 18.18 |
| 10 | 20 | 20 | 0 | 0 | 0 | 0.00 | 0.00 |
| 11 | 15 | 20 | -5 | 5 | 25 | -33.33 | 33.33 |
| 12 | 22 | 19 | 3 | 3 | 9 | 13.64 | 13.64 |
| Totals | 0 | 24 | 92 |
Example: Gasoline sale time series: three-week moving average
\[ \text{MAE} = \frac{24}{9} = 2.67 \]
\[ \text{MSE} = \frac{92}{9} = 10.22 \]
\[ \text{MAPE} = \frac{129.21}{9} = 14.36\% \]
Example: Gasoline sales time series: three-week moving average
Enter/Access Data: Open the file Gasoline. The data are in cells A2:B13 and labels are in cells A1:B1.
Step 1: Click the Data tab on the Ribbon
Step 2: In the Analyze group, click Data Analysis
Step 3: Choose Moving Average from the list of Analysis Tools and click OK
Step 4: When the Moving Average dialog box appears:
B2:B13 in the Input Range box3 in the Interval boxC3 in the Output Range boxThe three-week moving average forecasts appear in column C of the worksheet; a chart showing the actual values of the time series and the forecasted values also appears. Forecasts for periods of other lengths can be computed easily by entering a different value in the interval box.
In the moving averages method, each observation in the moving average calculation receives the same weight.
Weighted Moving Averages: we can select a different weight for each data value (e.g. the most recent observations) and then compute a weighted average of the most recent \(k\) values as the forecast.
The more recent observations are typically given more weight than older observations.
For convenience, the weights should sum to 1.
For the Gasoline data, a three-period weighted moving average (3WMA) for Week 4 is:
\[ WMA_3 = \frac{1}{6}(17) + \frac{2}{6}(21) + \frac{3}{6}(19) = 19.33 \]
if the weights are \(\frac{1}{6}, \frac{2}{6}, \frac{3}{6}\), which sum to 1.
19 is the most recent of the three observations and receives the largest weight.
To determine whether one particular combination of number of data values and weights provides a more accurate forecast than another combination, it is recommended to use MSE as the measure of the forecast accuracy.
The exponential smoothing forecast is a special case of the weighted moving averages.
It is the weighted average of all the time series data up to the current period.
We select only the weight between 0 and 1 for the most recent observation. This weight is called the smoothing constant and is denoted by \(\alpha\).
Single Exponential Smoothing Forecast
\[ F_{t+1} = \alpha Y_t + (1 - \alpha) F_t \]
where
There are a number of exponential smoothing procedures.
The term exponential smoothing comes from the exponential nature of the weighting scheme for the historical values.
The statistic computed in each period is called the Exponentially Weighted Moving Average (EWMA).
Let’s illustrate by working with a time series involving only three periods of data: \(Y_1\), \(Y_2\), and \(Y_3\).
To initiate, let \(F_1\) equal the actual value of the time series in period 1; that is, \(F_1 = Y_1\). Hence, the forecast for period 2 is
\[ \begin{aligned} F_2 &= \alpha Y_1 + (1 - \alpha) F_1 \\ &= \alpha Y_1 + (1 - \alpha) Y_1 \\ &= Y_1 \end{aligned} \]
Thus, the exponential smoothing forecast for period 2 is equal to the actual value of the time series in period 1.
The forecast for period 3 is
\[ F_3 = \alpha Y_2 + (1 - \alpha) F_2 = \alpha Y_2 + (1 - \alpha) Y_1 \]
Finally, substituting this expression for \(F_3\) in the expression for \(F_4\), we obtain
\[ \begin{aligned} F_4 &= \alpha Y_3 + (1 - \alpha) F_3 \\ &= \alpha Y_3 + (1 - \alpha)[\alpha Y_2 + (1 - \alpha) Y_1] \\ &= \alpha Y_3 + \alpha (1 - \alpha) Y_2 + (1 - \alpha)^2 Y_1 \end{aligned} \]
The term exponential smoothing comes from the exponential nature of the weighting scheme for the historical values.
We now see that \(F_4\) is a weighted average of the first three time series values. The sum of the coefficients, or weights, for \(Y_1\), \(Y_2\), and \(Y_3\) equals 1. A similar argument can be made to show that, in general, any forecast \(F_{t+1}\) is a weighted average of all the previous time series values.
Despite the fact that exponential smoothing provides a forecast that is a weighted average of all past observations, all past data do not need to be saved to compute the forecast for the next period. In fact, only two pieces of information are needed: \(Y_t\) and \(F_t\).
Gasoline sale time series
The number of gallons of gasoline sold by a distributor over a period of 12 weeks is given below. The distributor uses exponential smoothing to forecast sales. Which value for the smoothing constant \(\alpha\), 0.2 or 0.3, gives better forecasts?
| Week | Sales |
|---|---|
| 1 | 17 |
| 2 | 21 |
| 3 | 19 |
| 4 | 23 |
| 5 | 18 |
| 6 | 16 |
| 7 | 20 |
| 8 | 18 |
| 9 | 22 |
| 10 | 20 |
| 11 | 15 |
| 12 | 22 |
Gasoline sale time series with smoothing constant \(\alpha = 0.2\)
| Week | Time series value | Forecast | Forecast error | Squared forecast error |
|---|---|---|---|---|
| 1 | 17 | |||
| 2 | 21 | 17 | 4.00 | 16.00 |
| 3 | 19 | \(0.2(21) + 0.8(17)=17.80\) | 1.20 | 1.44 |
| 4 | 23 | \(0.2(19) + 0.8(17.80)=18.04\) | 4.96 | 24.60 |
| 5 | 18 | 19.03 | -1.03 | 1.06 |
| 6 | 16 | 18.83 | -2.83 | 8.01 |
| 7 | 20 | 18.26 | 1.74 | 3.03 |
| 8 | 18 | 18.61 | -0.61 | 0.37 |
| 9 | 22 | 18.49 | 3.51 | 12.32 |
| 10 | 20 | 19.19 | 0.81 | 0.66 |
| 11 | 15 | 19.35 | -4.35 | 18.92 |
| 12 | 22 | 18.48 | 3.52 | 12.39 |
| Totals | 10.92 | 98.80 |
Gasoline sale time series with smoothing constant \(\alpha = 0.2\)
\[ \text{MSE} = \frac{98.80}{11} = 8.98 \]
Gasoline sale time series with smoothing constant \(\alpha = 0.3\)
| Week | Time series value | Forecast | Forecast error | Squared forecast error |
|---|---|---|---|---|
| 1 | 17 | |||
| 2 | 21 | 17 | 4 | 16 |
| 3 | 19 | 18.20 | 0.80 | 0.64 |
| 4 | 23 | 18.44 | 4.56 | 20.79 |
| 5 | 18 | 19.81 | -1.81 | 3.28 |
| 6 | 16 | 19.27 | -3.27 | 10.69 |
| 7 | 20 | 18.29 | 1.71 | 2.92 |
| 8 | 18 | 18.80 | -0.80 | 0.64 |
| 9 | 22 | 18.56 | 3.44 | 11.83 |
| 10 | 20 | 19.59 | 0.41 | 0.17 |
| 11 | 15 | 19.71 | -4.71 | 22.18 |
| 12 | 22 | 18.30 | 3.70 | 13.69 |
| Totals | 8.03 | 102.83 |
Gasoline sale time series with smoothing constant \(\alpha = 0.3\)
\[ \text{MSE} = \frac{102.83}{11} = 9.35 \]
Gasoline sale time series with smoothing constant \(\alpha = 0.2\)
Lag variables are often used in time series analysis to capture the correlation (effect) of past time series values with (on) the present or future time series values.
For example, this quarter’s sales depend on (are correlated with) sales of the last three quarters.
At time any \(t\), variable \(v_t\) may depend on \(y_{t-1}, y_{t-2}, y_{t-3}, \dots\), which are denoted by Lag 1, Lag 2, Lag 3, respectively.
These lag variables are considered predictors of \(v_t\).
This type of correlation is called autocorrelation.
The choice of lag order (how many periods back to look) and the selection of an appropriate forecasting model are critical for forecasting lag variables effectively.
| Quarter | \(t\) | Sales ($Mil.) |
|---|---|---|
| Q210 | 1 | 8.75 |
| Q310 | 2 | 8.40 |
| Q410 | 3 | 14.10 |
| Q111 | 4 | 16.24 |
| Q211 | 5 | 18.65 |
| Q311 | 6 | 20.34 |
| Q411 | 7 | 17.07 |
| Q112 | 8 | 37.04 |
| Q212 | 9 | 35.06 |
| Q312 | 10 | 26.03 |
| Q412 | 11 | 26.91 |
| Q113 | 12 | 47.79 |
| … | … | … |
Assume lag order is 4.
| Quarter | \(t\) | Sales ($Mil.) | Lag 1 | Lag 2 | Lag 3 | Lag 4 |
|---|---|---|---|---|---|---|
| Q210 | 1 | 8.75 | ||||
| Q310 | 2 | 8.40 | 8.75 | |||
| Q410 | 3 | 14.10 | 8.40 | 8.75 | ||
| Q111 | 4 | 16.24 | 14.10 | 8.40 | 8.75 | |
| Q211 | 5 | 18.65 | 16.24 | 14.10 | 8.40 | 8.75 |
| Q311 | 6 | 20.34 | 18.65 | 16.24 | 14.10 | 8.40 |
| … | … | … | … | … | … | … |
| Sales ($Mil.) | Lag 1 | Lag 2 | Lag 3 | |
|---|---|---|---|---|
| Lag 1 | 0.642 | |||
| Lag 2 | 0.418 | 0.660 | ||
| Lag 3 | 0.603 | 0.440 | 0.660 | |
| Lag 4 | 0.914 | 0.605 | 0.437 | 0.664 |
Summary Output:
| Regression Statistics | |
|---|---|
| Multiple R | 0.9226 |
| R Square | 0.8511 |
| Adjusted R Square | 0.8252 |
| Standard Error | 7.0864 |
| Observations | 28 |
ANOVA
| df | SS | MS | F | Significance F |
|---|---|---|---|---|
| Regression | 4 | 6602.696 | 1650.674 | 32.871 |
| Residual | 23 | 1154.980 | 50.217 | |
| Total | 27 | 7757.676 |
Coefficients:
| Variables | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 16.438 | 4.179 | 3.933 | 0.001 |
| Lag 1 | 0.011 | 0.122 | 0.091 | 0.928 |
| Lag 2 | -0.141 | 0.119 | -1.188 | 0.247 |
| Lag 3 | 0.007 | 0.121 | 0.057 | 0.955 |
| Lag 4 | 0.858 | 0.115 | 7.467 | 0.000 |
Summary Output:
| Regression Statistics | |
|---|---|
| Multiple R | 0.9144 |
| R Square | 0.8361 |
| Adjusted R Square | 0.8298 |
| Standard Error | 6.9933 |
| Observations | 28 |
ANOVA
| df | SS | MS | F | Significance F |
|---|---|---|---|---|
| Regression | 1 | 6486.115 | 6486.115 | 132.624 |
| Residual | 26 | 1271.561 | 48.906 | |
| Total | 27 | 7757.676 |
Coefficients:
| Variables | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 13.161 | 3.002 | 4.384 | 0.000 |
| Lag 4 | 0.813 | 0.071 | 11.516 | 0.000 |
\[ \widehat{Sales_t} = 13.161 + (0.813) \widehat{Sales_{t-4}} \]
where the predictor is the Lag 4 variable.
Organize Your Data
Create the Lagged Variables
=A2 to reference the previous row’s value of \(Y\).=A2 to reference the value of \(Y\) two rows above.
Prepare Data for Regression
Open the Data Analysis Tool
Go to the Data tab > Data Analysis.
If Data Analysis isn’t visible, enable it via File > Options > Add-Ins > Excel Add-ins > Analysis ToolPak.
Select Regression: Choose Regression in the Data Analysis dialog and click OK.
Specify the Input Range
Input Y Range: Select the range of \(Y\) values, excluding the first two rows (e.g., A4:A20). Do not fill empty cells with zeros.
Input X Range: Select both lagged variable ranges (e.g., B4:C20). Do not fill empty cells with zeros.
Set Up Additional Options
Check Labels if headers were included in your input ranges.
Choose the output location for the regression results.
Run the Regression: Click OK to run. Excel will generate the regression output.
If a time series plot exhibits a linear trend, the method of least squares regression may be used to determine a trend line (projection) for future forecasts.
Least squares regression determines the unique trend line forecast, which minimizes the sum of squared (forecast) errors, SSE, between the trend line forecasts (predicted values) and the actual (observed) values for the time series.
The independent variable is the time period and the dependent variable is the actual observed value in the time series.
Bicycle Yearly Sales Time Series
| Time (Year) | Sales (1000s) |
|---|---|
| 1 | 21.6 |
| 2 | 22.9 |
| 3 | 25.5 |
| 4 | 21.9 |
| 5 | 23.9 |
| 6 | 27.5 |
| 7 | 31.5 |
| 8 | 29.7 |
| 9 | 28.6 |
| 10 | 31.4 |

Although it shows some up and down movement over the past 10 years, we can identify a linear trend line.
The trend line provides a reasonable approximation o the. long-run movement in the series.
We can use the simple linear regression method to develop such a linear trend line.
Linear Trend Equation
\[ \hat{Y_t} = b_0 + b_1 t, \quad t = 1, 2, \ldots \]
where:
Computing the Slope and Intercept for a Linear Trend
\[ b_1 = \frac{\sum_{t=1}^{n} (t - \bar{t})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (t - \bar{t})^2}, \quad \text{or} \quad b_1 = \frac{\sum_{t=1}^{n} t Y_t - \left( \frac{\sum_{t=1}^{n} t \sum_{t=1}^{n} Y_t}{n} \right)}{\sum_{t=1}^{n} t^2 - \left( \frac{\left(\sum_{t=1}^{n} t\right)^2}{n} \right)} \]
\[ b_0 = \bar{Y} - b_1 \bar{t} \]
Where:
To compute the linear trend equation for the bicycle sales time series, we begin the calculations by computing \(\bar{t}\) and \(\bar{Y}\).
\[ \bar{t} = \frac{\sum_{t=1}^{n} t}{n} = \frac{55}{10} = 5.5 \]
\[ \bar{Y} = \frac{\sum_{t=1}^{n} Y_t}{n} = \frac{264.5}{10} = 26.45 \]
Using these values, and the data, we can compute: the slope and intercept.
\[ b_1 = \frac{\sum_{t=1}^{n} (t - \bar{t})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (t - \bar{t})^2} = \frac{90.75}{82.5} = 1.1 \]
\[ b_0 = \bar{Y} - b_1 \bar{t} = 26.45 - 1.1(5.5) = 20.4 \]
Therefore, the linear trend equation is:
\[ \hat{Y}_t = 20.4 + 1.1t \]
Bicycle Sales Time Series
| \(t\) | \(Y_1\) | \(t - \bar{t}\) | \(Y_1 - \bar{Y}\) | \((t - \bar{t})(Y_1 - \bar{Y})\) | \((t - \bar{t})^2\) |
|---|---|---|---|---|---|
| 1 | 21.6 | -4.5 | -4.85 | 21.825 | 20.25 |
| 2 | 22.9 | -3.5 | -3.55 | 12.425 | 12.25 |
| 3 | 25.5 | -2.5 | -0.95 | 2.375 | 6.25 |
| 4 | 21.9 | -1.5 | -4.55 | 6.825 | 2.25 |
| 5 | 23.9 | -0.5 | -2.55 | 1.275 | 0.25 |
| 6 | 27.5 | 0.5 | 1.05 | 0.525 | 0.25 |
| 7 | 31.5 | 1.5 | 5.05 | 7.575 | 2.25 |
| 8 | 29.7 | 2.5 | 3.25 | 8.125 | 6.25 |
| 9 | 28.6 | 3.5 | 2.15 | 7.525 | 12.25 |
| 10 | 31.4 | 4.5 | 4.95 | 22.275 | 20.25 |
| Totals | 264.5 | 90.750 | 82.5 |
Linear Trend Equation for the Bicycle Sales Time Series:
\[ \hat{Y}_t = 20.4 + 1.1t \]
The slope 1.1 indicates that over the past 10 years the firm experienced an average growth in sales of about 1100 units per year.
If we assume that the past 10-year trend in sales is a good indicator of the future, this trend equation can be used to develop forecasts for future time periods. Substituting \(t = 11\) into the equation yields next year’s trend projection or forecast, \(\hat{Y}_{11}\).
\[ \hat{Y}_t = 20.4 + 1.1 \times 11 = 32.5 \]
Thus, using trend projection, we would forecast sales of 32,500 bicycles next year.
Bicycle Sales Time Series
| Year | Sales | Forecast | Forecast error | Squared Forecast Error |
|---|---|---|---|---|
| 1 | 21.6 | 21.5 | 0.1 | 0.01 |
| 2 | 22.9 | 22.6 | 0.3 | 0.09 |
| 3 | 25.5 | 23.7 | 1.8 | 3.24 |
| 4 | 21.9 | 24.8 | -2.9 | 8.41 |
| 5 | 23.9 | 25.9 | -2.0 | 4.00 |
| 6 | 27.5 | 27.8 | 0.5 | 0.25 |
| 7 | 31.5 | 28.1 | 3.4 | 11.56 |
| 8 | 29.7 | 29.2 | 0.5 | 0.25 |
| 9 | 28.6 | 30.3 | -1.7 | 2.89 |
| 10 | 31.4 | 31.4 | 0.0 | 0.00 |
| Total | 264.5 | 30.7 |
\[ \text{MSE} = \frac{\sum_{t=1}^{n} (Y_t - F_t)^2}{n} = \frac{30.7}{10} = 3.07 \]
Step 1: Click the Data tab on the Ribbon
Step 2: In the Analyze group, click Data Analysis
Step 3: Choose Regression from the list of Analysis Tools
Step 4: When the Regression dialog box appears:
B1:B11 in the Input Y Range boxA1:A11 in the Input X Range boxA13 in the Output Range box (to identify the upper left corner of the section of the worksheet where the output will appear)
Sometimes time series have a curvilinear or nonlinear trend.
A variety of nonlinear functions can be used to develop an estimate of the trend in a time series.
One example is this quadratic trend equation:
\[ \hat{Y_t} = b_0 + b_1t + b_2t^2 \]
\[ \hat{Y_t} = b_0(b_1)^t \]
Cholesterol Revenue Time Series
The data regarding cholesterol sales over a period of 10 years is tabulated below. A curvilinear function appears to be needed to model the long-term trend.
| Year | Revenue ($ millions) |
|---|---|
| 1 | 23.1 |
| 2 | 21.3 |
| 3 | 27.4 |
| 4 | 34.6 |
| 5 | 33.8 |
| 6 | 43.2 |
| 7 | 59.5 |
| 8 | 64.4 |
| 9 | 74.2 |
| 10 | 99.3 |

Step 1: Click the Data tab on the Ribbon
Step 2: In the Analyze group, click Data Analysis
Step 3: Choose Regression from the list of Analysis Tools
Step 4: When the Regression dialog box appears:
C1:C11 in the Input Y Range boxA1:B11 in the Input X Range boxA13 in the Output Range box (to identify the upper left corner of the section of the worksheet where the output will appear)
The estimated regression equation is:
\[ \hat{Y}_t = 24.1817 - 2.1060t + 0.9216t^2 \]
The forecast of sales revenue for year 11 is:
\[ \hat{Y}_t = 24.1817 - 2.1060(11) + 0.9216(11)^2 = 112.53, \quad \text{or} \quad \approx \text{\$112.5 million}. \]
Steps for Creating a Scatter Plot with Trendline


When seasonality exists, we need to incorporate it into our forecasting models to ensure accurate forecasts.
We will first look at the case of a seasonal time series with no trend and then discuss how to model seasonality with trend.
Seasonal patterns are recognized by seeing the same repeating pattern of highs and lows over successive and short periods of time (within a “short” period of time).
A seasonal pattern might occur within a day, week, month, quarter, year, or some other interval no greater than a year.
A seasonal pattern does not necessarily refer to the four seasons of the year (spring, summer, fall, and winter).
| Year | Quarter | Sales |
|---|---|---|
| 1 | 1 | 125 |
| 1 | 2 | 153 |
| 1 | 3 | 106 |
| 1 | 4 | 88 |
| 2 | 1 | 118 |
| 2 | 2 | 161 |
| 2 | 3 | 133 |
| 2 | 4 | 102 |
| 3 | 1 | 138 |
| 3 | 2 | 144 |
| 3 | 3 | 113 |
| 3 | 4 | 80 |
| 4 | 1 | 109 |
| 4 | 2 | 137 |
| 4 | 3 | 125 |
| 4 | 4 | 109 |
| 5 | 1 | 130 |
| 5 | 2 | 165 |
| 5 | 3 | 128 |
| 5 | 4 | 96 |


The time series plot does not clearly indicate any long-term trend in sales.
However, close inspection of the plot does reveal a seasonal pattern.
Recall that dummy variables can be used to deal with categorical independent variables in a multiple regression model.
We will treat the season as a categorical variable.
Recall that when a categorical variable has \(k\) levels, \(k - 1\) dummy variables are required.
If there are four seasons, we have the following dummy variables with \(Qtr4\) being the reference level:
\[ \text{Qtr1} = \begin{cases} 1 & \text{if Quarter 1} \\ 0 & \text{otherwise} \end{cases} \quad \text{Qtr2} = \begin{cases} 1 & \text{if Quarter 2} \\ 0 & \text{otherwise} \end{cases} \quad \text{Qtr3} = \begin{cases} 1 & \text{if Quarter 3} \\ 0 & \text{otherwise} \end{cases} \]
| Year | Quarter | Qtr1 | Qtr2 | Qtr3 | Sales |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 125 |
| 1 | 2 | 0 | 1 | 0 | 153 |
| 1 | 3 | 0 | 0 | 1 | 106 |
| 1 | 4 | 0 | 0 | 0 | 88 |
| 2 | 1 | 1 | 0 | 0 | 118 |
| 2 | 2 | 0 | 1 | 0 | 161 |
| 2 | 3 | 0 | 0 | 1 | 133 |
| 2 | 4 | 0 | 0 | 0 | 102 |
| 3 | 1 | 1 | 0 | 0 | 138 |
| 3 | 2 | 0 | 1 | 0 | 144 |
| 3 | 3 | 0 | 0 | 1 | 113 |
| 3 | 4 | 0 | 0 | 0 | 80 |
| 4 | 1 | 1 | 0 | 0 | 109 |
| 4 | 2 | 0 | 1 | 0 | 137 |
| 4 | 3 | 0 | 0 | 1 | 125 |
| 4 | 4 | 0 | 0 | 0 | 109 |
| 5 | 1 | 1 | 0 | 0 | 130 |
| 5 | 2 | 0 | 1 | 0 | 165 |
| 5 | 3 | 0 | 0 | 1 | 128 |
| 5 | 4 | 0 | 0 | 0 | 96 |

\[ \hat{Y} = b_0 + b_1(Qtr1) + b_2(Qtr2) + b_3(Qtr3) \]
\[ \text{Forecasted Sales} = 95.0 + 29.0(Qtr1) + 57.0(Qtr2) + 26.0(Qtr3) \]
Forecast of Mean Quarterly Sales in Year 6
Note that we could have obtained the quarterly forecasts for next year simply by computing the average number of umbrellas sold in each quarter:
| Year | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 |
|---|---|---|---|---|
| 1 | 125 | 153 | 106 | 88 |
| 2 | 118 | 161 | 133 | 102 |
| 3 | 138 | 144 | 113 | 80 |
| 4 | 109 | 137 | 125 | 109 |
| 5 | 130 | 165 | 128 | 96 |
| Average | 124 | 152 | 121 | 95 |
Nonetheless, the regression output provides additional information that can be used to assess the accuracy of the forecast and determine the significance of the results.
Smartphone Quarterly Sales Data (in Thousands)
| Year | Quarter | Sales (1000s) |
|---|---|---|
| 1 | 1 | 4.8 |
| 2 | 4.1 | |
| 3 | 6.0 | |
| 4 | 6.5 | |
| 2 | 1 | 5.8 |
| 2 | 5.2 | |
| 3 | 6.8 | |
| 4 | 7.4 | |
| 3 | 1 | 6.0 |
| 2 | 5.6 | |
| 3 | 7.5 | |
| 4 | 7.8 | |
| 4 | 1 | 6.3 |
| 2 | 5.9 | |
| 3 | 8.0 | |
| 4 | 8.4 |

Multiple Regression Equation for Seasonal Effects and Trend
The general form of the estimated multiple regression equation for modeling both the quarterly seasonal effects and the linear trend in the smartphone time series is as follows:
\[ \hat{Y}_t = b_0 + b_1 \text{Qtr1} + b_2 \text{Qtr2} + b_3 \text{Qtr3} + b_4 t \]
where
Quarterly Sales Data with Indicators and Period
| Year | Quarter | Qtr1 | Qtr2 | Qtr3 | Period | Sales (1000s) |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 1 | 4.8 |
| 2 | 0 | 1 | 0 | 2 | 4.1 | |
| 3 | 0 | 0 | 1 | 3 | 6.0 | |
| 4 | 0 | 0 | 0 | 4 | 6.5 | |
| 2 | 1 | 1 | 0 | 0 | 5 | 5.8 |
| 2 | 0 | 1 | 0 | 6 | 5.2 | |
| 3 | 0 | 0 | 1 | 7 | 6.8 | |
| 4 | 0 | 0 | 0 | 8 | 7.4 | |
| 3 | 1 | 1 | 0 | 0 | 9 | 6.0 |
| 2 | 0 | 1 | 0 | 10 | 5.6 | |
| 3 | 0 | 0 | 1 | 11 | 7.5 | |
| 4 | 0 | 0 | 0 | 12 | 7.8 | |
| 4 | 1 | 1 | 0 | 0 | 13 | 6.3 |
| 2 | 0 | 1 | 0 | 14 | 5.9 | |
| 3 | 0 | 0 | 1 | 15 | 8.0 | |
| 4 | 0 | 0 | 0 | 16 | 8.4 |

General Form of Estimated Regression Equation
\[ \hat{Y} = b_0 + b_1 Qtr1 + b_2 \cdot Qtr2 + b_3 \cdot Qtr3 + b_4 \cdot t \]
Where \(t\) is the time period \(t = 1, 2, \ldots\)
Forecast for Time Period 17 (Quarter 1 in Year 5)
\[ \text{Sales} = 6.07 - 1.36(1) - 2.03(0) - .304(0) + .146(17) = 7.19 \]
Forecast for Time Period 18 (Quarter 2 in Year 5)
\[ \text{Sales} = 6.07 - 1.36(0) - 2.03(1) - .304(0) + .146(18) = 6.67 \]
Forecast for Time Period 19 (Quarter 3 in Year 5)
\[ \text{Sales} = 6.07 - 1.36(0) - 2.03(0) - .304(1) + .146(19) = 8.54 \]
Forecast for Time Period 20 (Quarter 4 in Year 5)
\[ \text{Sales} = 6.07 - 1.36(0) - 2.03(0) - .304(0) + .146(20) = 8.99 \]
Thus, accounting for the seasonal effects and the linear trend in smartphone sales, the estimates of quarterly sales in year 5 are 7190, 6670, 8540, and 8990.
The dummy variables in the estimated multiple regression equation provide four estimated multiple regression equations, one for each quarter.
\[ \text{Quarter 1: Sales} = 6.07 - 1.36(1) - 2.03(0) - .304(0) + .146t = 4.71 + .146t \]
Similarly, if time period \(t\) corresponds to quarters 2, 3, and 4, the estimates of quarterly sales are:
\[ \begin{align*} \text{Quarter 2: Sales} & = 6.07 - 1.36(0) - 2.03(1) - .304(0) + .146t = 4.04 + .146t \\ \text{Quarter 3: Sales} & = 6.07 - 1.36(0) - 2.03(0) - .304(1) + .146t = 5.77 + .146t \\ \text{Quarter 4: Sales} & = 6.07 - 1.36(0) - 2.03(0) - .304(0) + .146t = 6.07 + .146t \end{align*} \]
Interpretation of the Regression Coefficients
The slope of the trend line for each quarterly forecast equation is .146, indicating a growth in sales of about 146 smartphones per quarter.
The only difference in the four equations is that they have different intercepts.
For instance, the intercept for the quarter 1 equation is 4.71 and the intercept for the quarter 4 equation is 6.07. Thus, sales in quarter 1 are
\[ 4.71 - 6.07 = -1.36 \]
or 1360 smartphones less than in quarter 4.
In other words, the estimated regression coefficient for Qtr1 provides an estimate of the difference in sales between quarter 1 and quarter 4.
Understanding what is really going on with a time series often depends upon the use of deseasonalized data.
For example, we might be interested in learning whether electrical power consumption is increasing in our area.
Suppose we learn that electric power consumption in September is down 3% from the previous month. Is it correct to conclude that the electric power consumption is decreasing?
We must be careful with such information, because whenever a seasonal influence is present, such comparisons may be misleading if the data have not been deseasonalized.
The fact that electric power consumption is down 3% from August to September might be only the seasonal effect associated with a decrease in the use of air conditioning and not because of a long-term decline in the use of electric power.
Indeed, after adjusting for the seasonal effect, we might even find that the use of electric power increased.
Many other time series, such as unemployment statistics, home sales, and retail sales, are subject to strong seasonal influences. It is important to deseasonalize such data before making a judgment about any long-term trend.
Time series decomposition can be used to separate or decompose a time series. It assumes that \(Y_t\), the actual time series value at period \(t\), is a function of three components:
How these three components are combined to generate the observed values of the time series depends upon whether we assume the relationship is best described by an additive or a multiplicative model.
If the seasonal fluctuations change over time, growing larger as the sales volume increases because of a long-term linear trend, then a multiplicative model should be used.
Many business and economic time series follow this pattern.
An additive model is appropriate in situations where the seasonal fluctuations do not depend upon the level of the time series.
If the sizes of the seasonal fluctuations in earlier time periods are about the same as the sizes of the seasonal fluctuations in later time periods, an additive model is appropriate.
An additive decomposition model takes the following form:
\[ Y_t = Trend_t + Seasonal_t + Irregular_t \]
where
\(Trend_t\) = trend value at time period \(t\)
\(Seasonal_t\) = seasonal value at time period \(t\)
\(Irregular_t\) = irregular value at time period \(t\)
In an additive model, the values for the three components are simply added together to obtain the actual time series value \(Y_t\).
The irregular or error component accounts for the variability in the time series that cannot be explained by the trend and seasonal components.
A multiplicative decomposition model takes the following form:
\[ Y_t = Trend_t \times Seasonal_t \times Irregular_t \]
where
\(Trend_t\) = trend value at time period \(t\)
\(Seasonal_t\) = seasonal index at time period \(t\)
\(Irregular_t\) = irregular index at time period \(t\)
The Census Bureau uses a multiplicative model in conjunction with its methodology for deseasonalizing time series.
In this model, the trend, seasonal, and irregular components are multiplied to give the value of the time series.
Trend is measured in units of the item being forecast.
The seasonal and irregular components are measured in relative terms:
Sales are lowest in the second quarter of each year and increase in quarters 3 and 4.
We conclude that a seasonal pattern exists for the smartphone sales time series.
The computational procedure to identify each quarter’s seasonal influence begins by computing a moving average to remove the combined seasonal and irregular effects from the data, leaving us with a time series that contains only trend and any remaining random variation not removed by the moving average calculations.
Steps 1-2: Calculate centered moving averages (Initial deseasonalized data).
Step 3: Determine the seasonal and irregular factors for each time series value. (Individual factors due to seasonality and irregularity).
Step 4: Determine Seasonal Index (SI) for each season (Updating the factors).
Step 5: Determine the deseasonalized data (Update the deseasonalized data using SI).
Step 6: Determine the Trend Component of the deseasonalized data (Model trend).
Step 7: Determine the deseasonalized predictions (based on the Trend Component). (Forecast trend).
Step 8: Add the seasonality to the predictions.
Moving Average Calculation
Because we are working with a quarterly series, we will use four data values in each moving average.
The moving average calculation for the first four quarters of the smartphone sales data is:
\[ \text{First moving average} = \frac{4.8 + 4.1 + 6.0 + 6.5}{4} = \frac{21.4}{4} = 5.35 \]
Moving Average Calculation
Note that the moving average calculation for the first four quarters yields the average quarterly sales over year 1 of the time series.
Continuing the moving average calculations, we next add the 5.8 value for the first quarter of year 2 and drop the 4.8 for the first quarter of year 1.
Thus, the second moving average is:
\[ \text{Second moving average} = \frac{4.1 + 6.0 + 6.5 + 5.8}{4} = \frac{22.4}{4} = 5.60 \]
Moving Average Calculation
Similarly, the third moving average calculation is:
\[ (6.0 + 6.5 + 5.8 + 5.2) / 4 = 5.875 \]
Before we proceed with the moving average calculations for the entire time series, let us return to the first moving average calculation, which resulted in a value of 5.35.
The 5.35 value is the average quarterly sales volume for year 1.
As we look back at the calculation of the 5.35 value, associating 5.35 with the “middle” of the moving average group makes sense.
Period Alignment with Moving Averages
Note that with four quarters in the moving average, there is no middle period.
The 5.35 value really corresponds to period 2.5, the last half of quarter 2 and the first half of quarter 3.
Similarly, if we go to the next moving average value of 5.60, the middle period corresponds to period 3.5, the last half of quarter 3 and the first half of quarter 4.
Centering Moving Averages
The two moving average values we computed do not correspond directly to the original quarters of the time series.
We can resolve this by computing the average of the two moving averages.
The center of the first moving average is period 2.5, and the center of the second moving average is period 3.5.
By averaging these, we center the moving average at quarter 3, exactly where it should be.
Centered Moving Average Calculation
This moving average is referred to as a centered moving average.
\[ \frac{5.35 + 5.60}{2} = 5.475 \]
\[ \frac{5.60 + 5.875}{2} = 5.738 \]
Centered Moving Average Calculations
Summary of the moving average and centered moving average calculations for the smartphone sales data.
| Year | Quarter | Sales (1000s) | Four-Quarter Moving Average | Centered Moving Average |
|---|---|---|---|---|
| 1 | 1 | 4.8 | ||
| 1 | 2 | 4.1 | ||
| 5.350 | 5.475 | |||
| 1 | 3 | 6.0 | ||
| 5.600 | 5.738 | |||
| 1 | 4 | 6.5 | ||
| 5.875 | 5.975 | |||
| 2 | 1 | 5.8 | ||
| 6.075 | 6.188 | |||
| 2 | 2 | 5.2 | ||
| 6.300 | 6.325 | |||
| 2 | 3 | 6.8 | ||
| 6.350 | 6.400 | |||
| 2 | 4 | 7.4 | ||
| 6.450 | 6.538 | |||
| 3 | 1 | 6.0 | ||
| 6.625 | 6.675 | |||
| 3 | 2 | 5.6 | ||
| 6.725 | 6.763 | |||
| 3 | 3 | 7.5 | ||
| 6.800 | 6.838 | |||
| 3 | 4 | 7.8 | ||
| 6.875 | 6.938 | |||
| 4 | 1 | 6.3 | ||
| 7.000 | 7.075 | |||
| 4 | 2 | 5.9 | ||
| 7.150 | ||||
| 4 | 3 | 8.0 | ||
| 4 | 4 | 8.4 |
What do the centered moving averages tell us about this time series?
The Figure shows a time series plot of the actual time series values and the centered moving average values.
Note particularly how the centered moving average values tend to “smooth out” both the seasonal and irregular fluctuations in the time series.
The centered moving averages represent the trend in the data and any random variation that was not removed by using moving averages to smooth the data.
The multiplicative decomposition model is:
\[ Y_t = Trend_t \times Seasonal_t \times Irregular_t \]
By dividing each side of this equation by the trend component \(T_t\), we can identify the combined seasonal-irregular effect in the time series.
\[ \frac{Y_t}{Trend_t} = \frac{Trend_t \times Seasonal_t \times Irregular_t}{Trend_t} = Seasonal_t \times Irregular_t \]
For example, the third quarter of year 1 shows a trend value of 5.475 (the centered moving average).
So:
\[ \frac{6.0}{5.475} = 1.096 \]
This is the combined seasonal-irregular value.
Seasonal-irregular values for the entire time series.
| Year | Quarter | Sales (1000s) | Centered Moving Average | Seasonal-Irregular Value |
|---|---|---|---|---|
| 1 | 1 | 4.8 | ||
| 1 | 2 | 4.1 | ||
| 1 | 3 | 6.0 | 5.475 | 1.096 |
| 1 | 4 | 6.5 | 5.738 | 1.133 |
| 2 | 1 | 5.8 | 5.975 | 0.971 |
| 2 | 2 | 5.2 | 6.188 | 0.840 |
| 2 | 3 | 6.8 | 6.325 | 1.075 |
| 2 | 4 | 7.4 | 6.400 | 1.156 |
| 3 | 1 | 6.0 | 6.538 | 0.918 |
| 3 | 2 | 5.6 | 6.675 | 0.839 |
| 3 | 3 | 7.5 | 6.763 | 1.109 |
| 3 | 4 | 7.8 | 6.838 | 1.141 |
| 4 | 1 | 6.3 | 6.938 | 0.908 |
| 4 | 2 | 5.9 | 7.075 | 0.834 |
| 4 | 3 | 8.0 | ||
| 4 | 4 | 8.4 |
Seasonal-Irregular Values for Third Quarter
Consider the seasonal-irregular values for the third quarter: 1.096, 1.075, and 1.109.
Seasonal-irregular values greater than 1.00 indicate effects above the trend estimate, while values below 1.00 indicate effects below the trend estimate.
The three seasonal-irregular values for quarter 3 show an above-average effect in the third quarter.
Averaging to Estimate Seasonal Influence
Since the year-to-year fluctuations in the seasonal-irregular values are primarily due to random error, we can average the computed values to eliminate the irregular influence and obtain an estimate of the third-quarter seasonal influence.
\[ \text{Seasonal effect of quarter 3} = \frac{1.096 + 1.075 + 1.109}{3} = 1.09 \]
Seasonal-Irregular Values and Seasonal Indexes
| Quarter | Seasonal-Irregular Values | Seasonal Index |
|---|---|---|
| 1 | 0.971, 0.918, 0.908 | 0.93 |
| 2 | 0.840, 0.839, 0.834 | 0.84 |
| 3 | 1.096, 1.075, 1.109 | 1.09 |
| 4 | 1.133, 1.156, 1.141 | 1.14 |
The seasonal indexes for the four quarters are 0.93, 0.84, 1.09, and 1.14.
The best sales quarter is the fourth quarter, with sales averaging 14% above the trend estimate.
The worst, or slowest, sales quarter is the second quarter; its seasonal index of 0.84 shows that the sales average is 16% below the trend estimate.
The seasonal component corresponds clearly to the intuitive expectation that smartphone sales increase when a new school year begins (quarter 3) and for the holiday season (quarter 4).
One final adjustment is sometimes necessary in obtaining the seasonal indexes.
The multiplicative model requires that the average seasonal index equal 1.00, so the sum of the four seasonal indexes must equal 4.00.
In other words, the seasonal effects must even out over the year.
The average of the seasonal indexes in our example is equal to 1.00, so this type of adjustment is not necessary.
Making Adjustments (If Required)
In cases where adjustment is necessary:
For quarterly data, this means:
\[ \text{Adjusted Seasonal Index} = \frac{4}{\text{sum of the unadjusted seasonal indexes}} \]
A time series that has had the seasonal effects removed is referred to as a deseasonalized time series, and the process of using the seasonal indexes to remove the seasonal effects from a time series is referred to as deseasonalizing the time series.
Using a multiplicative decomposition model, we deseasonalize a time series by dividing each observation by its corresponding seasonal index.
The multiplicative decomposition model is:
\[ Y_t = Trend_t \times Seasonal_t \times Irregular_t \]
When we divide each time series observation \(Y_t\) by its corresponding seasonal index, the resulting data show only trend and random variability (the irregular component).
| Year | Quarter | Time Period | Sales (1000s) | Seasonal Index | Deseasonalized Sales |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 4.8 | 0.93 | 5.16 |
| 1 | 2 | 2 | 4.1 | 0.84 | 4.88 |
| 1 | 3 | 3 | 6.0 | 1.09 | 5.50 |
| 1 | 4 | 4 | 6.5 | 1.14 | 5.70 |
| 2 | 1 | 5 | 5.8 | 0.93 | 6.24 |
| 2 | 2 | 6 | 5.2 | 0.84 | 6.19 |
| 2 | 3 | 7 | 6.8 | 1.09 | 6.24 |
| 2 | 4 | 8 | 7.4 | 1.14 | 6.49 |
| 3 | 1 | 9 | 6.0 | 0.93 | 6.45 |
| 3 | 2 | 10 | 5.6 | 0.84 | 6.67 |
| 3 | 3 | 11 | 7.5 | 1.09 | 6.88 |
| 3 | 4 | 12 | 7.8 | 1.14 | 6.84 |
| 4 | 1 | 13 | 6.3 | 0.93 | 6.77 |
| 4 | 2 | 14 | 5.9 | 0.84 | 7.02 |
| 4 | 3 | 15 | 8.0 | 1.09 | 7.34 |
| 4 | 4 | 16 | 8.4 | 1.14 | 7.37 |
Identifying Trend in Deseasonalized Data
The graph of the deseasonalized smartphone sales time series shown in the Figure appears to have an upward linear trend.
To identify this trend, we will fit a linear trend equation to the deseasonalized time series.
The only difference is that we will be fitting a trend line to the deseasonalized data instead of the original data.
Recall that for a linear trend, the estimated regression equation can be written as:
\[ T_t = b_0 + b_1 t \]
where
To fit a linear trend line to the deseasonalized data, the only change is that the deseasonalized time series values are used instead of the observed values \(Y_t\) in computing \(b_0\) and \(b_1\).
Using Excel’s Regression tool, the estimated linear trend equation is:
\[ \text{Deseasonalized Sales} = 5.10 + 0.148t \]
The slope of 0.148 indicates that over the past 16 quarters, the firm averaged a deseasonalized growth in sales of about 148 smartphones per quarter.
If we assume that the past 16-quarter trend in sales data is a reasonably good indicator of the future, this equation can be used to develop a trend projection for future quarters.
Substituting \(t = 17\) into the equation yields next quarter’s deseasonalized trend projection, \(T_{17}\):
\[ T_{17} = 5.10 + 0.148(17) = 7.616 \]
Thus, using the deseasonalized data, the linear trend forecast for next quarter (period 17) is 7616 smartphones.
Similarly, the deseasonalized trend forecasts for the next three quarters (periods 18, 19, and 20) are 7764, 7912, and 8060 smartphones, respectively.
The final step in developing the forecast when both trend and seasonal components are present is to use the seasonal indexes to adjust the deseasonalized trend projections.
For the smartphone sales example, we have a deseasonalized trend projection for the next four quarters.
Now, we must adjust the forecast for the seasonal effect.
The seasonal index for the first quarter of year 5 (\(t = 17\)) is 0.93, so we obtain the quarterly forecast by multiplying the deseasonalized forecast based on trend \((T_{17} = 7616)\) by the seasonal index (0.93). Thus, the forecast for the next quarter is \(7616 \times 0.93 = 7083\).
| Year | Quarter | Deseasonalized Trend Forecast | Seasonal Index | Quarterly Forecast |
|---|---|---|---|---|
| 5 | 1 | 7616 | 0.93 | \(7616 \times 0.93 = 7083\) |
| 5 | 2 | 7764 | 0.84 | \(7764 \times 0.84 = 6522\) |
| 5 | 3 | 7912 | 1.09 | \(7912 \times 1.09 = 8624\) |
| 5 | 4 | 8060 | 1.14 | \(8060 \times 1.14 = 9188\) |
The high-volume fourth quarter has a 9188-unit forecast, and the low-volume second quarter has a 6522-unit forecast.
Mathematically, the multiplicative model can be expanded to include a cyclical component.
\[ Y_t = Trend_t \times Cyclical_t \times Seasonal_t \times Irregular_t \]
The cyclical component, like the seasonal component, is expressed as a percentage of trend.
This component is attributable to multiyear cycles in the time series.
It is analogous to the seasonal component but over a longer period. However, due to the length of time involved, obtaining enough relevant data to estimate the cyclical component is often difficult.
Cycles usually vary in length, making it difficult to identify or separate cyclical effects from long-term trend effects.
In practice, these effects are often combined and referred to as a combined trend-cycle component.
Some key takeaways from this session:
Time Series Analysis: Understand patterns (trend, seasonality, cycles) to forecast future values.
Forecast Accuracy: Key measures include MAE, MSE, and MAPE for assessing model performance.
Smoothing Techniques: Moving averages and exponential smoothing help in handling horizontal patterns.
Trend and Seasonality: Apply decomposition and regression for more complex time series with trends and seasonal variations.
Important: The following table is not exhaustive; it merely illustrates the examples discussed in the lecture. Remember that within the General Linear Model framework, these methods can be combined and adapted based on the specific objectives of the analysis.
| Forecasting Method / Pattern | Horizontal | Linear Trend | Non-linear Trend | Seasonal | Seasonal + Trend |
|---|---|---|---|---|---|
| 1. Naive Forecast | X | ||||
| 2. Hist. Data Average Method | X | ||||
| 3. Moving Average (\(k\) and \(1/k\)) | X | ||||
| 4. Weighted Moving Average (\(k\), weights should sum to 1) | X | ||||
| 5. Exponential Smoothing (\(\alpha\)) | X | ||||
| 6. Regression Model: Lag Variable | X | X | X | X | |
| 7. Regression Model: Time as Independent Variable | X | ||||
| 8. Regression Model: Time as Independent Variable with Transformation | X | ||||
| 9. Regression Model: Season as Independent Variable | X | ||||
| 10. Regression Model: Time and Season as Independent Variables | X |
Business Statistics