Time Series

October 12, 2025
7 min read

Many advertising measures can be treated as time series data:

  1. Impressions over time
  2. Click-through rate (CTR)
  3. Ad spend over time
  4. Conversion rates over time

What is a Time Series?

A sequence of data points $X_1, X_2, \ldots, X_t$ collected over time $t$.

  1. Time series analysis: decompose the series into its components (see the sketch after this list):

    $$X_t = T_t + S_t + Y_t$$

    where:

    • $T_t$: Trend component (long-term movement)
    • $S_t$: Seasonal component (regular patterns)
    • $Y_t$: Irregular component (stationary noise at time $t$)
    • $C_t$: Cyclical component (long-term cycles) - not shown
  2. Time series forecasting: predict future values

    $$X_t, \quad t \in \{T+1, T+2, \ldots\}$$
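As a quick illustration of the decomposition above, here is a minimal sketch using statsmodels' seasonal_decompose on a synthetic daily impressions series; the weekly period and all numbers are assumptions for the example, not part of the original notes.

```python
# A minimal sketch of additive decomposition X_t = T_t + S_t + Y_t,
# assuming a daily series with weekly seasonality (period=7).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(365)
impressions = pd.Series(
    100 + 0.5 * t                        # trend component T_t
    + 10 * np.sin(2 * np.pi * t / 7)     # weekly seasonal component S_t
    + rng.normal(0, 3, size=t.size),     # irregular component Y_t
    index=pd.date_range("2025-01-01", periods=t.size, freq="D"),
)

result = seasonal_decompose(impressions, model="additive", period=7)
print(result.trend.dropna().head())      # estimated T_t
print(result.seasonal.head(7))           # repeating weekly pattern S_t
print(result.resid.dropna().head())      # leftover irregular part Y_t
```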

Characteristics of Time Series Data

  1. Stochastic Process: A collection of random variables indexed by time.

    $$\{X_t : t \in T\}$$
  2. Dependency: There is dependency between random variables at different time points.

    • Hence, we need to consider joint distributions, not only marginal distributions.
    • Auto-covariance and auto-correlation functions are used to measure this dependency.
  3. Stationarity: A time series is stationary if its statistical properties (mean, variance, autocovariance) do not change over time.

    • For non-stationary series, we can often transform them to stationary series using differencing, detrending, nonlinear transformations, etc.
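A minimal sketch of this in practice, using the augmented Dickey-Fuller test from statsmodels to check stationarity before and after differencing; the random-walk series is synthetic and used only for illustration.

```python
# A minimal sketch: ADF test on a non-stationary random walk,
# then on its first difference, which should look stationary.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=500))      # random walk: non-stationary

adf_stat, p_value, *_ = adfuller(x)
print(f"level series:     p-value = {p_value:.3f}")   # typically large

dx = np.diff(x)                          # first difference
adf_stat, p_value, *_ = adfuller(dx)
print(f"first difference: p-value = {p_value:.3f}")   # typically small
```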
Note

Strict Stationarity

A time series $X_t$ is said to be strictly stationary if the joint distribution of $(X_{t_1}, X_{t_2}, \ldots, X_{t_k})$ is the same as that of $(X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_k+h})$ for all $t_1, t_2, \ldots, t_k$ and all $h$; that is, the joint distribution is invariant under time shifts. This is a strong condition that is rarely met in practice.

Weak Stationarity

A time series $X_t$ is said to be weakly stationary if its mean and variance are constant over time, and the covariance between $X_t$ and $X_{t+h}$ depends only on the lag $h$ and not on the actual time $t$.

In practice, weak stationarity is a more common assumption than strict stationarity, as many time series exhibit constant mean and variance but may not have identical joint distributions under time shifts.

Mathematical Properties

Lag-$l$ autocovariance of $X_t$

$$\gamma(l) = Cov(X_t, X_{t-l})$$

where:

  • $\gamma(l)$: lag-$l$ autocovariance
  • $\gamma(0)$: variance of $X_t$
  • $\gamma(l) = \gamma(-l)$: symmetry property

Lag-$l$ autocorrelation of $X_t$

$$\rho(l) = \frac{Cov(X_t, X_{t-l})}{\sqrt{Var(X_t)\, Var(X_{t-l})}} = \frac{Cov(X_t, X_{t-l})}{Var(X_t)} = \frac{\gamma(l)}{\gamma(0)}$$

where we use the property $Var(X_t) = Var(X_{t-l})$ of a weakly stationary process.

High autocorrelation at lag $l$ indicates a strong linear relationship between $X_t$ and $X_{t-l}$, meaning past values have a significant influence on current values.

In practice, the sample lag-$l$ autocorrelation of $X_t$ is estimated as:

$$\hat{\rho}(l) = \frac{\sum_{t=l+1}^{T} (X_t - \bar{X})(X_{t-l} - \bar{X})}{\sum_{t=1}^{T} (X_t - \bar{X})^2}$$

where $0 \leq l \leq T-1$ and $\bar{X}$ is the sample mean of $X_t$.
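The sample autocorrelation can be computed directly from this formula; a minimal numpy sketch on a synthetic AR(1)-style series (the coefficient 0.7 is an assumption for the example):

```python
# A minimal sketch of the sample lag-l autocorrelation defined above.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    x_bar = x.mean()
    denom = np.sum((x - x_bar) ** 2)              # sum_t (X_t - X_bar)^2
    acf = []
    for l in range(max_lag + 1):
        num = np.sum((x[l:] - x_bar) * (x[:len(x) - l] - x_bar))
        acf.append(num / denom)
    return np.array(acf)

rng = np.random.default_rng(2)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.7 * x[t - 1] + rng.normal()          # AR(1) with a_1 = 0.7

print(sample_acf(x, 5).round(3))   # roughly 1, 0.7, 0.49, 0.34, ...
```

In practice, statsmodels.tsa.stattools.acf computes the same quantity.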

Examples of Time Series Models

Choose a model based on the characteristics of the time series data.

  • Autoregressive (AR) model
  • Moving Average (MA) model
  • Integrated (I) model
  • ARMA model
  • ARIMA model
  • Seasonal ARIMA (SARIMA) model
  • Fractional ARIMA (FARIMA) model

Autoregressive (AR) Model

An AR model expresses the variable as a linear regression on its own past values.

For an AR model of order $p$ (AR($p$)):

$$\hat{X}_t = a_0 + a_1 X_{t-1} + a_2 X_{t-2} + \ldots + a_p X_{t-p}$$

where $\hat{X}_t$ is the best estimate of $X_t$ given past values, and $a_0, a_1, \ldots, a_p$ are the model parameters to be estimated from the data.

  • Error: $e_t = X_t - \hat{X}_t$
  • Model: $X_t = a_0 + a_1 X_{t-1} + e_t$ (for AR(1))
  • Minimize the sum of squared errors (SSE): $SSE = \sum_{t=1}^{T} e_t^2$

Assumptions

  1. Linearity: The relationship between the current value and past values is linear.
  2. Normal, independent and identically distributed (i.i.d.) errors: The error terms $e_t$ are normally distributed with mean zero and constant variance (i.e. no autocorrelation).
  3. Additive errors: The error terms are added to the linear combination of past values.
  4. Stationarity: The time series is stationary, meaning its statistical properties do not change over time (the most important assumption!).

Backward Shift Operator

The backward shift operator $B$ is defined as:

$$\begin{aligned} B X_t &= X_{t-1} \\ B^2 X_t &= B(B X_t) = B X_{t-1} = X_{t-2} \\ &\ \vdots \\ B^p X_t &= X_{t-p} \end{aligned}$$

Then the AR($p$) model can be written as:

$$\begin{aligned} X_t &= a_0 + a_1 B X_t + a_2 B^2 X_t + \ldots + a_p B^p X_t + e_t \\ \Rightarrow\ & X_t - a_1 B X_t - a_2 B^2 X_t - \ldots - a_p B^p X_t = a_0 + e_t \\ \Rightarrow\ & \phi(B) X_t = a_0 + e_t \end{aligned}$$

where $\phi(B) = 1 - a_1 B - a_2 B^2 - \ldots - a_p B^p$ is the characteristic polynomial of the AR($p$) model.
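One standard use of the characteristic polynomial (a background fact, not stated above): the AR($p$) model is stationary exactly when all roots of $\phi(z) = 0$ lie outside the unit circle. A minimal sketch of that check with numpy:

```python
# A minimal sketch: check AR stationarity via the roots of the
# characteristic polynomial phi(z) = 1 - a_1 z - ... - a_p z^p.
import numpy as np

def is_stationary_ar(coeffs):
    """coeffs = [a_1, ..., a_p]; True if all roots satisfy |z| > 1."""
    # polyroots expects coefficients ordered from lowest to highest degree
    poly = np.r_[1.0, -np.asarray(coeffs, dtype=float)]
    roots = np.polynomial.polynomial.polyroots(poly)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary_ar([0.5]))        # AR(1), a_1 = 0.5 -> True
print(is_stationary_ar([1.2]))        # AR(1), a_1 = 1.2 -> False (explosive)
print(is_stationary_ar([0.5, 0.3]))   # AR(2)            -> True
```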

How to estimate coefficients

The coefficients are estimated by minimizing the sum of squared errors (SSE) using multiple linear regression:

$$SSE = \sum e_t^2 = \sum (X_t - \hat{X}_t)^2 = \sum (X_t - a_0 - a_1 X_{t-1} - a_2 X_{t-2} - \ldots - a_p X_{t-p})^2$$

Minimizing the SSE gives the optimal coefficients $a_0, a_1, \ldots, a_p$; this is a multiple linear regression problem.
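A minimal sketch of that regression: build the lagged design matrix and solve the least-squares problem with numpy. The AR(2) series and its true coefficients are synthetic assumptions for the example.

```python
# A minimal sketch: estimate AR(2) coefficients by ordinary least squares.
import numpy as np

rng = np.random.default_rng(3)
n, a0, a1, a2 = 2000, 1.0, 0.6, 0.2
x = np.zeros(n)
for t in range(2, n):
    x[t] = a0 + a1 * x[t - 1] + a2 * x[t - 2] + rng.normal()

# Design matrix: a column of ones plus the lag-1 and lag-2 values
X = np.column_stack([np.ones(n - 2), x[1:-1], x[:-2]])
y = x[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef.round(3))   # approximately [a0, a1, a2] = [1.0, 0.6, 0.2]
```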

Determining the order of the AR model

By using the partial autocorrelation function (PACF).

Note

PACF is the autocorrelation between $X_t$ and $X_{t-l}$ after removing the effects of the intermediate lags $1, 2, \ldots, l-1$. It is the conditional correlation between $X_t$ and $X_{t-l}$ given the values of $X_{t-1}, X_{t-2}, \ldots, X_{t-l+1}$.

Example:

The first order partial autocorrelation is defined to equal the first order autocorrelation.

The second order (lag) partial autocorrelation is:

$$\frac{Cov(X_t, X_{t-2} \mid X_{t-1})}{\sqrt{Var(X_t \mid X_{t-1})\, Var(X_{t-2} \mid X_{t-1})}}$$

where $\hat{X}_t = a_0 + a_1 X_{t-1}$ and $\hat{X}_{t-2} = a_0 + a_1 X_{t-1}$ are the linear regression estimates of $X_t$ and $X_{t-2}$ on the intermediate value $X_{t-1}$, and the conditional covariance and variances are computed from the residuals $X_t - \hat{X}_t$ and $X_{t-2} - \hat{X}_{t-2}$.

The third order (lag) partial autocorrelation is:

$$\frac{Cov(X_t, X_{t-3} \mid X_{t-1}, X_{t-2})}{\sqrt{Var(X_t \mid X_{t-1}, X_{t-2})\, Var(X_{t-3} \mid X_{t-1}, X_{t-2})}}$$

where $\hat{X}_t$ and $\hat{X}_{t-3}$ are the linear regression estimates of $X_t$ and $X_{t-3}$ on the intermediate values $X_{t-1}$ and $X_{t-2}$, and the conditional moments are again computed from the corresponding residuals.
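A minimal sketch of using the sample PACF (here via statsmodels' pacf function) to pick the AR order on a synthetic AR(2) series; for an AR(2) process the PACF should cut off after lag 2.

```python
# A minimal sketch: the sample PACF of a simulated AR(2) series
# is clearly non-zero at lags 1-2 and near zero afterwards.
import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(4)
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + rng.normal()

print(np.round(pacf(x, nlags=5), 3))
# lag 0 is 1 by convention; lags 1 and 2 stand out; lags 3+ are close to zero
```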

Moving Average (MA) Model

An MA model expresses the variable as a linear combination of past error terms.

For an MA model of order $q$ (MA($q$)):

$$\hat{X}_t = b_0 + b_1 e_{t-1} + b_2 e_{t-2} + \ldots + b_q e_{t-q}$$

For order 1 (MA(1)):

$$\hat{X}_t = b_0 + b_1 e_{t-1}$$

MA(0) = AR(0) = white noise: $e_t = X_t - a_0$

where $a_0$ is the mean of the time series.

Backward Shift Operator

Using the backward shift operator $B$, the MA($q$) model can be written as:

$$\begin{aligned} X_t - a_0 &= b_1 B e_t + b_2 B^2 e_t + \ldots + b_q B^q e_t + e_t \\ \Rightarrow\ X_t - a_0 &= (1 + b_1 B + b_2 B^2 + \ldots + b_q B^q) e_t \\ \Rightarrow\ X_t - a_0 &= \psi(B) e_t \end{aligned}$$

Autocorrelation

For MA(1):

$$\rho(l) = \begin{cases} 1 & l = 0 \\ \dfrac{b_1}{1 + b_1^2} & l = 1 \\ 0 & l > 1 \end{cases}$$

The autocorrelation of MA($q$) is non-zero only for lags up to $q$.

Determining the order of the MA model

The order of the last significant autocorrelation $\rho(l)$ determines the order of the MA model.
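A minimal sketch checking both the MA(1) autocorrelation formula and this cutoff rule on a simulated series; the coefficient $b_1 = 0.8$ and the sample size are assumptions for the example.

```python
# A minimal sketch: sample ACF of a simulated MA(1) series versus the
# theoretical value rho(1) = b1 / (1 + b1^2); higher lags are near zero.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
b1 = 0.8
e = rng.normal(size=100_000)
x = e[1:] + b1 * e[:-1]                # MA(1): X_t = e_t + b1 * e_{t-1}

print(np.round(acf(x, nlags=3), 3))    # roughly [1, 0.488, 0, 0]
print(round(b1 / (1 + b1**2), 3))      # theoretical rho(1) = 0.488
```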

Integrated (I) Model

Used for non-stationary time series data to make it stationary by differencing.

White noise: $X_t - a_0 = e_t$, where the mean is independent of time.

If the series has a linear trend, one level of differencing yields white noise (I(1)): $(1 - B) X_t = X_t - X_{t-1} = a_0 + e_t$.

If the time series is parabolic, the second difference can be modeled as white noise (I(2)), i.e. two levels of differencing:

$$\begin{aligned} (X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) &= a_0 + e_t \\ (1 - B)^2 X_t &= a_0 + e_t \end{aligned}$$
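A minimal sketch of two levels of differencing on a synthetic parabolic series; after applying $(1 - B)^2$ the result fluctuates around a constant mean with constant variance.

```python
# A minimal sketch: a quadratic-trend series becomes (roughly) white
# noise with a constant mean after second-order differencing.
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(300)
x = 0.05 * t**2 + rng.normal(0, 1, size=t.size)   # parabolic + noise

d2 = np.diff(x, n=2)               # (1 - B)^2 X_t
print(x[:5].round(2))              # clearly trending upwards
print(round(d2.mean(), 3), round(d2.std(), 3))   # stable mean and spread
```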

ARMA and ARIMA Models

It is possible to combine AR, MA, and I components into a single model.

ARMA(p, q) Model

Combines AR(p) and MA(q) models for stationary time series:

$$\phi(B) X_t = a_0 + \psi(B) e_t$$

Where:

  • $\phi(B) = 1 - a_1 B - a_2 B^2 - \ldots - a_p B^p$ (AR part)
  • $\psi(B) = 1 + b_1 B + b_2 B^2 + \ldots + b_q B^q$ (MA part)

ARIMA(p, d, q) Model

For non-stationary time series, we can use differencing to make it stationary and then apply ARMA:

$$\begin{aligned} (1 - B)^d X_t &= Y_t \\ \phi(B) Y_t &= a_0 + \psi(B) e_t \\ \Rightarrow\ \phi(B) (1 - B)^d X_t &= a_0 + \psi(B) e_t \end{aligned}$$

Where:

  • $d$: order of differencing
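
A minimal sketch of fitting an ARIMA model with statsmodels; the order (1, 1, 1) and the synthetic series are assumptions for illustration, not a recommendation for real ad data.

```python
# A minimal sketch: fit ARIMA(1, 1, 1) to a synthetic integrated series
# and produce a short forecast.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
e = rng.normal(size=500)
x = np.cumsum(0.2 + e[1:] + 0.5 * e[:-1])   # drifting, non-stationary series

model = ARIMA(x, order=(1, 1, 1))           # (p, d, q)
fitted = model.fit()
print(fitted.params)                        # AR, MA and noise-variance estimates
print(fitted.forecast(steps=5))             # forecasts for the next 5 points
```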