Employment Forecasting – TipTopTechknowlogy

Forecasting FRED Data on Employment in the Financial Activities Sector

David Pershall

Financial Activities

Every month the U.S. Bureau of Labor Statistics publishes employment figures for large sections of the U.S. Economy. The task is to forecast the amount of people employed in the financial activities sector for four years. This information is publicly reported by the St. Louis FRED and can be found at the following link. https://fred.stlouisfed.org/series/USFIRE.

Four years represents a large amount of time and confidence windows will be wide. Therefore, I will test a few forecasting models by comparing their respective RMSE and Winkler scores on a known portion of the time series.

Options, Libraries, and Pull

#Libraries
if(!require(pacman)) 
  install.packages("pacman", repos = "http://cran.us.r-project.org")

pacman::p_load("tidyverse", "fpp3", "fable.prophet", "ggpubr")

#Pull
tmp <- tempfile()
download.file("https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=1169&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=CEU5500000001&scale=left&cosd=2000-01-01&coed=2023-01-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2023-02-20&revision_date=2023-02-20&nd=2000-01-01", tmp)

Transform

Now that we have the data pulled and stored, we need to transform it into a a workable time series structure.

actual <- read.csv(tmp, col.names = c("Date", "Employed")) %>% 
  mutate(Date = as.Date(Date),
         # Data provides the number of people in Thousands of persons so let's
         # adjust that to millions for easier reading
         Employed = Employed/1000,
         Month = yearmonth(Date)) %>% 
  dplyr::select(Month, Employed) %>% as_tsibble(index = Month)

Exploration

autoplot(actual, Employed) +
  labs(title = "US employment: Financial Activities",
       y = "Number of people (millions)")

A few observations from this simple plot are that the data is non-stationary and appears to have some seasonality. We can also observe the 2008-10 recession as well as a sharp slump due to the outbreak of the Covid-19 pandemic in 2020.

Seasonality

Let’s take a closer look at the seasonality by plotting it.

actual %>% gg_season(Employed, period = "year")

We can see some trends forming in the seasonal chart above. On average, the months of May and June tend to see a rise in the number of people joining the financial activities workforce, followed by a trimming in August. We also see the sharp declines caused by the 2020 Covid-19 pandemic and the 2008-10 financial crisis. These stand as the exception to the rule of generalized growth in the rate of people employed in financial activities.

The Split

The first task is to split the data so that we have a training set and a test set to perform the accuracy tests against.

train <- actual %>% 
  filter(year(Month) <= 2018)
test <- actual %>% 
  filter(year(Month) >= 2019)
rm(actual)
gc()

##           used (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 1331188 71.1    2201864 117.6  2201864 117.6
## Vcells 2325831 17.8    8388608  64.0  3871675  29.6

Differencing

Since the data is non stationary, let’s go ahead and difference the data and run a preliminary ACF and PACF analysis.

train %>% 
  gg_tsdisplay(difference(Employed, 12),
               plot_type = "partial", lag = 50) +
  labs(title = "Seasonally differenced", y = "")

In the above seasonally differenced graph we see too many lines pass the 20% correlation cut-off in the ACF and PACF plots. This suggests we should difference the data again.

train %>% 
  gg_tsdisplay(difference(difference(Employed,12)),
               plot_type = "partial", lag = 50) +
  labs(title = "Double differenced",
       y = "")

In the double differenced graph we see that the series more closely resembles white noise, and could be ready for the model fitting process.

The fits

The goal is to find the most accurate model for forecasting this series based on minimizing the RMSE and Winkler score. Since there are considerable spikes remaining in the double differenced ACF and PACF plots, I will conduct a search for the best ARIMA model and the best STLF model. This will take more time than constructing one myself. However, I am only working with one series, so the time spent will be worth while. I will also fit a Seasonal Naive model, an ETS model, a Prophet Model, and one combination model.

STLF = decomposition_model(STL(Employed ~ season(window = Inf)),
                               ETS(season_adjust ~ season("N")))
fits <- train %>% 
  model(
    stlf = STLF,
    ets = ETS(Employed),
    seasonal_naive_drift = SNAIVE(Employed ~ drift()),
    prophet = prophet(Employed ~ season(period = 12, order = 2,
                                      type = "multiplicative")),
    arima_search = ARIMA(Employed, greedy = F, stepwise = FALSE, approx = FALSE),
    ) %>% mutate(combination = (ets + stlf + arima_search + seasonal_naive_drift) / 4)
fits

## # A mable: 1 x 6
##                        stlf           ets seasonal_naive_drift   prophet
##                     <model>       <model>              <model>   <model>
## 1 <STL decomposition model> <ETS(M,Ad,M)>    <SNAIVE w/ drift> <prophet>
## # ℹ 2 more variables: arima_search <model>, combination <model>

Preliminary Plots

We can go ahead and plot the forecast means as a preliminary examination of the models.

Employment_forecasts <- fits %>% 
  forecast(h = "4 years")

Employment_forecasts %>% 
  autoplot(train,
           level = NULL) +
  labs(y = "Number of People (Millions)",
       title = "US employment: Financial Activities")

The residuals

Let’s dig a little deeper in our exploration of the models. One way of comparing and contrasting models is to plot their residuals. Since we are more focused on RMSE and the Wilder score, I will only include the Arima search model and the Prophet models as an example of this approach.

fits %>% dplyr::select(arima_search) %>% gg_tsresiduals(lag = 56) + labs(title = "Arima search", y="")

We can see that the Arima search model does good job of capturing the information available in the data. The ACF plot of the Arima search model resembles that of white noise, and the distribution of the residuals is close to a normal distribution.

fits %>% dplyr::select(prophet) %>% gg_tsresiduals(lag = 56) + labs(title = "Prophet", y="")

The Prophet model above does not do as good a job of capturing the information available in the data. There are still too many spikes in the ACF plot and the distribution shows abnormalities. We can safely assume the Arima model will perform better than the prophet model.

RMSE

We can calculate the RMSE of the models against the test set to see which one performs the best. If we were correct with the examination of the residual plots above we can expect the arima model to perform much better than the prophet model.

Employment_forecasts %>% 
  accuracy(test) %>% dplyr::select(.model, RMSE) %>% 
  arrange(RMSE)

## # A tibble: 6 × 2
##   .model                RMSE
##   <chr>                <dbl>
## 1 combination          0.107
## 2 ets                  0.118
## 3 arima_search         0.137
## 4 stlf                 0.149
## 5 seasonal_naive_drift 0.169
## 6 prophet              0.201

We were correct with our examination of the earlier plots. The Arima search model does much better than the prophet model. The combination model has the lowest RMSE making it the best approach. While this is a simple linear combination, there is much work being done combining forecasting models using different methods, and the combination models almost always outperform a singular model by itself. Notably, the very popular prophet model has the highest RMSE making it even less accurate than the seasonal naive with drift model.

Winkler Score

Next up, I will generate 5000 future sample paths and their distributions in order to match the distributions already stored in the construction of the prophet model. This will allow me check for accuracy using a Winkler test. Just like RMSE, the lower the Winkler score the better.

Employment_futures <- fits %>% dplyr::select(c("ets", "stlf", "arima_search",
                                              "seasonal_naive_drift", 
                                              "combination")) %>% 
  # Generate 5000 future sample paths
  generate(h = "4 years", times = 5000) %>% 
  # Compute forecast distributions from future sample paths
  as_tibble() %>% 
  group_by(Month, .model) %>% 
  summarise(dist = distributional::dist_sample(list(.sim))) %>% 
  ungroup() %>% 
  # Create fable object
  as_fable(index = Month, key = .model,
           distribution = dist, response = "Employed")


# Match the prophet 5000 future sample paths and join with Employment Futures
prophet_futures <- Employment_forecasts %>% 
  filter(.model=="prophet") %>% 
  dplyr::select(.model, Month, Employed) %>% 
  `colnames<-`(c(".model","Month","dist")) %>% 
  as_fable(index = Month, key = .model, distribution = dist, 
           response = "Employed")
  
Employment_futures <- Employment_futures %>% 
  full_join(prophet_futures,
            by = join_by(Month,.model,dist))

# Winkler test
Employment_futures %>% 
  accuracy(test, measures = interval_accuracy_measures,
           level = 95) %>% 
  arrange(winkler)

## # A tibble: 6 × 3
##   .model               .type winkler
##   <chr>                <chr>   <dbl>
## 1 combination          Test    0.464
## 2 ets                  Test    0.696
## 3 seasonal_naive_drift Test    0.841
## 4 stlf                 Test    0.917
## 5 arima_search         Test    0.979
## 6 prophet              Test    1.08

Final Model

The combination model is the most accurate according to RMSE and and the Winkler test. Let’s visualize how the model performs with a simple plot.

forecast(fits, h= "4 years") %>% 
  filter(.model=='combination') %>% 
  autoplot(train) +
  autolayer(test, Employed, color = "red") +
  labs(title = "US employment: Financial Activities",
       y="Number of people (millions)",
       x = "Year/Month")

Summary

A couple of observations jump out right away from this investigative journey. The final combination model is extremely accurate within the first year. This is followed by the 2020 decline. No statistical model could have possibly foreseen the Covid-19 pandemic, nor the economic challenges it wrought. If I was responsible for producing short term forecasts of month to month changes during 2020, I would have been forced to use some form of scenario based forecasting which would have included some generalized economic data. I would have also considered assembling a panel of experts to help produce a Delphi forecast.

When we move past the first two years, the shaded prediction intervals in all models are extremely wide. Therefore, it would make more sense to forecast this particular series two years at a time instead of four. A well trained long term forecast of 4 to 5 years, however, is still quite useful for evaluating the realm of possibilities the future may bring.