A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. [Source: Wikipedia]

Time series analysis can be useful to identify and separate trends, cycles and seasonality of a dataset. In this blog I will illustrate how I analyse time series datasets in R and also show methods to forecast the data.

Let’s look at the Monthly Milk Production dataset from datamarket.com. The time series dataset measures pounds per cow as its unit per month from January 1962 to December 1975.

The dataset can be downloaded from this link. Once we download the CSV file and place it in the working directory, we can read the file using the following code.

> library(forecast) > milk <- read.csv("monthly-milk-production-pounds-p.csv")

Now we can transform the input to a time series by giving the frequency and start month/year. We can also print out the time series as seen below.

> milk.ts <- ts(milk, frequency=12, start=c(1962,1)) > milk.ts Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1962 561 640 656 727 697 640 599 568 577 553 582 600 1963 566 653 673 742 716 660 617 583 587 565 598 628 1964 618 688 705 770 736 678 639 604 611 594 634 658 1965 622 709 722 782 756 702 653 615 621 602 635 677 1966 635 736 755 811 798 735 697 661 667 645 688 713 1967 667 762 784 837 817 767 722 681 687 660 698 717 1968 696 775 796 858 826 783 740 701 706 677 711 734 1969 690 785 805 871 845 801 764 725 723 690 734 750 1970 707 807 824 886 859 819 783 740 747 711 751 804 1971 756 860 878 942 913 869 834 790 800 763 800 826 1972 799 890 900 961 935 894 855 809 810 766 805 821 1973 773 883 898 957 924 881 837 784 791 760 802 828 1974 778 889 902 969 947 908 867 815 812 773 813 834 1975 782 892 903 966 937 896 858 817 827 797 843

Let’s plot out the time series.

> plot.ts(milk.ts)

We can see that there is a seasonal variation in the time series within a year. Also it seems to be like an additive model as the seasonal fluctuations are roughly constant over time. If the seasonal fluctuations seem to increase in the level of time series then it can denote a multiplicative model (which is not our case here).

Let’s now decompose the series into its constituents which are a trend component, an irregular component and a seasonal component (if present). The irregular component is the remainder of the time series once decomposed.

> plot(decompose(milk.ts))

If we have a time series using an additive model with an increasing or decreasing trend and seasonality, we can use Holt-Winters exponential smoothing to make short-term forecasts. To fit a predictive model for the log of the monthly milk production we write the following code.

> logmilk.ts <- log(milk.ts) > milk.ts.forecast <- HoltWinters(log(milk.ts)) > milk.ts.forecast Holt-Winters exponential smoothing with trend and additive seasonal component. Call: HoltWinters(x = log(milk.ts)) Smoothing parameters: alpha: 0.587315 beta : 0 gamma: 1 Coefficients: [,1] a 6.788338238 b 0.002087539 s1 -0.031338300 s2 -0.090489288 s3 0.043463485 s4 0.058024146 s5 0.127891508 s6 0.100852168 s7 0.057336548 s8 0.011032882 s9 -0.047972067 s10 -0.047584275 s11 -0.097105193 s12 -0.051371280 > plot(milk.ts.forecast)

As for simple exponential smoothing and Holt’s exponential smoothing, we can plot the original time series as a black line, with the forecasted values as a red line on top of that.

To make forecasts for future times not included in the original time series, we use the “forecast()” function in the “forecast” package.

> milk.ts.forecast2 <- forecast(milk.ts.forecast, h=48) > plot(milk.ts.forecast2)

The forecasts are shown in the dark blue line and the grey areas are the confidence intervals.

References: http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html

https://www.statmethods.net/advstats/timeseries.html