Seasonality Profile
The 'seasonality profile' is a set of cumulative distribution functions (CDFs), one for every hour of every day of the year, that Windographer generates to characterize the seasonal and diurnal patterns in a time series.
The seasonality profile indicates the range of observed values in each hour of the year. For a 20-year time series of temperature measurements, for example, if the seasonality profile's CDF for 3-4pm on August 2 contains values from 17.3°C to 36.8°C, with a median of 29.9°C, that would mean that in the 3-4pm time interval on August 2, the lowest temperature observed over those 20 years was 17.3°C, the highest was 36.8°C, and half the observed values fell below 29.9°C.
For each day of the year and each hour of the day, Windographer builds a cumulative distribution function (CDF) from all valid data within a window of time centered on that hour of that day. To build the CDF for 10-11pm Jan 11, for example, Windographer will use all valid data points occurring in the 10-11pm time frame, from one week before to one week after Jan 11, in all years covered by the dataset. If that window yields too few valid data points, the window expands, first in the day-of-year dimension and then in the hour-of-day dimension, until it encompasses at least 50 valid data points.
The diagram below illustrates this process. In the first stage, the window covers 10pm-11pm, from Jan 4 to Jan 18. For a dataset with 10-minute time steps and perfect data recovery, that stage 1 window covering a one-hour segment of fifteen days would encompass 6 * 15 = 90 data points, which is more than the required 50, so the window would not need to grow any larger.
With imperfect data recovery, though, that first stage may harvest fewer than 50 valid data points, in which case the process continues. In stage 2 the window expands to plus or minus two weeks, meaning Dec 28 to Jan 25. In stage 3 the window expands to plus or minus three weeks, and so on until the window covers all days of the year, between 10pm and 11pm. If that still yields too few data points, the window begins to expand in the hour-of-day dimension, first so that it covers 9pm to 12am, then 8pm to 1am, and so on until it yields 50 valid data points or it expands to include all data, whichever comes first.
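The expanding-window search described above can be sketched in Python. This is only an illustration under stated assumptions, not Windographer's actual code: the function name `collect_window`, the `(datetime, value)` record layout with `None` marking invalid data, and the simplification of treating every year as 365 days are all hypothetical choices made for this example.

```python
from datetime import datetime, timedelta

DAYS_IN_YEAR = 365  # assumption: leap days are folded into day 365 for simplicity
MIN_POINTS = 50     # minimum number of valid points, as stated in the text

def circular_distance(a, b, period):
    """Shortest distance between a and b on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)

def collect_window(records, target_doy, target_hour, min_points=MIN_POINTS):
    """Return (values, day_half_width, hour_half_width) for the first window
    that holds at least min_points valid values, or for the whole dataset."""
    valid = [(min(ts.timetuple().tm_yday, DAYS_IN_YEAR), ts.hour, v)
             for ts, v in records if v is not None]

    def harvest(day_half_width, hour_half_width):
        return [v for doy, hour, v in valid
                if circular_distance(doy, target_doy, DAYS_IN_YEAR) <= day_half_width
                and circular_distance(hour, target_hour, 24) <= hour_half_width]

    # Stages 1, 2, 3, ...: widen by one week per side in the day-of-year dimension.
    max_day_half_width = DAYS_IN_YEAR // 2  # 182 days per side covers the whole year
    day_half_width = 7
    while True:
        values = harvest(day_half_width, 0)
        if len(values) >= min_points or day_half_width >= max_day_half_width:
            break
        day_half_width = min(day_half_width + 7, max_day_half_width)

    # Only then widen by one hour per side in the hour-of-day dimension.
    hour_half_width = 0
    while len(values) < min_points and hour_half_width < 12:
        hour_half_width += 1
        values = harvest(day_half_width, hour_half_width)
    return values, day_half_width, hour_half_width

# Demo: one year of made-up 10-minute data with perfect recovery.
start = datetime(2021, 1, 1)
records = [(start + timedelta(minutes=10 * i), 1.0) for i in range(365 * 144)]
values, day_hw, hour_hw = collect_window(records, target_doy=11, target_hour=22)
print(len(values), day_hw, hour_hw)  # 90 7 0
```

The demo reproduces the stage 1 case for 10-11pm on Jan 11: 6 points per hour over 15 days gives 90 points, so no expansion is needed. The windows wrap around the year boundary and around midnight, which is why the distances are computed on a circle.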
Once the seasonality profile has been generated for a particular time series, we can use it to remove the seasonality from that time series, resulting in a new 'seasonality-normalized time series' that has no seasonal or diurnal pattern. Windographer does this by stepping through the time series and, in each time step, referring to the relevant CDF in the seasonality profile and looking up on that CDF the percentile value corresponding to the measured value in that time step.
This new 'seasonality-normalized time series' is in units of percent, and each value expresses where the original observation ranks among the other observations made in and around that time of day and that time of year. For example, if we started with a temperature time series, removed its seasonality profile, and the resulting seasonality-normalized time series contained a value of 85% at noon on January 1st, 2018, that means that if you looked at all the temperatures recorded in that original time series in and around noon of the first day of each year, the temperature at noon on January 1st, 2018 would be at the 85th percentile of that set of numbers. The weatherman might say "it is well above average for noon at this time of year".
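As a rough illustration of the lookup from measured value to percentile, the sketch below treats each CDF as a sorted list of the values collected for that hour of that day, and defines the percentile as the share of those values at or below the measurement. The text does not say how Windographer stores or interpolates its CDFs, so this empirical-CDF convention, the names `value_to_percentile` and `normalize`, and the `profile` dictionary layout are assumptions made for the example.

```python
from bisect import bisect_right
from datetime import datetime

def value_to_percentile(sorted_values, x):
    """Percentage of the window's values that are less than or equal to x."""
    return 100.0 * bisect_right(sorted_values, x) / len(sorted_values)

def normalize(records, profile):
    """Map each (timestamp, value) to (timestamp, percentile).

    profile: {(day_of_year, hour): sorted list of windowed values}
             (hypothetical layout, one entry per hour of the year)
    """
    normalized = []
    for ts, value in records:
        cdf_values = profile[(ts.timetuple().tm_yday, ts.hour)]
        normalized.append((ts, value_to_percentile(cdf_values, value)))
    return normalized

# Demo with made-up numbers: 20 temperatures (1.0 .. 20.0) around noon on Jan 1.
window = [float(t) for t in range(1, 21)]
profile = {(1, 12): window}
series = [(datetime(2018, 1, 1, 12, 0), 17.0)]
result = normalize(series, profile)
print(result[0][1])  # 85.0
```

Here the measurement of 17.0 is at or above 17 of the 20 values in its window, so it maps to 85%, matching the kind of reading described in the example above.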
To transform a 'seasonality-normalized time series' back to physical units, Windographer steps through the time series and, in each time step, refers to the relevant CDF in the seasonality profile and from it calculates the physical value that corresponds to the percentile value in that time step. This is the same process used to factor out the seasonality profile, run in the opposite direction: we look up the observed value corresponding to a particular percentile value, rather than the percentile corresponding to an observed value.
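The reverse lookup can be sketched in the same spirit. The example below assumes the CDF is held as a sorted list of values and returns the smallest value whose share of values at or below it reaches the requested percentile. Whether Windographer interpolates between observed values is not stated in the text, so the step-wise rule and the name `percentile_to_value` are assumptions for illustration only.

```python
import math

def percentile_to_value(sorted_values, pct):
    """Smallest value in the window whose empirical CDF is at least pct percent."""
    n = len(sorted_values)
    rank = math.ceil(pct * n / 100.0)   # 1-based rank of the value to return
    rank = max(1, min(rank, n))         # clamp so 0% and 100% stay in range
    return sorted_values[rank - 1]

# Demo with made-up numbers: 20 temperatures, 1.0 .. 20.0.
window = [float(t) for t in range(1, 21)]
print(percentile_to_value(window, 85.0))  # 17.0
print(percentile_to_value(window, 50.0))  # 10.0
```

With distinct values, this rule returns exactly the observation that produced a given percentile under the "share of values at or below" convention, so the two lookups round-trip.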
Windographer generates the seasonality profile as part of the Markov-based reconstruction mechanism. Once the seasonality profile is factored out, as described above, to generate the 'seasonality-normalized time series', the only distinguishing characteristic remaining in the time series is its autocorrelation, or more precisely, its Markov transition behavior. As a result, a Markov analysis can very effectively characterize the behavior of the time series, and a Markov chain approach can synthesize artificial data that closely matches the behavior of the real data. Then once the seasonality profile is factored back in, that synthetic data will display realistic diurnal and seasonal patterns.
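To make the Markov idea concrete, here is a minimal sketch of a first-order Markov chain fitted to a seasonality-normalized series and then used to generate synthetic percentiles. The details are assumptions rather than a description of Windographer's reconstruction mechanism: the ten equal-width percentile bins, the first-order chain, the uniform draw within each bin, and the names `transition_matrix` and `synthesize` are all hypothetical.

```python
import random

def transition_matrix(percentiles, n_states=10):
    """Row-normalized first-order transition frequencies between percentile bins."""
    def state(p):
        # Bin index for a percentile in [0, 100]; 100 falls in the top bin.
        return min(int(p / 100.0 * n_states), n_states - 1)

    states = [state(p) for p in percentiles]
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # Bins never visited fall back to a uniform row (an arbitrary choice).
        matrix.append([c / total for c in row] if total
                      else [1.0 / n_states] * n_states)
    return matrix

def synthesize(matrix, start_state, length, rng=None):
    """Generate synthetic percentiles by walking the Markov chain."""
    rng = rng or random.Random()
    n_states = len(matrix)
    width = 100.0 / n_states
    state, synthetic = start_state, []
    for _ in range(length):
        state = rng.choices(range(n_states), weights=matrix[state])[0]
        # Draw a percentile uniformly within the chosen bin.
        synthetic.append((state + rng.random()) * width)
    return synthetic

# Demo with a made-up normalized series that alternates between two bins.
observed = [5.0, 15.0, 5.0, 15.0, 5.0]
matrix = transition_matrix(observed)
synthetic = synthesize(matrix, start_state=0, length=4, rng=random.Random(0))
```

The synthetic percentiles would then be converted back to physical units through the seasonality profile, which is what restores the diurnal and seasonal patterns.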
See also
Markov-based reconstruction mechanism