Title: | 'M5 Forecasting' Challenges Data |
---|---|
Description: | Contains functions that facilitate downloading, loading and preparing data from the 'M5 Forecasting' challenges (by 'University of Nicosia', hosted on 'Kaggle'). The data itself is a set of time series of sales of different products in 'Walmart' stores. The package also includes a ready-to-use built-in M5 subset named 'tiny_m5'. For detailed information about the challenges, see: Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilis. (2020). The M5 Accuracy competition: Results, findings and conclusions. <doi:10.1016/j.ijforecast.2021.10.009> |
Authors: | Krzysztof Joachimiak [aut, cre] |
Maintainer: | Krzysztof Joachimiak <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2025-01-29 02:53:07 UTC |
Source: | https://github.com/krzjoa/m5 |
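Each time series in the dataset can be assigned one of the following classes, based on its ADI (average demand interval, i.e. the average number of periods between two consecutive non-zero demands) and CV² (squared coefficient of variation of the non-zero demand sizes):
Smooth (ADI < 1.32 and CV² < 0.49)
Intermittent (ADI >= 1.32 and CV² < 0.49)
Erratic (ADI < 1.32 and CV² >= 0.49)
Lumpy (ADI >= 1.32 and CV² >= 0.49)
m5_demand_type(data)
data |
The result of the |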
A data.table containing the series identifiers (item_id and store_id), the ADI and CV² scores (adi and cv2, respectively), and the final class chosen based on these scores (demand_type).
Syntetos, A. A. and Boylan, J. E. (2005). The accuracy of intermittent demand estimates. International Journal of Forecasting, 21, 303–314.
Forecast Error Measures: Intermittent Demand
head(m5_demand_type(tiny_m5))
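The classification rule above can be sketched in base R for a single numeric series. classify_demand() is a hypothetical helper, not part of the m5 package; it assumes ADI is computed as the number of periods divided by the number of non-zero periods, and CV² from the sample standard deviation of the non-zero demands, so its scores may differ slightly from those returned by m5_demand_type().

```r
# Hypothetical helper (not exported by m5): Syntetos-Boylan demand classification
# for one series, using the cut-offs ADI = 1.32 and CV² = 0.49.
classify_demand <- function(x, adi_cut = 1.32, cv2_cut = 0.49) {
  nonzero <- x[x > 0]
  # With fewer than two non-zero values CV² cannot be estimated.
  if (length(nonzero) < 2) {
    return(list(adi = NA_real_, cv2 = NA_real_, demand_type = NA_character_))
  }
  # ADI: average number of periods per non-zero demand (assumed definition).
  adi <- length(x) / length(nonzero)
  # CV²: squared coefficient of variation of the non-zero demand sizes.
  cv2 <- (sd(nonzero) / mean(nonzero))^2
  demand_type <- if (adi < adi_cut) {
    if (cv2 < cv2_cut) "Smooth" else "Erratic"
  } else {
    if (cv2 < cv2_cut) "Intermittent" else "Lumpy"
  }
  list(adi = adi, cv2 = cv2, demand_type = demand_type)
}

classify_demand(c(5, 6, 5, 7, 6, 5))$demand_type                # "Smooth"
classify_demand(c(0, 0, 4, 0, 5, 0, 0, 4, 0, 5))$demand_type    # "Intermittent"
classify_demand(c(1, 10, 2, 15, 1, 20))$demand_type             # "Erratic"
classify_demand(c(0, 0, 20, 0, 0, 0, 1, 0, 0, 15))$demand_type  # "Lumpy"
```

The cut-offs are exposed as arguments only for experimentation; the values 1.32 and 0.49 are the ones listed above.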
Download and unzip the raw data to the specified directory
m5_download(path, unzip = TRUE)
path |
The directory in which to save the zip file |
unzip |
Automatically unzip the file once the download has finished.
Default: TRUE. The |
Returns nothing; the function is called for its side effect of downloading the m5.zip file and, if unzip = TRUE, extracting it.
If you run into timeout problems, increase the timeout value
using options(timeout = <new_timeout_value>).
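The timeout is a base R option (60 seconds by default) used by base R download functions such as download.file(). A minimal sketch of raising it only for the duration of the download; the value of 600 seconds is an arbitrary assumption, not a figure recommended by the package.

```r
# Raise R's download timeout (in seconds) before a long download.
# 600 is an illustrative value; max() keeps a larger existing setting.
old_opts <- options(timeout = max(600, getOption("timeout")))

# m5_download('data')  # would run here with the longer timeout

# Restore the previous value once the download is done.
options(old_opts)
```

options() returns the previous values of the options it changes, which is why passing old_opts back restores the original setting.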
## Not run:
m5_download('data')
## End(Not run)
Load raw CSV files using the data.table::fread() function
m5_get_raw_evaluation(path)
m5_get_raw_validation(path)
path |
The directory with the unzipped M5 data files |
Each function returns a list of five data.tables:
sales_train (evaluation/validation)
sales_test (evaluation/validation)
sell_prices
calendar
weights (evaluation/validation)
## Not run:
library(m5)
library(zeallot)
m5_download('data')
c(sales_train, sales_test, sell_prices, calendar, weights) %<-%
  m5_get_raw_evaluation('data')
## End(Not run)
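The zeallot %<-% operator in the example only unpacks the returned list by position. Where adding zeallot as a dependency is undesirable, base R can do the same. In the sketch below, raw is a placeholder list standing in for the result of m5_get_raw_evaluation('data'), and the element order is assumed to follow the documented order.

```r
# Placeholder for m5_get_raw_evaluation('data'): five tiny data.frames,
# one per documented element (not real M5 data).
table_names <- c("sales_train", "sales_test", "sell_prices", "calendar", "weights")
raw <- lapply(table_names, function(nm) data.frame(table = nm))

# Base-R counterpart of
#   c(sales_train, sales_test, sell_prices, calendar, weights) %<-% raw
# Each element is bound to a variable according to its position in the list.
invisible(list2env(setNames(raw, table_names), envir = environment()))

calendar$table  # "calendar"
```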
It is a memory-efficient function that uses data.table
under the hood.
However, it is still not recommended to use it on machines with less than 16 GB of RAM.
In such a case, consider a custom solution
based on [arrow](https://arrow.apache.org/docs/r/)
or [disk.frame](https://diskframe.com/index.html).
m5_prepare(sales_train, sales_test, calendar, sell_prices)
sales_train |
A data.frame with M5 train data |
sales_test |
A data.frame with M5 test data |
calendar |
A data.frame with M5 calendar |
sell_prices |
A data.frame with M5 sell_prices |
A data.table built from the input objects, containing the following columns:
item_id
dept_id
cat_id
store_id
state_id
d - day ordinal number
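value - number of units sold
wm_yr_wk - week identifier
weekday - weekday name (character)
wday - weekday as an integer
month
year
event_name_1 - name of a special event (e.g. a holiday)
event_type_1 - type of the special event
event_name_2 - name of a second event on the same day, if any
event_type_2 - type of the second event
snap - flag indicating whether SNAP (Supplemental Nutrition Assistance Program) purchases are allowed on that day
sell_price - weekly price of the item in the given store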
library(m5)
library(zeallot)
## Not run:
m5_download('data')
c(sales_train, sales_test, sell_prices, calendar, weights) %<-%
  m5_get_raw_evaluation('data')
m5_data <- m5_prepare(sales_train, sales_test, calendar, sell_prices)
## End(Not run)
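Conceptually, the preparation step turns the wide sales table (one d_* column per day) into long format and joins the calendar and the weekly prices onto it. The base-R sketch below illustrates that idea on made-up toy tables. It is not the package's data.table implementation: it ignores sales_test and several ID and event columns, and it assumes the single snap column takes the state-specific SNAP flag (snap_CA, snap_TX or snap_WI in the raw calendar) that matches each row's state_id.

```r
# Toy inputs mimicking the raw M5 layout (all values are made up).
sales_train <- data.frame(
  item_id  = c("FOODS_1_001", "FOODS_1_002"),
  store_id = c("CA_1", "CA_1"),
  state_id = c("CA", "CA"),
  d_1 = c(3, 0),
  d_2 = c(1, 2)
)
calendar <- data.frame(
  d = c("d_1", "d_2"),
  wm_yr_wk = c(11101, 11101),
  snap_CA = c(0, 1),
  snap_TX = c(1, 0),
  snap_WI = c(0, 0)
)
sell_prices <- data.frame(
  store_id = c("CA_1", "CA_1"),
  item_id = c("FOODS_1_001", "FOODS_1_002"),
  wm_yr_wk = c(11101, 11101),
  sell_price = c(2.0, 3.5)
)

# 1. Wide -> long: one row per item, store and day (d), with sales in `value`.
day_cols <- grep("^d_", names(sales_train), value = TRUE)
id_cols <- setdiff(names(sales_train), day_cols)
long <- do.call(rbind, lapply(day_cols, function(day) {
  data.frame(sales_train[id_cols], d = day, value = sales_train[[day]])
}))

# 2. Join calendar columns (week id, SNAP flags) by day id.
long <- merge(long, calendar, by = "d")

# 3. Keep only the SNAP flag of each row's state, then drop the per-state columns.
long$snap <- ifelse(long$state_id == "CA", long$snap_CA,
                    ifelse(long$state_id == "TX", long$snap_TX, long$snap_WI))
long[c("snap_CA", "snap_TX", "snap_WI")] <- NULL

# 4. Join weekly prices; all.x = TRUE keeps days without a price (item not sold) as NA.
long <- merge(long, sell_prices, by = c("store_id", "item_id", "wm_yr_wk"), all.x = TRUE)
```

Prices are joined on the week id rather than the day because sell_prices stores one price per store, item and week.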
A subset of the training data used in the M5 challenges on Kaggle. M5 is one of a series of forecasting competitions organized by Spyros Makridakis.
tiny_m5
The id of the product
The id of the department the product belongs to
The id of the category the product belongs to
The id of the store where the product is sold
The State where the store is located
The number of units sold
The date in a “y-m-d” format
The id of the week the date belongs to
The name of the weekday (Saturday, Sunday, …, Friday)
The id of the weekday, starting from Saturday
The month of the date
The year of the date
If the date includes an event, the name of this event
If the date includes an event, the type of this event
If the date includes a second event, the name of this event
If the date includes a second event, the type of this event
A binary variable (0 or 1) indicating whether the stores of CA, TX or WI allow SNAP (Supplemental Nutrition Assistance Program) purchases on the examined date; 1 indicates that SNAP purchases are allowed
The price of the product for the given week/store. The price is provided per week (average across seven days). If it is not available, the product was not sold during the examined week. Note that although prices are constant on a weekly basis, they may change over time (in both the training and test sets)
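The weekday id, which starts from Saturday, can be reproduced from a date without depending on the system locale (weekdays() returns localized names). m5_wday() below is a hypothetical helper, not part of the package, assuming the numbering 1 = Saturday through 7 = Friday.

```r
# Hypothetical helper: M5-style weekday id, 1 = Saturday, ..., 7 = Friday.
m5_wday <- function(date) {
  # as.POSIXlt()$wday counts 0 = Sunday, ..., 6 = Saturday; shift so Saturday = 1.
  (as.POSIXlt(as.Date(date))$wday + 1) %% 7 + 1
}

# 2011-01-29 is a Saturday, 2011-01-30 a Sunday and 2011-02-04 a Friday.
m5_wday(c("2011-01-29", "2011-01-30", "2011-02-04"))  # 1 2 7
```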
The M5 competition: Background, organization, and implementation
Other Walmart datasets in timetk
library(m5)
# Head of tiny_m5
head(tiny_m5)