Package 'm5' reference manual

Title:	'M5 Forecasting' Challenges Data
Description:	Contains functions that facilitate downloading, loading and preparing data from the 'M5 Forecasting' challenges (organized by the 'University of Nicosia' and hosted on 'Kaggle'). The data itself is a set of time series of sales of different products in 'Walmart' stores. The package also includes a ready-to-use built-in M5 subset named 'tiny_m5'. For detailed information about the challenges, see: Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilis. (2020). The M5 Accuracy competition: Results, findings and conclusions. <doi:10.1016/j.ijforecast.2021.10.009>
Authors:	Krzysztof Joachimiak [aut, cre]
Maintainer:	Krzysztof Joachimiak <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2025-03-30 03:10:47 UTC
Source:	https://github.com/krzjoa/m5

Classify time series of the particular items

Description

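Each time series in the dataset is assigned one of the four demand classes listed under Details, based on its ADI (average demand interval) and CV² (squared coefficient of variation) scores.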

Usage

m5_demand_type(data)

Arguments

data

The result of the m5_prepare function; tiny_m5 can be passed as well.

Details

Smooth (ADI < 1.32 and CV² < 0.49)
Intermittent (ADI >= 1.32 and CV² < 0.49)
Erratic (ADI < 1.32 and CV² >= 0.49)
Lumpy (ADI >= 1.32 and CV² >= 0.49)
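For reference, one common way to compute the two scores (following Syntetos and Boylan) is shown below. This is a sketch of the standard definitions, not necessarily the exact computation inside m5_demand_type(), which may differ, for example, in how leading zeros before a product's first sale are treated or in using the population rather than the sample standard deviation. For a series of n periods with k non-zero demands of sizes z_1, ..., z_k:

```latex
\mathrm{ADI} = \frac{n}{k}, \qquad
\mathrm{CV}^2 = \left( \frac{\sigma_z}{\mu_z} \right)^2, \qquad
\mu_z = \frac{1}{k} \sum_{j=1}^{k} z_j, \qquad
\sigma_z = \sqrt{ \frac{1}{k-1} \sum_{j=1}^{k} \left( z_j - \mu_z \right)^2 }
```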

Value

A data.table containing the series identifiers (item_id and store_id), the ADI and CV² scores (adi and cv2, respectively) and the final class chosen based on these scores (demand_type).
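The classification rule can be sketched in base R for a single series as below. This is a minimal illustration assuming one numeric vector of daily sales per series, ADI computed as n / k and CV² computed with the sample standard deviation of the non-zero sales. classify_demand() is a hypothetical helper, not part of the m5 package, and m5_demand_type() may compute the scores differently; applying such a helper per (item_id, store_id) group would yield columns analogous to adi, cv2 and demand_type.

```r
# Hypothetical helper (not part of the m5 package): classify one sales series
# using the Syntetos and Boylan cut-offs ADI = 1.32 and CV^2 = 0.49.
classify_demand <- function(x, adi_cut = 1.32, cv2_cut = 0.49) {
  nonzero <- x[x > 0]                 # demand sizes on days with sales
  k <- length(nonzero)
  if (k < 2) {
    # CV^2 needs at least two non-zero observations (sd() of one value is NA)
    return(list(adi = if (k == 0) Inf else length(x) / k,
                cv2 = NA_real_, demand_type = NA_character_))
  }
  adi <- length(x) / k                     # average demand interval
  cv2 <- (sd(nonzero) / mean(nonzero))^2   # squared coefficient of variation
  demand_type <- if (adi < adi_cut && cv2 < cv2_cut) {
    "smooth"
  } else if (adi >= adi_cut && cv2 < cv2_cut) {
    "intermittent"
  } else if (adi < adi_cut) {
    "erratic"                              # here cv2 >= cv2_cut
  } else {
    "lumpy"                                # adi >= adi_cut and cv2 >= cv2_cut
  }
  list(adi = adi, cv2 = cv2, demand_type = demand_type)
}

# Toy series with equal sales on 2 of 6 days: ADI = 3, CV^2 = 0
res <- classify_demand(c(0, 3, 0, 0, 3, 0))
res$demand_type  # "intermittent"
```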

References

Syntetos A. A. and Boylan J. E., 2005, The accuracy of intermittent demand estimates. International Journal of Forecasting 21: 303–314.
Forecast Error Measures: Intermittent Demand

Examples

head(m5_demand_type(tiny_m5))

Download and unzip the raw data to the specified directory

Description

Download and unzip the raw data to the specified directory

Usage

m5_download(path, unzip = TRUE)

Arguments

`path`	The directory in which to save the zip file
`unzip`	Whether to unzip the file automatically once the download is finished. Default: TRUE. The file is extracted into the same directory it was downloaded to (passed as the `exdir` argument of `unzip()`).

Value

Returns nothing; the function is called for its side effect of downloading the m5.zip file (and extracting it if `unzip = TRUE`).

Note

If you run into timeout problems, increase the timeout value using `options(timeout = <new_timeout_value>)`.
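For example, the download timeout (in seconds; R's default is 60) can be raised in base R before calling m5_download(). The value 600 below is an arbitrary choice for illustration, not a recommendation from the package.

```r
# Raise the download timeout to at least 600 seconds, keeping any larger
# value that is already set; options() returns the previous settings.
old <- options(timeout = max(600, getOption("timeout")))

# m5_download('data')  # then download as usual

# Optionally restore the previous setting afterwards:
# options(old)
```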

Examples

## Not run: 
m5_download('data')

## End(Not run)

Load raw CSV files using the `data.table::fread()` function

Description

Load raw CSV files using the data.table::fread() function

Usage

m5_get_raw_evaluation(path)

m5_get_raw_validation(path)

Arguments

path

The directory with the unzipped M5 data files

Value

The function returns a list of five data.tables:

sales_train (evaluation/validation)
sales_test (evaluation/validation)
sell_prices
calendar
weights (evaluation/validation)

References

m5-forecasts repo by Nixtla

Examples

## Not run: 
library(m5)
library(zeallot)

m5_download('data')
c(sales_train,
  sales_test,
  sell_prices,
  calendar,
  weights) %<-% m5_get_raw_evaluation('data')

## End(Not run)

Prepare the ready-to-use M5 data in one data.frame

Description

This function is memory-efficient, as it uses data.table under the hood. However, it is still not recommended on machines with less than 16 GB of RAM. In that case, consider a custom solution based on [arrow](https://arrow.apache.org/docs/r/) or [disk.frame](https://diskframe.com/index.html).

Usage

m5_prepare(sales_train, sales_test, calendar, sell_prices)

Arguments

`sales_train`	A data.frame with M5 train data
`sales_test`	A data.frame with M5 test data
`calendar`	A data.frame with M5 calendar
`sell_prices`	A data.frame with M5 sell_prices

Value

A data.table built from the input objects, containing the following columns:

item_id
dept_id
cat_id
store_id
state_id
d - day ordinal number
value - number of sold items
wm_yr_wk - week identifier
weekday - weekday name (character)
wday - weekday as an integer
month
year
event_name_1 - special event name, like holidays etc.
event_type_1 - special event type
event_name_2 - as above
event_type_2 - as above
snap - SNAP flag (1 if SNAP purchases are allowed in the store's state on that day, 0 otherwise)
sell_price
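Conceptually, this preparation amounts to a wide-to-long reshape of the sales table followed by joins with the calendar and the weekly prices. The toy base R sketch below illustrates that idea on hand-made data frames that reuse a subset of the raw M5 column names (d_1, d_2, ... day columns in the sales file; d, date, wm_yr_wk and snap_CA in the calendar; store_id, item_id, wm_yr_wk and sell_price in the price file). It is an illustration under those assumptions only: the values are made up, the package itself works with data.table, and its exact steps (for example, how snap is derived per state) may differ.

```r
# Toy inputs mimicking the raw M5 layout (values are made up)
sales <- data.frame(
  item_id = c("A", "B"), store_id = c("CA_1", "CA_1"), state_id = c("CA", "CA"),
  d_1 = c(0, 2), d_2 = c(1, 0)
)
calendar <- data.frame(
  d = c("d_1", "d_2"), date = as.Date(c("2011-01-29", "2011-01-30")),
  wm_yr_wk = c(11101, 11101), snap_CA = c(0, 0)
)
sell_prices <- data.frame(
  store_id = "CA_1", item_id = "A", wm_yr_wk = 11101, sell_price = 1.5
)

# 1. Wide-to-long: one row per item, store and day.
#    unlist() stacks the day columns one after another (d_1 rows, then d_2 rows).
d_cols <- grep("^d_", names(sales), value = TRUE)
long <- data.frame(
  item_id  = rep(sales$item_id,  times = length(d_cols)),
  store_id = rep(sales$store_id, times = length(d_cols)),
  state_id = rep(sales$state_id, times = length(d_cols)),
  d        = rep(d_cols, each = nrow(sales)),
  value    = unlist(sales[d_cols], use.names = FALSE)
)

# 2. Attach calendar information by day identifier
long <- merge(long, calendar, by = "d")

# 3. Attach weekly prices; all.x = TRUE keeps rows without a price (sell_price is NA)
long <- merge(long, sell_prices,
              by = c("store_id", "item_id", "wm_yr_wk"), all.x = TRUE)

# 4. Pick the SNAP flag of each row's own state, then drop the per-state columns
snap_col <- paste0("snap_", long$state_id)
long$snap <- vapply(seq_len(nrow(long)),
                    function(i) long[[snap_col[i]]][i], numeric(1))
long[grep("^snap_", names(long))] <- NULL
```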

Examples

library(m5)
library(zeallot)
## Not run: 

m5_download('data')
c(sales_train,
  sales_test,
  sell_prices,
  calendar,
  weights) %<-% m5_get_raw_evaluation('data')

m5_data  <-
   m5_prepare(sales_train, sales_test, calendar, sell_prices)

## End(Not run)

A subset from M5 Walmart Challenge Dataset in one data frame

Description

A subset of the training data used in the M5 challenges on Kaggle. M5 is the fifth in the series of forecasting competitions (the M competitions) organized by Spyros Makridakis.

Usage

tiny_m5

Format

item_id: The id of the product
dept_id: The id of the department the product belongs to
cat_id: The id of the category the product belongs to
store_id: The id of the store where the product is sold
state_id: The state where the store is located
value: The number of sold units
date: The date in a “y-m-d” format
wm_yr_wk: The id of the week the date belongs to
weekday: The name of the weekday (Saturday, Sunday, …, Friday)
wday: The id of the weekday, starting from Saturday
month: The month of the date
year: The year of the date
event_name_1: If the date includes an event, the name of this event
event_type_1: If the date includes an event, the type of this event
event_name_2: If the date includes a second event, the name of this event
event_type_2: If the date includes a second event, the type of this event
snap: A binary variable (0 or 1) indicating whether the stores of CA, TX or WI allow SNAP (Supplemental Nutrition Assistance Program) purchases on the examined date. 1 indicates that SNAP purchases are allowed
sell_price: The price of the product for the given week/store. The price is provided per week (average across seven days). If it is not available, the product was not sold during the examined week. Although prices are constant within a week, they may change over time (in both the training and test sets)
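The wday encoding described above (1 = Saturday, 2 = Sunday, ..., 7 = Friday) can be reproduced from a date in base R, for instance to derive the same feature for new dates. m5_wday() is a hypothetical helper written for this illustration, not part of the m5 package.

```r
# Hypothetical helper: M5-style weekday id (1 = Saturday, 2 = Sunday, ..., 7 = Friday).
# format(x, "%u") gives the ISO weekday (1 = Monday, ..., 7 = Sunday),
# independently of the locale.
m5_wday <- function(date) {
  iso <- as.integer(format(as.Date(date), "%u"))
  (iso + 1L) %% 7L + 1L
}

m5_wday(as.Date("2011-01-29"))  # 1: a Saturday, the first day of the M5 calendar
```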

Examples

library(m5)
# Head of tiny_m5
head(tiny_m5)
