Retail Price Time Series Imputation

Date
2014-06
Authors
Malik, Obaid Ullah
Publisher
Faculty of Graduate Studies and Research, University of Regina
Abstract

A regular, discrete time series is an ordered sequence of coarse-grained observations taken at fixed time intervals. Here we consider regular, discrete, retail price time series datasets acquired through crowdsourcing. Crowdsourcing is a means of data collection whereby independent individuals push publicly available information to an information consolidator who then distributes it back to the individuals for their collective mutual benefit. Crowdsourced datasets are typically incomplete due to missing observations and missing values for important attributes. In this thesis, we consider the problem of filling in missing values in crowdsourced, regular, discrete, retail price time series datasets using data imputation methods. We introduce a new method called Retail Price Time Series Imputation (RPTSI). The basic RPTSI method uses an ensemble of three constituent methods for imputing retail prices in a univariate time series dataset based upon retail prices that already occur in the time series: namely, Price Change Lookup, Central Moving Average, and Polynomial Interpolation. We also introduce four other methods that extend the basic RPTSI method by considering retail prices for similar products sold by the retailer and similar products sold by competitors: namely, RPTSI-C, RPTSI-CR, RPTSI-P, and RPTSI-PCR. The RPTSI-C method finds relationships between a retailer and a competitor in terms of those days where they have similar price changes. The RPTSI-CR method finds relationships between a retailer and a competitor in terms of those days where they have similar price change categories. The RPTSI-P method finds relationships among similar products sold by the retailer. The RPTSI-PCR method integrates the RPTSI-P and the RPTSI-CR methods to find relationships in terms of similar product prices of the retailer while using prices obtained from competitors.
A crowdsourced dataset containing retail prices from retailers in four North American cities was used in a series of experiments to evaluate the five RPTSI methods. The results obtained were compared against those obtained using Multiple Imputation, Last Value Carried Forward, Mean Imputation, Moving Average, and Polynomial Interpolation. Mean Absolute Deviation (MAD) was used to measure the accuracy of the data-filling methods. The RPTSI-CR and RPTSI-PCR methods outperformed Multiple Imputation and the other aforementioned methods, achieving the lowest MAD for univariate and multivariate time series, respectively.
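
As a rough illustration of two ideas named in the abstract, the sketch below fills gaps in a univariate price series with a central moving average and scores the result with a mean absolute deviation. It is not the thesis implementation: the function names, the `window` parameter, the use of `None` to mark a missing price, and the reading of MAD as the mean absolute difference between true and imputed prices are all assumptions made for this example.

```python
from statistics import mean


def central_moving_average_impute(prices, window=1):
    """Fill each missing price (None) with the mean of the observed
    prices lying within `window` positions on either side of it.

    Hypothetical helper for illustration; gaps with no observed
    neighbour inside the window are left as None.
    """
    filled = list(prices)
    for i, price in enumerate(prices):
        if price is None:
            lo = max(0, i - window)
            hi = min(len(prices), i + window + 1)
            neighbours = [p for p in prices[lo:hi] if p is not None]
            if neighbours:
                filled[i] = mean(neighbours)
    return filled


def mean_absolute_deviation(actual, imputed):
    """Average absolute difference between true and imputed prices
    (assumed definition of MAD for this sketch)."""
    return mean(abs(a - b) for a, b in zip(actual, imputed))


# Example series with two held-out observations marked as None.
observed = [1.0, None, 3.0, 3.0, None, 5.0]
filled = central_moving_average_impute(observed, window=1)

# Score only the positions that were missing, against assumed true values.
truth_at_gaps = [2.5, 4.0]
imputed_at_gaps = [filled[1], filled[4]]
score = mean_absolute_deviation(truth_at_gaps, imputed_at_gaps)
```

A lower score means the imputed prices sit closer to the held-out true prices, which is the sense in which the abstract ranks the methods.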

Description
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. viii, 94 p.