Finding Correlations Between Ranges of Products

In the U.K., the food-to-go (FTG) sector was worth £17.4bn in 2017 and is forecast to grow to £23.5bn by 2022, as consumers' appetite for eating on the go shows no signs of slowing down (IGD, 2017). This growth represents an opportunity for FTG retailers and suppliers: as the market grows, they have a chance to grow their market share and increase profits.

In order to grow in this dynamic environment, retailers must manage their products as effectively as possible. In recent years, researchers have recommended category management (CM) as an effective way of managing thousands of retail items. CM is a management technique that aims to improve a retailer's overall performance by dividing stock into small categories and treating them as independent entities (Musalem and Aburto, 2017). Categories are created by grouping highly correlated items together, which are then considered substitutes or complements of each other (Gooner et al., 2011). This report aims to identify relationships among ranges of food-to-go items of a major U.K. supermarket and to assess whether these can be grouped into sub-categories; the sketch below illustrates the grouping idea.
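To make this grouping step concrete, here is a minimal sketch of how ranges could be clustered into sub-categories from a correlation matrix. The range names and correlation values are hypothetical, and hierarchical clustering is just one plausible choice of technique:

import pandas as pd
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# hypothetical correlation matrix between three ranges
corr = pd.DataFrame([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.2],
                     [0.1, 0.2, 1.0]],
                    index=['sandwiches', 'wraps', 'juice'],
                    columns=['sandwiches', 'wraps', 'juice'])

# convert correlation to a distance: highly correlated ranges are 'close'
dist = 1 - corr

# cluster on the condensed distance matrix and cut the dendrogram;
# ranges sharing a label would form one sub-category
link = linkage(squareform(dist.values), method='average')
labels = fcluster(link, t=0.5, criterion='distance')
print(dict(zip(corr.index, labels)))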

The analysis is carried out on a weekly sales report covering the week from the 23rd to the 29th of April 2018. The research questions investigated are:

  • Which product ranges experience the same movement in price?
  • Which product ranges experience the same movement in sales?
  • Is sales change sensitive to the change in price of a specific range?
  • Do price changes in other ranges affect the sales of the retailer's best-selling range?

The report is structured as follows. The first section covers the analysis and discussion: it explains how the data was pre-processed prior to analysis, summarised with descriptive statistics, and analysed to answer the research questions. The last section addresses the limitations of the research and draws conclusions.

In [142]:
# import packages used in the analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as sm


%matplotlib inline
In [143]:
#read data
data = pd.read_csv('./tescoenco.csv')
data.head(3)
Out[143]:
Analysis Level Product Group TPN Item Name Average Price Per Unit (% Change) Average Price Per Unit (Latest) Customers (% Change) Customers (Latest) Frequency of Purchase (% Change) Frequency of Purchase (Latest) Sales Units (% Change) Sales Units (Latest) Sales Value (% Change) Sales Value (Latest) Spend Per Customer (% Change) Spend Per Customer (Latest) Stores Selling (% Change) Stores Selling (Latest) Units Per Visit (% Change) Units Per Visit (Latest)
0 Product Groups BRANDED SANDWICHES - F54TI NaN NaN 25.7% £2.64 1,183.3% 203,652 -21.1% 1.95 1,104.4% 1,221,931 1,414.4% £3,224,646 1.8% £5.75 2,491.3% 1,788 -2.6% 1.13
1 Products BRANDED SANDWICHES - F54TI 55723253.0 DD'S KOSHER DEEP FLLD CHEESE& COLESLAW S/W -100.0% £0.00 -100.0% 0 -100.0% 0.00 -100.0% 0 -100.0% £0 -100.0% £0.00 -100.0% 0 -100.0% 0.00
2 Products BRANDED SANDWICHES - F54TI 55723420.0 DD'S KOSHER DEEP FLLD SALT BEEF SANDWICH -0.3% £3.36 77.7% 1,390 0.4% 1.94 79.2% 7,956 78.6% £26,719 1.0% £6.84 -5.6% 17 1.2% 1.05

Data Pre-Processing

In [144]:
# keep columns of interest (taking a copy so that later assignments
# modify an independent DataFrame rather than a view of 'data')
data_sub = data.iloc[:,[0,1,3,4,10,11,13]].copy()
In [145]:
# rename variables
data_sub.columns = ['analysis_level','product_group', 'item_name', 'price_change',\
               'sales_vol_change', 'sales_volume', 'sales_value']
In [146]:
#check for missing data
def count_missing(data):
    missing_count = pd.DataFrame(data.isnull().sum(), \
                                 columns=['Count']).sort_values(by=['Count'],\
                                                                ascending=False)
    return missing_count
#apply the function to the data 
count_missing(data_sub)
Out[146]:
Count
price_change 199
sales_vol_change 199
item_name 19
product_group 1
analysis_level 0
sales_volume 0
sales_value 0
In [147]:
# replace missing sales volume change and price change with 0%
cols = ['price_change','sales_vol_change']

for i in cols: 
    data_sub[i] = data_sub[i].fillna('0%')
In [148]:
# Replace cells containing 'Product Groups' with 'NaN' 
data_sub['analysis_level']=data_sub['analysis_level'].replace({'Product Groups':np.nan})

# all rows containing '-100%' indicate that the product was not sold during the week
# therefore, I am excluding them from the analysis by:
    # turning all '-100.0%' into 'NaN'
    # deleting all rows containing 'NaN'
data_sub['price_change']=data_sub['price_change'].replace({'-100.0%':np.nan})
data_sub['sales_vol_change']=data_sub['sales_vol_change'].replace({'-100.0%':np.nan})
In [149]:
# Drop all rows containing 'NaN' values
data_sub = data_sub.dropna()

# Reset the index of the dataset
data_sub = data_sub.reset_index(drop=True)
In [150]:
# delete '%' sign from 'price_change' column
data_sub['price_change'] = data_sub['price_change'].str.rstrip('%')
In [151]:
# delete ',' and '%' signs from 'sales_vol_change' column
data_sub['sales_vol_change'] = data_sub['sales_vol_change']\
                                   .str.replace(',', '').str.rstrip('%')
In [152]:
# delete ',' sign from 'sales_volume' column
data_sub['sales_volume'] = data_sub['sales_volume'].str.replace(',', '')
In [153]:
# delete ',' and the leading currency sign from 'sales_value' column
# (conversion of all four columns to numeric happens two cells below)
data_sub['sales_value'] = data_sub['sales_value'].str.replace(',', '').str[1:]
In [154]:
# drop the 'analysis_level' column (the first column), which is no longer needed
data_sub = data_sub.iloc[:,1:] 
In [155]:
# make 'price_change', 'sales_vol_change', 'sales_volume' and 'sales_value' columns of type 'numeric'
cols = ['price_change','sales_vol_change', 'sales_volume', 'sales_value']
data_sub[cols] =data_sub[cols].apply(pd.to_numeric, errors='coerce')
In [156]:
# strip the six-character code (e.g. ' F54TI') from the product group name
data_sub['product_group'] =data_sub['product_group'].astype(str).str[:-6]
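For reference, a regular expression would strip the code more explicitly and also remove the trailing dash that the fixed-width slice leaves behind. This is a hypothetical alternative (assuming every code matches the '- F...' pattern), not the approach applied above:

# hypothetical alternative: strip '- <code>' with a regex
data_sub['product_group'] = data_sub['product_group'].astype(str)\
    .str.replace(r'\s*-\s*F\w+$', '', regex=True)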
In [157]:
# show pre-processed dataset, ready for analysis
data_sub.head()
Out[157]:
product_group item_name price_change sales_vol_change sales_volume sales_value
0 BRANDED SANDWICHES - DD'S KOSHER DEEP FLLD SALT BEEF SANDWICH -0.3 79.2 7956.0 26719
1 BRANDED SANDWICHES - GOURMET FOODS CHEDDAR&PICKLE SANDWICH 7.7 23.0 1792.0 2789
2 BRANDED SANDWICHES - GOURMET FOODS HAM SALAD&MAYO SANDWICH -7.2 -1.7 925.0 1494
3 BRANDED SANDWICHES - GOURMET FOODS CHICKEN,STUFFING&SAGE SANDWICH 1.3 -0.8 1417.0 2501
4 BRANDED SANDWICHES - GOURMET FOODS TUNA&SWEETCORN SANDWICH 3.1 37.5 1941.0 3177

Analysis

In [158]:
# create a copy of the processed dataset (plain assignment would only
# create a second reference to the same DataFrame)
analysis_ready = data_sub.copy()

Descriptive Statistics

How many items are there in the dataset?

In [159]:
items = analysis_ready['item_name'].unique().shape
print('There are '+str(items[0]) + ' items in the dataset')
There are 836 items in the dataset

How many categories of food-to-go are there in the data?

In [160]:
cate = analysis_ready['product_group'].unique().shape
print('There are '+str(cate[0]) + ' categories of items in the dataset')
There are 18 categories of items in the dataset

How many items per category were stocked?

In [161]:
# create dataframe with the value counts 
lis = pd.DataFrame(analysis_ready['product_group'].value_counts())
lis=lis.reset_index(drop = False)
# make a bar graph 
pic = analysis_ready['product_group'].value_counts()[:10].plot(kind='barh',
                                                         title= str(lis.iloc[0,0]).lower()+' is the top stocked range of the week')
pic.set_xlabel("Number of Items")
Out[161]:
Text(0.5,0,'Number of Items')

Which Range Sells the Most Units?

In [163]:
# aggregate units sold by product range and keep the top ten
top_units_category= data_sub[['sales_volume','product_group']]\
                .groupby(['product_group']).sum().reset_index().sort_values\
                (by='sales_volume', ascending= False)[:10]


plt.barh(top_units_category['product_group'], top_units_category['sales_volume'])
plt.xlabel('Units Sold')
plt.title(str(top_units_category.iloc[0,0]).lower()+' is the Best Selling Range of the Week')
plt.show()
In [162]:
# name of the best-selling range
top_units_category.iloc[0,0]
Out[162]:
'STANDARD SANDWICHES -'

Which range brought the most revenue?

In [164]:
top_category= data_sub[['sales_value','product_group']]\
                .groupby(['product_group']).sum().reset_index().sort_values\
                (by='sales_value', ascending= False)[:10]

plt.barh(top_category['product_group'], top_category['sales_value'])
plt.xlabel('Revenue')
plt.title( str(top_category.iloc[0,0]).lower()+' is the Best Selling Range of the Week')
plt.show()

What percentage of revenue does "standard sandwiches" account for?

In [165]:
# create dataframe grouped by product group by summing sales_value
top_category= data_sub[['sales_value','product_group']]\
                .groupby(['product_group']).sum().reset_index()

# standard sandwiches sales
SS = top_category.loc[top_category['product_group']=='STANDARD SANDWICHES -', 'sales_value']
# total sales
Total = top_category['sales_value'].sum()

# standard sandwiches over total sales
SS_percentage_sales = pd.DataFrame((SS / Total)*100)

#show percentage
print('Standard Sandwiches account for '+str(SS_percentage_sales.iloc[0,0])[0:6]+'% of total sales')
Standard Sandwiches account for 43.557% of total sales

Which Item sold the Most Units?

In [167]:
# aggregate units sold by item and keep the top ten
top_item_units= analysis_ready[['sales_volume','item_name']]\
                .groupby(['item_name']).sum().reset_index().sort_values\
                (by='sales_volume', ascending= False)[0:10]

pic1= top_item_units.set_index('item_name').plot(kind='barh',
                                    title = str(top_item_units.iloc[0,0]).lower()+' is the Best Selling Item of the Week')
pic1.set_xlabel("Units Sold")
pic1.set_ylabel(" ")
Out[167]:
Text(0,0.5,' ')
In [166]:
# name of the best-selling item
top_item_units.iloc[0,0]
Out[166]:
'TESCO CHEESE & ONION BRWN BRD S/WICH'

Which Item brought the most revenue?

In [168]:
c= analysis_ready[['sales_value','item_name']]\
                .groupby(['item_name']).sum().reset_index().sort_values\
                (by='sales_value', ascending= False)[0:7]

pic1= c.set_index('item_name').plot(kind='barh',
                                    title = str(c.iloc[0,0]).lower()+' brought the most revenue this week')
pic1.set_xlabel("Revenue")
pic1.set_ylabel(" ")
Out[168]:
Text(0,0.5,' ')

Testing my Hypothesis Questions

Question 1: "Which Product Ranges Have Co-movement in Prices?"

To find this out, I compute a correlation matrix of 'price_change' (change in price) between product ranges.

In [169]:
# Subset dataset to contain 'product_group' and 'price_change'. Name the subset 'price'
price = analysis_ready.iloc[:,[0,2]]

# Create a dataframe with range as columns and the price change as rows.

#create empty list 'lis'.
lis = []
#for each unique product range
for i in price['product_group'].unique():
    # create a series containing the price changes of that range
    x = price.loc[price['product_group'] == str(i), 'price_change']
    # reset the index of the series
    x = x.reset_index(drop = True)
    # append series to empty list 'lis'
    lis.append(x)

#concatenate the series in 'lis' together, to form a new dataframe: 'total_price'
total_price = pd.concat(lis, axis=1)

# rename columns of 'total_price'
cols = list(price['product_group'].unique())
total_price.columns = cols

# calculate correlation between price change
cor_price = total_price.corr(method = "pearson")

#show correlation matrix
cor_price
Out[169]:
BRANDED SANDWICHES - FINEST SANDWICHES - FOOD TO GO - FOS FRESH CUT FRUIT - FOS SNACKING - HEALTHY LIVING SANDWICHS - HOT FOOD TO GO INDULGENCE - ORANGE JUICE - PICKBOX - SALAD BAR SALADS - SINGLE SUSHI - STANDARD PASTA - STANDARD ROLLS AND SUBS - STANDARD SANDWICHES - STANDARD WRAPS - SUSHI -
BRANDED SANDWICHES - 1.000000 -0.014007 0.323035 0.039133 -0.225083 -0.311122 -0.120911 0.490899 0.215522 -0.180426 -0.090388 -0.002699 NaN -0.025603 -0.245747 0.009663 0.152989 0.109284
FINEST SANDWICHES - -0.014007 1.000000 -0.121235 0.095500 -0.122116 -0.224002 0.077003 -0.175530 0.484421 -0.017827 0.062621 -0.295206 NaN 0.157086 0.016655 0.021593 -0.003816 -0.089854
FOOD TO GO - 0.323035 -0.121235 1.000000 -0.019421 -0.192396 -0.082656 0.090752 0.096524 0.450557 -0.014835 -0.071772 0.145523 NaN -0.162793 -0.153450 0.104973 -0.698868 -0.060114
FOS FRESH CUT FRUIT - 0.039133 0.095500 -0.019421 1.000000 0.114055 -0.016955 0.032308 0.278032 0.222257 -0.267727 -0.122000 0.160790 NaN -0.066798 -0.041455 0.042516 0.041497 0.097661
FOS SNACKING - -0.225083 -0.122116 -0.192396 0.114055 1.000000 0.295209 0.375387 0.007147 -0.310297 -0.168084 0.205389 0.085398 NaN 0.024644 0.122329 -0.139246 0.458772 -0.045962
HEALTHY LIVING SANDWICHS - -0.311122 -0.224002 -0.082656 -0.016955 0.295209 1.000000 0.116442 -0.003523 -0.482675 0.410881 0.080421 0.188352 NaN 0.346722 -0.203581 -0.222226 0.099933 -0.076468
HOT FOOD TO GO -0.120911 0.077003 0.090752 0.032308 0.375387 0.116442 1.000000 0.036099 0.419463 0.030791 0.088202 -0.115493 NaN 0.097724 0.531286 0.023981 0.001030 -0.154140
INDULGENCE - 0.490899 -0.175530 0.096524 0.278032 0.007147 -0.003523 0.036099 1.000000 -0.140893 0.113969 -0.199467 -0.247395 NaN -0.169698 0.110621 0.392936 0.294751 0.234241
ORANGE JUICE - 0.215522 0.484421 0.450557 0.222257 -0.310297 -0.482675 0.419463 -0.140893 1.000000 -0.531506 0.056988 -0.144346 NaN 0.040057 0.397197 0.210402 -0.469706 -0.064855
PICKBOX - -0.180426 -0.017827 -0.014835 -0.267727 -0.168084 0.410881 0.030791 0.113969 -0.531506 1.000000 -0.194278 -0.097220 NaN 0.061493 -0.043785 0.180950 -0.139402 -0.045869
SALAD BAR -0.090388 0.062621 -0.071772 -0.122000 0.205389 0.080421 0.088202 -0.199467 0.056988 -0.194278 1.000000 0.107981 NaN 0.068512 0.076184 0.029883 -0.001644 0.033356
SALADS - -0.002699 -0.295206 0.145523 0.160790 0.085398 0.188352 -0.115493 -0.247395 -0.144346 -0.097220 0.107981 1.000000 NaN 0.007337 -0.155314 0.086953 0.040984 -0.103382
SINGLE SUSHI - NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
STANDARD PASTA - -0.025603 0.157086 -0.162793 -0.066798 0.024644 0.346722 0.097724 -0.169698 0.040057 0.061493 0.068512 0.007337 NaN 1.000000 -0.257756 -0.004872 0.022922 -0.153918
STANDARD ROLLS AND SUBS - -0.245747 0.016655 -0.153450 -0.041455 0.122329 -0.203581 0.531286 0.110621 0.397197 -0.043785 0.076184 -0.155314 NaN -0.257756 1.000000 0.049447 0.040904 -0.118859
STANDARD SANDWICHES - 0.009663 0.021593 0.104973 0.042516 -0.139246 -0.222226 0.023981 0.392936 0.210402 0.180950 0.029883 0.086953 NaN -0.004872 0.049447 1.000000 0.009022 0.049405
STANDARD WRAPS - 0.152989 -0.003816 -0.698868 0.041497 0.458772 0.099933 0.001030 0.294751 -0.469706 -0.139402 -0.001644 0.040984 NaN 0.022922 0.040904 0.009022 1.000000 -0.184640
SUSHI - 0.109284 -0.089854 -0.060114 0.097661 -0.045962 -0.076468 -0.154140 0.234241 -0.064855 -0.045869 0.033356 -0.103382 NaN -0.153918 -0.118859 0.049405 -0.184640 1.000000
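For reference, the same range-per-column table can be built more compactly with groupby. This is a sketch of an equivalent construction (aligning rows only by their position within each range, exactly as above), not the code that was run:

# hypothetical compact equivalent of the loop above
total_price_alt = pd.concat(
    {g: s['price_change'].reset_index(drop=True)
     for g, s in price.groupby('product_group')},
    axis=1)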
In [170]:
# Get top 5 positive correlation and top 5 negative correlations by: 

# create a function that gets redundant pairs of correlation
def get_redundant_pairs(df):
    #create empty set for redundant pairs
    pairs_to_drop = set()
    #set columns to be the same of the dataframe on which the function will be run
    cols = df.columns
    # for each column index i (from 0 to the number of columns in the dataframe)
    for i in range(0, df.shape[1]):
        #for each column index j (from 0 up to and including i)
        for j in range(0, i+1):
            # add to pairs_to_drop set pairs of columns
            pairs_to_drop.add((cols[i], cols[j]))
    # return pairs_to_drop set
    return pairs_to_drop

# Create a function that gives the n top positive correlations in a dataframe 'df'

# declare name of function
def get_top_pos_correlations(df, n=5):
    # calculate correlation matrix of 'df' and unstack it (put it in one column)
    au_corr = df.corr(method = 'pearson').unstack()
    # use function 'get_redundant_pairs' to select redundant pairs of dataframe
    labels_to_drop = get_redundant_pairs(df)
    # drop redundant pairs and sort remaining pairs in a descending order
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=False)
    # return 'n' sorted values 
    return au_corr[0:n]

# Create a function that gives the n top negative correlations in a dataframe 'df';
# this function is the same as 'get_top_pos_correlations' except that values are sorted in
# ascending order.
def get_top_neg_correlations(df, n=5):
    au_corr = df.corr(method ='pearson').unstack()
    labels_to_drop = get_redundant_pairs(df)
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=True)
    return au_corr[0:n]

# Print results
print("Top positive correlations (sales volume change)")
print(get_top_pos_correlations(total_price, 5))

print('  ')

print("Top negative correlations (sales volume change)")
print(get_top_neg_correlations(total_price, 5))
Top positive correlations (sales volume change)
HOT FOOD TO GO        STANDARD ROLLS AND SUBS -    0.531286
BRANDED SANDWICHES -  INDULGENCE -                 0.490899
FINEST SANDWICHES -   ORANGE JUICE -               0.484421
FOS SNACKING -        STANDARD WRAPS -             0.458772
FOOD TO GO -          ORANGE JUICE -               0.450557
dtype: float64
  
Top negative correlations (price change)
FOOD TO GO -                STANDARD WRAPS -             -0.698868
ORANGE JUICE -              PICKBOX -                    -0.531506
HEALTHY LIVING SANDWICHS -  ORANGE JUICE -               -0.482675
ORANGE JUICE -              STANDARD WRAPS -             -0.469706
BRANDED SANDWICHES -        HEALTHY LIVING SANDWICHS -   -0.311122
dtype: float64
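An alternative worth noting (a sketch, not the method used above) is to rank all unique range pairs by absolute correlation in a single pass, so the strongest relationships surface regardless of sign:

def top_abs_correlations(df, n=10):
    corr = df.corr(method='pearson')
    # keep each pair once: lower triangle, excluding the diagonal
    keep = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
    pairs = corr.where(keep).stack()
    # order by absolute value while keeping the sign in the output
    return pairs.reindex(pairs.abs().sort_values(ascending=False).index)[:n]

top_abs_correlations(total_price, 10)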
In [171]:
#visualization of correlation
sns.set(style="white")

# create mask for the upper triangle
mask = np.zeros_like(cor_price, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
plt.title("Correlation Map of Price Changes")

# create a colormap
cmap = sns.diverging_palette(220, 4, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(cor_price, mask=mask, cmap=cmap, vmax=.7, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .6})
Out[171]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1d163358>
In [172]:
# generate scatterplots for each pair of price_change
sns.pairplot(total_price, kind='scatter', diag_kind="kde", dropna = False);

Question 2: "Which Product Ranges Have Co-movement in Sales?"

To find this out, I compute a correlation matrix of 'sales_vol_change' (change in sales volume) between product ranges.

In [173]:
analysis_ready.head(2)
Out[173]:
product_group item_name price_change sales_vol_change sales_volume sales_value
0 BRANDED SANDWICHES - DD'S KOSHER DEEP FLLD SALT BEEF SANDWICH -0.3 79.2 7956.0 26719
1 BRANDED SANDWICHES - GOURMET FOODS CHEDDAR&PICKLE SANDWICH 7.7 23.0 1792.0 2789
In [174]:
# subset dataset to contain 'product_group' and 'sales_vol_change'
volume = analysis_ready.iloc[:,[0,3]]

#create a dataframe with range as columns and the sales volume change as rows

# create empty list 'lis'
lis = []
# for each product range
for i in volume['product_group'].unique():
    # return series containing that range's sales volume changes
    x = volume.loc[volume['product_group'] == str(i), 'sales_vol_change']
    # reset index of the series
    x = x.reset_index(drop = True)
    # append series in empty list 
    lis.append(x)

#concatenate the series together to create a dataframe 'total_vol'   
total_vol = pd.concat(lis, axis=1)

# rename columns of 'total_vol'
cols = list(volume['product_group'].unique())
total_vol.columns = cols

#correlation matrix 
cor_vol = total_vol.corr(method= 'pearson')

#show correlation matrix
cor_vol
Out[174]:
BRANDED SANDWICHES - FINEST SANDWICHES - FOOD TO GO - FOS FRESH CUT FRUIT - FOS SNACKING - HEALTHY LIVING SANDWICHS - HOT FOOD TO GO INDULGENCE - ORANGE JUICE - PICKBOX - SALAD BAR SALADS - SINGLE SUSHI - STANDARD PASTA - STANDARD ROLLS AND SUBS - STANDARD SANDWICHES - STANDARD WRAPS - SUSHI -
BRANDED SANDWICHES - 1.000000 0.180150 -0.083285 -0.099395 0.212882 0.330315 -0.060328 -0.100273 0.015398 -0.177786 0.097033 -0.091339 NaN 0.153660 -0.181045 0.161892 -0.161438 -0.202151
FINEST SANDWICHES - 0.180150 1.000000 -0.443400 -0.106632 0.127169 0.093467 -0.064790 0.205851 -0.641083 0.120001 -0.112141 -0.229412 NaN 0.493853 0.175313 -0.302165 -0.414226 0.223391
FOOD TO GO - -0.083285 -0.443400 1.000000 0.099529 0.016628 0.037621 -0.134761 -0.073393 0.087234 -0.102342 0.116446 0.103715 NaN -0.199997 -0.085658 -0.110684 0.143149 -0.164509
FOS FRESH CUT FRUIT - -0.099395 -0.106632 0.099529 1.000000 0.032300 -0.309464 0.058826 0.099711 -0.548939 0.289095 -0.017597 0.068348 NaN 0.077025 -0.053623 0.258115 0.015155 -0.070378
FOS SNACKING - 0.212882 0.127169 0.016628 0.032300 1.000000 -0.389619 -0.559196 0.330337 -0.277482 -0.387269 0.254431 0.007602 NaN -0.421288 0.428328 0.081676 0.108673 0.053560
HEALTHY LIVING SANDWICHS - 0.330315 0.093467 0.037621 -0.309464 -0.389619 1.000000 0.178967 -0.236641 -0.364018 -0.145866 -0.022236 -0.047907 NaN 0.258532 -0.336294 -0.703259 -0.154376 0.020877
HOT FOOD TO GO -0.060328 -0.064790 -0.134761 0.058826 -0.559196 0.178967 1.000000 0.316181 -0.608032 0.294092 0.012209 0.204316 NaN 0.206871 -0.452816 -0.111467 -0.129741 -0.021752
INDULGENCE - -0.100273 0.205851 -0.073393 0.099711 0.330337 -0.236641 0.316181 1.000000 -0.374928 0.075260 0.088756 -0.250061 NaN 0.257822 0.263941 -0.034803 0.012693 0.228406
ORANGE JUICE - 0.015398 -0.641083 0.087234 -0.548939 -0.277482 -0.364018 -0.608032 -0.374928 1.000000 0.252979 0.778424 0.493301 NaN -0.314780 -0.133502 0.228982 0.314495 -0.324246
PICKBOX - -0.177786 0.120001 -0.102342 0.289095 -0.387269 -0.145866 0.294092 0.075260 0.252979 1.000000 -0.329960 -0.170539 NaN 0.118869 -0.084924 -0.010885 -0.026271 0.429762
SALAD BAR 0.097033 -0.112141 0.116446 -0.017597 0.254431 -0.022236 0.012209 0.088756 0.778424 -0.329960 1.000000 -0.026302 NaN -0.108926 -0.205929 -0.022538 -0.026345 -0.144009
SALADS - -0.091339 -0.229412 0.103715 0.068348 0.007602 -0.047907 0.204316 -0.250061 0.493301 -0.170539 -0.026302 1.000000 NaN -0.185079 -0.424116 -0.017232 0.331589 0.192723
SINGLE SUSHI - NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
STANDARD PASTA - 0.153660 0.493853 -0.199997 0.077025 -0.421288 0.258532 0.206871 0.257822 -0.314780 0.118869 -0.108926 -0.185079 NaN 1.000000 -0.132394 0.061751 -0.435584 -0.009796
STANDARD ROLLS AND SUBS - -0.181045 0.175313 -0.085658 -0.053623 0.428328 -0.336294 -0.452816 0.263941 -0.133502 -0.084924 -0.205929 -0.424116 NaN -0.132394 1.000000 0.158550 0.363924 0.046441
STANDARD SANDWICHES - 0.161892 -0.302165 -0.110684 0.258115 0.081676 -0.703259 -0.111467 -0.034803 0.228982 -0.010885 -0.022538 -0.017232 NaN 0.061751 0.158550 1.000000 -0.020102 -0.233710
STANDARD WRAPS - -0.161438 -0.414226 0.143149 0.015155 0.108673 -0.154376 -0.129741 0.012693 0.314495 -0.026271 -0.026345 0.331589 NaN -0.435584 0.363924 -0.020102 1.000000 -0.029201
SUSHI - -0.202151 0.223391 -0.164509 -0.070378 0.053560 0.020877 -0.021752 0.228406 -0.324246 0.429762 -0.144009 0.192723 NaN -0.009796 0.046441 -0.233710 -0.029201 1.000000
In [175]:
# Get top 5 positive correlations and top 5 negative correlations, reusing
# 'get_redundant_pairs', 'get_top_pos_correlations' and
# 'get_top_neg_correlations' as defined above.


print("Top positive correlations (sales volume change)")
print(get_top_pos_correlations(total_vol, 5))

print ('  ')
print("Top negative correlations (sales volume change)")
print(get_top_neg_correlations(total_vol, 5))
Top positive correlations (sales volume change)
ORANGE JUICE -       SALAD BAR                    0.778424
FINEST SANDWICHES -  STANDARD PASTA -             0.493853
ORANGE JUICE -       SALADS -                     0.493301
PICKBOX -            SUSHI -                      0.429762
FOS SNACKING -       STANDARD ROLLS AND SUBS -    0.428328
dtype: float64
  
Top negative correlations (sales volume change)
HEALTHY LIVING SANDWICHS -  STANDARD SANDWICHES -   -0.703259
FINEST SANDWICHES -         ORANGE JUICE -          -0.641083
HOT FOOD TO GO              ORANGE JUICE -          -0.608032
FOS SNACKING -              HOT FOOD TO GO          -0.559196
FOS FRESH CUT FRUIT -       ORANGE JUICE -          -0.548939
dtype: float64
In [176]:
#visualization of correlation
sns.set(style="white")

# create mask for the upper triangle
mask = np.zeros_like(cor_vol, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Set up picture
f, ax = plt.subplots(figsize=(11, 9))
plt.title("Correlation Map of Sales Changes")

# create a colormap
cmap = sns.diverging_palette(220, 4, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(cor_vol, mask=mask, cmap=cmap, vmax=.8, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
Out[176]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a19d8b4e0>
In [177]:
#scatter plots of sales volume change (note: total_vol, not total_price)
sns.pairplot(total_vol, kind='scatter', diag_kind="kde", dropna = False);

Question 3: "Is Sales Change Sensitive to the Price Change of Items of a Particular Range of Products?"

To find this out, I fit a multiple regression model whose dependent variable is "sales_vol_change" and whose independent variables are "price_change", "product_group" and the interaction between the two.

In [178]:
# fit the regression model with the sm.ols formula interface
m1 = sm.ols(formula = 'sales_vol_change ~  price_change*product_group',\
            data = analysis_ready).fit()
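statsmodels parses this formula with patsy, so the design matrix that the '*' operator expands into (one dummy per product group plus one price-change interaction per group) can be inspected directly; a quick sketch:

# sketch: list the first few dummy/interaction columns patsy generates
import patsy
y, X = patsy.dmatrices('sales_vol_change ~ price_change*product_group',
                       data=analysis_ready, return_type='dataframe')
print(X.columns.tolist()[:5])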
In [179]:
# check results of regression
m1.summary()
Out[179]:
OLS Regression Results
Dep. Variable: sales_vol_change R-squared: 0.040
Model: OLS Adj. R-squared: 0.001
Method: Least Squares F-statistic: 1.013
Date: Thu, 01 Nov 2018 Prob (F-statistic): 0.448
Time: 10:52:37 Log-Likelihood: -10178.
No. Observations: 860 AIC: 2.043e+04
Df Residuals: 825 BIC: 2.059e+04
Df Model: 34
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 10.0765 3742.061 0.003 0.998 -7335.003 7355.156
product_group[T.FINEST SANDWICHES -] -10.2211 7481.402 -0.001 0.999 -1.47e+04 1.47e+04
product_group[T.FOOD TO GO -] -29.6014 9848.117 -0.003 0.998 -1.94e+04 1.93e+04
product_group[T.FOS FRESH CUT FRUIT -] -27.4037 6289.410 -0.004 0.997 -1.24e+04 1.23e+04
product_group[T.FOS SNACKING -] 55.6212 7390.078 0.008 0.994 -1.44e+04 1.46e+04
product_group[T.HEALTHY LIVING SANDWICHS -] -0.3275 9987.607 -3.28e-05 1.000 -1.96e+04 1.96e+04
product_group[T.HOT FOOD TO GO] 257.9780 4408.342 0.059 0.953 -8394.907 8910.863
product_group[T.INDULGENCE -] -41.3694 1.18e+04 -0.004 0.997 -2.31e+04 2.3e+04
product_group[T.ORANGE JUICE -] 936.3399 1.09e+04 0.086 0.932 -2.05e+04 2.24e+04
product_group[T.PICKBOX -] 1097.9574 7664.830 0.143 0.886 -1.39e+04 1.61e+04
product_group[T.SALAD BAR] 1.863e+04 5705.026 3.265 0.001 7430.369 2.98e+04
product_group[T.SALADS -] 456.0308 6036.703 0.076 0.940 -1.14e+04 1.23e+04
product_group[T.SINGLE SUSHI -] -60.0765 3.43e+04 -0.002 0.999 -6.74e+04 6.72e+04
product_group[T.STANDARD PASTA -] -7.9646 7161.646 -0.001 0.999 -1.41e+04 1.4e+04
product_group[T.STANDARD ROLLS AND SUBS -] 4.3935 7941.211 0.001 1.000 -1.56e+04 1.56e+04
product_group[T.STANDARD SANDWICHES -] -7.5799 4979.192 -0.002 0.999 -9780.955 9765.796
product_group[T.STANDARD WRAPS -] -28.5994 7084.857 -0.004 0.997 -1.39e+04 1.39e+04
product_group[T.SUSHI -] -48.9386 6131.309 -0.008 0.994 -1.21e+04 1.2e+04
price_change 1.5989 898.897 0.002 0.999 -1762.796 1765.994
price_change:product_group[T.FINEST SANDWICHES -] -3.0952 946.457 -0.003 0.997 -1860.843 1854.653
price_change:product_group[T.FOOD TO GO -] -2.5423 1100.714 -0.002 0.998 -2163.072 2157.988
price_change:product_group[T.FOS FRESH CUT FRUIT -] -2.0879 914.631 -0.002 0.998 -1797.365 1793.189
price_change:product_group[T.FOS SNACKING -] -18.1095 1901.878 -0.010 0.992 -3751.198 3714.979
price_change:product_group[T.HEALTHY LIVING SANDWICHS -] -5.0304 2666.390 -0.002 0.998 -5238.737 5228.676
price_change:product_group[T.HOT FOOD TO GO] -2.0685 899.391 -0.002 0.998 -1767.433 1763.296
price_change:product_group[T.INDULGENCE -] -4.0520 2595.090 -0.002 0.999 -5097.808 5089.704
price_change:product_group[T.ORANGE JUICE -] 143.7736 1373.236 0.105 0.917 -2551.674 2839.221
price_change:product_group[T.PICKBOX -] -127.1284 2932.565 -0.043 0.965 -5883.295 5629.038
price_change:product_group[T.SALAD BAR] -326.5837 901.556 -0.362 0.717 -2096.198 1443.030
price_change:product_group[T.SALADS -] -15.4386 1254.086 -0.012 0.990 -2477.013 2446.136
price_change:product_group[T.SINGLE SUSHI -] -2.348e-13 1.6e-12 -0.147 0.883 -3.38e-12 2.91e-12
price_change:product_group[T.STANDARD PASTA -] -2.4678 2578.806 -0.001 0.999 -5064.260 5059.324
price_change:product_group[T.STANDARD ROLLS AND SUBS -] -6.9514 1396.962 -0.005 0.996 -2748.968 2735.066
price_change:product_group[T.STANDARD SANDWICHES -] -1.3403 976.185 -0.001 0.999 -1917.440 1914.759
price_change:product_group[T.STANDARD WRAPS -] -2.3515 1051.701 -0.002 0.998 -2066.675 2061.972
price_change:product_group[T.SUSHI -] -7.2998 2757.042 -0.003 0.998 -5418.941 5404.342
Omnibus: 2243.887 Durbin-Watson: 2.002
Prob(Omnibus): 0.000 Jarque-Bera (JB): 22286644.863
Skew: 27.528 Prob(JB): 0.00
Kurtosis: 789.716 Cond. No. 1.24e+16

Result Interpretation

The model explains very little of the variation in sales change (R-squared = 0.040) and the overall F-test is not significant (Prob (F-statistic) = 0.448), so there is no evidence that sales change is systematically sensitive to price change within a specific range. The only significant term is the SALAD BAR group effect (p = 0.001); none of the price-change interaction terms are significant.

Question 4: "Does the price change of items of other ranges affect the sales of standard sandwiches?"

To do so, I have:

  • Calculated the correlation between the sales volume change of standard sandwiches and the price changes of items of all other ranges
  • Fitted a regression model that tests whether the price change of the most highly correlated range affects the sales of standard sandwiches.
In [180]:
analysis_ready.head()
Out[180]:
product_group item_name price_change sales_vol_change sales_volume sales_value
0 BRANDED SANDWICHES - DD'S KOSHER DEEP FLLD SALT BEEF SANDWICH -0.3 79.2 7956.0 26719
1 BRANDED SANDWICHES - GOURMET FOODS CHEDDAR&PICKLE SANDWICH 7.7 23.0 1792.0 2789
2 BRANDED SANDWICHES - GOURMET FOODS HAM SALAD&MAYO SANDWICH -7.2 -1.7 925.0 1494
3 BRANDED SANDWICHES - GOURMET FOODS CHICKEN,STUFFING&SAGE SANDWICH 1.3 -0.8 1417.0 2501
4 BRANDED SANDWICHES - GOURMET FOODS TUNA&SWEETCORN SANDWICH 3.1 37.5 1941.0 3177
In [181]:
# Subset data to only include sales volume change data for standard sandwiches
sales_volume_change = analysis_ready.loc[analysis_ready['product_group'] == 'STANDARD SANDWICHES -',\
                                         'sales_vol_change']
sales_volume_change = sales_volume_change.reset_index(drop=True)

# create a list containing all product ranges except standard sandwiches
prices = list(analysis_ready['product_group'].unique())
# remove standard sandwiches from the list
prices.remove('STANDARD SANDWICHES -')
In [186]:
# create dataframe on which correlation will be calculated

# create empty list w
w = []
# for each of the other product ranges
for i in prices:
    # create series containing that range's price changes
    x = analysis_ready.loc[analysis_ready['product_group'] == str(i), 'price_change']
    # reset index of series 
    x = x.reset_index(drop = True)
    #append series to empty list w
    w.append(x)

#concatenate the series together     
total_w = pd.concat(w, axis=1)

# build new column names by truncating each range name and appending a suffix
string = '_price_change'
prices_edited = [x[:-4] + string for x in prices]

# rename columns of dataframe 
total_w.columns = prices_edited

# show dataframe 
total_w.head(10)
Out[186]:
BRANDED SANDWICH_price_change FINEST SANDWICH_price_change FOOD TO _price_change FOS FRESH CUT FRU_price_change FOS SNACKI_price_change HEALTHY LIVING SANDWIC_price_change HOT FOOD T_price_change INDULGEN_price_change ORANGE JUI_price_change PICKB_price_change SALAD_price_change SALA_price_change SINGLE SUS_price_change STANDARD PAS_price_change STANDARD ROLLS AND SU_price_change STANDARD WRA_price_change SUS_price_change
0 -0.3 -2.1 0.0 -6.3 0.0 0.0 -7.7 1.1 0.0 0.0 0.0 0.0 0.0 -1.1 2.6 -3.1 3.6
1 7.7 -3.7 39.7 3.7 0.0 3.3 -1.5 3.1 4.1 1.5 -17.9 0.9 NaN -1.1 2.7 -2.1 0.0
2 -7.2 -2.6 0.0 0.0 0.0 -0.8 -4.3 -8.0 -7.5 -3.2 -3.8 0.0 NaN 0.4 0.6 -2.2 0.0
3 1.3 -4.3 0.0 0.0 0.0 4.5 -3.4 0.0 -14.4 5.2 11.4 0.0 NaN 4.1 0.0 -1.7 -0.3
4 3.1 0.0 0.0 4.0 0.0 0.8 -1.3 1.4 2.2 -0.1 -41.2 0.0 NaN 3.2 0.7 -1.8 0.1
5 -14.6 -2.9 0.0 -4.0 1.6 6.1 3.9 -1.8 -8.5 0.0 -30.7 0.0 NaN 3.1 0.0 -2.7 0.0
6 -5.5 -5.4 0.0 0.4 11.1 4.5 1.5 -3.2 -5.2 0.1 104.8 0.0 NaN 0.0 0.0 -2.6 2.6
7 -10.7 0.0 0.0 -4.7 0.9 11.6 4.8 -6.0 -10.2 4.0 -11.7 0.0 NaN -0.1 0.0 -3.3 -0.6
8 -6.4 4.2 41.4 0.2 -1.3 0.0 0.0 -7.0 15.0 -0.4 9.2 1.5 NaN -0.3 -3.5 -45.0 0.8
9 -4.7 0.0 0.0 -4.8 -1.7 0.0 27.8 -3.4 15.4 -2.8 31.8 -0.4 NaN -1.7 27.1 -4.3 -0.7
In [187]:
# add the standard sandwiches sales volume change data to the dataframe 
total_R = pd.concat([sales_volume_change, total_w], axis=1)

# create a correlation matrix showing how the sales volume change of standard
# sandwiches correlates with the price change of every other range
corr = pd.DataFrame(total_R.corr(method='pearson').iloc[:,0])

# show correlation coefficient
corr
Out[187]:
sales_vol_change
sales_vol_change 1.000000
BRANDED SANDWICH_price_change 0.096218
FINEST SANDWICH_price_change 0.214159
FOOD TO _price_change -0.116824
FOS FRESH CUT FRU_price_change 0.024813
FOS SNACKI_price_change -0.221470
HEALTHY LIVING SANDWIC_price_change 0.005505
HOT FOOD T_price_change -0.073000
INDULGEN_price_change 0.207097
ORANGE JUI_price_change 0.172523
PICKB_price_change -0.031493
SALAD_price_change 0.242796
SALA_price_change -0.070915
SINGLE SUS_price_change NaN
STANDARD PAS_price_change 0.200894
STANDARD ROLLS AND SU_price_change -0.089984
STANDARD WRA_price_change -0.129369
SUS_price_change 0.228208
In [188]:
#visualization of correlation
sns.set(style="white")

# create mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# Set up picture
f, ax = plt.subplots(figsize=(11, 9))
plt.title("Correlation Between Standard Sandwiches Sales Volume Change and Price Change of all Other Ranges")

# create a colormap
cmap = sns.diverging_palette(220, 4, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.35, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
Out[188]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a17f1df60>

Regression between the sales volume change of standard sandwiches and the price change of salad bar items

The regression model shows that the price change of salad bar items ('SALAD_price_change', the truncated column name for SALAD BAR, the most highly correlated range above) is only marginally significant in predicting the sales change of standard sandwiches (p = 0.053).

Intercept = -0.77 (when the price change of salad bar items is 0, the predicted sales change of standard sandwiches is -0.77)

Price change of salad bar coefficient = 0.12 (a one percentage point change in salad bar prices is associated with a 0.12 percentage point change in standard sandwich sales)

In [190]:
# fit the linear regression model
m2 = sm.ols(formula = 'sales_vol_change ~  SALAD_price_change',\
            data = total_R).fit()
m2.summary()
Out[190]:
OLS Regression Results
Dep. Variable: sales_vol_change R-squared: 0.059
Model: OLS Adj. R-squared: 0.044
Method: Least Squares F-statistic: 3.884
Date: Thu, 01 Nov 2018 Prob (F-statistic): 0.0532
Time: 10:54:24 Log-Likelihood: -308.91
No. Observations: 64 AIC: 621.8
Df Residuals: 62 BIC: 626.1
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.7699 3.876 -0.199 0.843 -8.519 6.979
SALAD_price_change 0.1227 0.062 1.971 0.053 -0.002 0.247
Omnibus: 10.254 Durbin-Watson: 1.801
Prob(Omnibus): 0.006 Jarque-Bera (JB): 24.949
Skew: 0.222 Prob(JB): 3.82e-06
Kurtosis: 6.026 Cond. No. 62.9
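As a quick usage check of the fitted model, its point prediction for a hypothetical 10 percentage point rise in salad bar prices follows from the intercept and slope above (-0.77 + 0.1227 * 10 ≈ 0.46):

# hypothetical example: predicted standard sandwich sales change for a
# 10 percentage point salad bar price increase
m2.predict(pd.DataFrame({'SALAD_price_change': [10.0]}))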

Conclusion

To conclude, this report attempted to identify relationships among ranges of food-to-go products. The analysis was divided into three steps. The first was data cleaning and pre-processing; the second was descriptive statistics, where standard sandwiches emerged as the range generating the largest share of revenue (43.56%); the third was the analysis of correlations. The latter showed that certain ranges experience co-movements in prices and sales, and that the sales of standard sandwiches are weakly (and only marginally significantly) related to the change in price of salad bar items.