Recommendation System

Modeled on TripAdvisor data

Recommendation systems aim to find items that users might be interested in, given a set of characteristics. They are widely used by online stores and websites such as Netflix (Mitra et al. 2016). The process of creating personalised recommendations for users is described in detail by Adomavicius & Tuzhilin (2005).

Leskovec et al. (2014) state that there are two main architectures for recommendation systems. The first is content-based systems, which focus on the characteristics of the items: users are recommended items similar to the ones they have already consumed. The second is collaborative filtering systems, which focus on the relationship between customers and items. Ansari et al. (1999) describe collaborative filtering as an algorithm that mimics word-of-mouth communication, because it suggests to customers items that people similar to them have purchased.

This notebook shows how to create a simple recommendation system using TripAdvisor data. The aim is to produce restaurant recommendations. I first create a simple system that ranks all restaurants and returns the top-rated ones. I then create a system that gives recommendations based on a particular restaurant: if you feed the algorithm the name of a restaurant, it returns a list of similar ones.

In [21]:
# import packages
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn import model_selection
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from wordcloud import WordCloud
from math import log, sqrt
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel



import warnings


import nltk
nltk.download('punkt')
warnings.filterwarnings('ignore')

%matplotlib inline
[nltk_data] Downloading package punkt to /Users/gioia/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
In [22]:
# read data 
TAdata = pd.read_csv('./TA_restaurants_curated.csv')
In [23]:
# copy data so the original dataframe stays untouched
data = TAdata.copy()
In [24]:
# look at data
data.head()
Out[24]:
Unnamed: 0 Name City Cuisine Style Ranking Rating Price Range Number of Reviews Reviews URL_TA ID_TA
0 0 Martine of Martine's Table Amsterdam ['French', 'Dutch', 'European'] 1.0 5.0 $$ - $$$ 136.0 [['Just like home', 'A Warm Welcome to Wintry ... /Restaurant_Review-g188590-d11752080-Reviews-M... d11752080
1 1 De Silveren Spiegel Amsterdam ['Dutch', 'European', 'Vegetarian Friendly', '... 2.0 4.5 $$$$ 812.0 [['Great food and staff', 'just perfect'], ['0... /Restaurant_Review-g188590-d693419-Reviews-De_... d693419
2 2 La Rive Amsterdam ['Mediterranean', 'French', 'International', '... 3.0 4.5 $$$$ 567.0 [['Satisfaction', 'Delicious old school restau... /Restaurant_Review-g188590-d696959-Reviews-La_... d696959
3 3 Vinkeles Amsterdam ['French', 'European', 'International', 'Conte... 4.0 5.0 $$$$ 564.0 [['True five star dinner', 'A superb evening o... /Restaurant_Review-g188590-d1239229-Reviews-Vi... d1239229
4 4 Librije's Zusje Amsterdam Amsterdam ['Dutch', 'European', 'International', 'Vegeta... 5.0 4.5 $$$$ 316.0 [['Best meal.... EVER', 'super food experience... /Restaurant_Review-g188590-d6864170-Reviews-Li... d6864170
In [25]:
data.shape
Out[25]:
(125527, 11)

Pre-Processing

In [26]:
# keep only the columns we use (drops 'Unnamed: 0' and 'URL_TA')
data = data.iloc[:,[1,2,3,4,5,6,7,8,10]].copy()
In [27]:
# replace price range values with 'cheap', 'medium' and 'expensive'
data['Price Range'] = data['Price Range'].replace(['$', '$$ - $$$', '$$$$'], ['cheap', 'medium', 'expensive'])
In [28]:
# make city and name of restaurant lowercase 
data['City'] = data['City'].str.lower()
data['Name'] = data['Name'].str.lower()

1. Simple Recommender

Simple recommenders are basic systems that recommend the top items based on a certain metric or score.

The following are the steps involved:

  • Decide on the metric or score to rate restaurants on.
  • Calculate the score for every restaurant.
  • Sort the restaurants based on the score and output the top results.

Step 1: Weight the ratings by the number of reviews

$$ WR = \frac{v}{v+m}R + \frac{m}{v+m}C $$ where:

  • $v$ = number of reviews for the restaurant
  • $m$ = minimum number of reviews required to be listed in the chart
  • $R$ = average rating of the restaurant
  • $C$ = mean rating across all restaurants
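To get a feel for what the formula does, here is a minimal sketch with made-up numbers. The values of `m` and `C` below are assumptions chosen for illustration, not the ones computed from the dataset, and `weighted_rating` is a hypothetical helper used only in this example.

```python
def weighted_rating(v, R, m, C):
    """Blend a restaurant's own rating R with the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Illustrative values only (assumed for this sketch)
m = 32.0   # review-count threshold
C = 4.0    # mean rating across all restaurants

# A perfect rating backed by very few reviews ...
few_reviews = weighted_rating(v=8, R=5.0, m=m, C=C)
# ... versus a slightly lower rating backed by many reviews
many_reviews = weighted_rating(v=800, R=4.5, m=m, C=C)

print(round(few_reviews, 3), round(many_reviews, 3))
```

With few reviews the score is pulled strongly towards `C`, so the 5.0 restaurant ends up at about 4.2, below the 4.5 restaurant with 800 reviews (about 4.48). With zero reviews the score would be exactly `C`.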
In [29]:
# calculate C first
C = data['Rating'].mean()
print('The mean review across all restaurants is ', str(C)[0:5])
The mean review across all restaurants is  3.987
In [30]:
# calculate m
# minimum number of reviews a restaurant needs to be included in the chart (here the median)
m = data['Number of Reviews'].quantile(0.50)
print('The minimum number of reviews required to be listed in the chart is',m)
The minimum number of reviews required to be listed in the chart is 32.0
In [31]:
# get restaurants that have at least m reviews
SR_data = data.copy().loc[data['Number of Reviews'] >= m]
print(str(SR_data.shape[0]) + ' restaurants can be included in the chart')
54757 restaurants can be included in the chart
In [32]:
# create a function that calculates the weighted rating for each restaurant
def weighted_review(x, m=m, C=C):
    # v is the number of reviews of a particular restaurant
    v = x['Number of Reviews']
    # R is the average rating 
    R = x['Rating']
    # weighted rating
    WR = (v/(v+m) * R) + (m/(m+v) * C)
    # return weighted rating
    return WR
In [33]:
# store the weighted rating in a new dataframe column called 'score'
SR_data['score'] = SR_data.apply(weighted_review, axis=1)
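Applying a function row by row can be slow on a dataframe of this size. As a possible alternative, the same score can be computed on whole columns at once. The sketch below uses a small toy dataframe (`toy`, `m_toy` and `C_toy` are made-up stand-ins, not values from the dataset) and checks that both approaches agree.

```python
import pandas as pd

# Toy data standing in for SR_data (values are made up for illustration)
toy = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'Rating': [5.0, 4.5, 4.0],
    'Number of Reviews': [40.0, 400.0, 4000.0],
})
m_toy = 32.0   # assumed threshold
C_toy = 4.0    # assumed global mean rating

# Column-wise (vectorised) version of the weighted rating
v = toy['Number of Reviews']
R = toy['Rating']
toy['score'] = (v / (v + m_toy)) * R + (m_toy / (v + m_toy)) * C_toy

# Row-wise version, as in the cell above, for comparison
row_wise = toy.apply(
    lambda x: x['Number of Reviews'] / (x['Number of Reviews'] + m_toy) * x['Rating']
    + m_toy / (x['Number of Reviews'] + m_toy) * C_toy,
    axis=1,
)

print((toy['score'] - row_wise).abs().max())
```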
In [34]:
# filter restaurants by city and price range, then show the best 10 according to the score

# input city you want to select 
city = str(input('Insert City (lower case please): '))
# input price range 
price_range = str(input('Insert Price Range: "cheap", "medium", "expensive" or "all" '))

# if the price range is 'all'
if price_range == 'all':
    # only filter the city 
    city_data = SR_data.loc[SR_data['City'] == city,:]
else:
    # otherwise filter the city and price range
    city_data = SR_data.loc[(SR_data['City'] == city) & (SR_data['Price Range'] == price_range),:]

# sort restaurant by score 
city_data = city_data.sort_values('score', ascending=False)
# show the top 10 restaurants in that city and price range
city_data[['Name', 'Cuisine Style', 'Rating', 'Price Range']].head(10)
Insert City (lower case please): rome
Insert Price Range: "cheap", "medium", "expensive" or "all" cheap
Out[34]:
Name Cuisine Style Rating Price Range
109131 pane e salame ['Italian', 'Street Food', 'Vegetarian Friendly'] 5.0 cheap
109150 pizza zizza caffetteria birreria desserteria ['Italian', 'Pizza', 'Fast Food', 'Vegetarian ... 5.0 cheap
109132 two sizes ['Fast Food', 'Italian', 'Cafe', 'Vegetarian F... 5.0 cheap
109133 bread-in ['Italian', 'Fast Food', 'Mediterranean', 'Eur... 5.0 cheap
109143 l'uliveto shop ['Italian', 'Fast Food', 'Mediterranean', 'Eur... 5.0 cheap
109828 passioni di pasta all'uovo ['Italian', 'Mediterranean', 'European', 'Stre... 5.0 cheap
109562 'o famo strano ['Fast Food', 'Mediterranean', 'Vegetarian Fri... 5.0 cheap
109950 pizzeria i gemelli roma ['Italian', 'Pizza', 'Vegetarian Friendly'] 5.0 cheap
110676 pizza & friends ['Italian', 'Pizza', 'Mediterranean', 'Vegetar... 5.0 cheap
110069 sasa sandwich and salad ['Italian', 'Vegetarian Friendly', 'Vegan Opti... 5.0 cheap
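The filtering logic above could also be wrapped in a function, so that it can be reused without typing the inputs each time. This is only a sketch: `top_restaurants` is a hypothetical helper name, and the toy dataframe below stands in for `SR_data` with made-up values.

```python
import pandas as pd

def top_restaurants(df, city, price_range='all', n=10):
    """Return the n highest-scoring restaurants in a city (hypothetical helper)."""
    # normalise the city so that ' Rome ' and 'rome' both match
    mask = df['City'] == city.strip().lower()
    if price_range != 'all':
        mask = mask & (df['Price Range'] == price_range)
    cols = ['Name', 'Cuisine Style', 'Rating', 'Price Range']
    return df.loc[mask].sort_values('score', ascending=False)[cols].head(n)

# Toy data standing in for SR_data (made up for illustration)
toy = pd.DataFrame({
    'Name': ['x', 'y', 'z', 'w'],
    'City': ['rome', 'rome', 'milan', 'rome'],
    'Cuisine Style': ["['Italian']"] * 4,
    'Rating': [4.5, 5.0, 5.0, 4.0],
    'Price Range': ['cheap', 'cheap', 'cheap', 'expensive'],
    'score': [4.4, 4.6, 4.8, 4.0],
})

result = top_restaurants(toy, ' Rome ', 'cheap', n=2)
print(result)
```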

2. Content-Based Recommender

This is a system that recommends restaurants similar to a given one. More specifically, we will compute pairwise similarity scores for all restaurants based on their cuisine style and price range, and recommend restaurants based on that similarity score.

In [35]:
# make a description column by joining the cuisine style and the price range

# make cuisine style and price range columns strings
cols = ['Cuisine Style', 'Price Range']
for col in cols: 
    data[col] = data[col].astype(str)

new_col = []
# for each row, strip the square brackets from the cuisine style
for row in np.arange(data.shape[0]):
    c = data['Cuisine Style'][row].replace("[", "").replace(']', '')
    d = data['Price Range'][row]
    # attach price range to the string 
    e = c + ' ' + d 
    # append string to new list new_col
    new_col.append(e)
    
# add this column on dataset and name it description
data['description'] = new_col
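The loop above can also be written with pandas string methods, which avoids iterating over the rows in Python. A minimal sketch on a toy dataframe (made-up rows in the same format as the dataset):

```python
import pandas as pd

# Toy rows in the same format as the dataset (made up for illustration)
toy = pd.DataFrame({
    'Cuisine Style': ["['French', 'Dutch']", "['Italian', 'Pizza']"],
    'Price Range': ['medium', 'cheap'],
})

# strip the square brackets, then append the price range
cuisine = toy['Cuisine Style'].astype(str).str.replace(r'[\[\]]', '', regex=True)
toy['description'] = cuisine + ' ' + toy['Price Range'].astype(str)

print(toy['description'].tolist())
```

As in the loop, the quotes and commas are left in place. That should not matter here, because the vectorizer used later splits the text into words and ignores punctuation by default.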

The similarity measure between each pair of restaurants I will be using is cosine similarity.

Before declaring the function that calculates similarity, it is necessary to:

  • reduce the dataset to make the computation cheaper
  • create a matrix from the descriptions, in which each row represents a restaurant and each column a word appearing in the descriptions
  • calculate the similarity between each pair of restaurants
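To make the last two steps concrete, here is a small sketch on three made-up descriptions. It builds the TF-IDF matrix and compares two ways of getting the similarity. Because `TfidfVectorizer` normalises each row to unit length by default (`norm='l2'`), the plain dot product from `linear_kernel` should give the same numbers as `cosine_similarity`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Made-up descriptions in the same style as the 'description' column
docs = [
    "'Italian', 'Pizza' cheap",
    "'Italian', 'Seafood' expensive",
    "'French', 'European' expensive",
]

tfidf = TfidfVectorizer(stop_words='english')
matrix = tfidf.fit_transform(docs)   # rows: restaurants, columns: words

sim_dot = linear_kernel(matrix, matrix)       # dot product of the rows
sim_cos = cosine_similarity(matrix, matrix)   # explicit cosine similarity

print(np.allclose(sim_dot, sim_cos))
print(sim_dot.round(2))
```

The first and third descriptions share no words, so their similarity is zero, while the first and second share 'Italian' and get a positive score.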

Then I declare a function that returns recommendations. It works as follows:

  • the arguments are the name of the restaurant, the city in which you want the recommendations and the cosine similarity matrix
  • the function gets the index of the restaurant passed in
  • gets the similarity scores of the other restaurants
  • ranks them from most to least similar
  • returns the 10 most similar restaurants in the city in question, or across all cities
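The steps above can be sketched on a tiny example before running them on the real data. Everything below is made up for illustration: the four restaurants, the similarity matrix `sim` and the helper name `most_similar`. The sketch also leaves the queried restaurant out of its own results, which is a choice made here rather than something the steps require.

```python
import numpy as np
import pandas as pd

# Four made-up restaurants
toy = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'City': ['london', 'milan', 'milan', 'rome'],
})
# Made-up symmetric similarity matrix (row i, column j: similarity of i and j)
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

def most_similar(name, city='all', n=10):
    # position of the queried restaurant
    idx = toy.index[toy['Name'] == name][0]
    # positions of all restaurants, most similar first
    order = np.argsort(-sim[idx], kind='stable')
    # drop the queried restaurant itself
    order = order[order != idx]
    ranked = toy.iloc[order]
    # optionally keep only one city
    if city != 'all':
        ranked = ranked[ranked['City'] == city]
    return ranked.head(n)

print(most_similar('a', city='milan')['Name'].tolist())
```

Restaurant 'a' is most similar to 'c', then 'd', then 'b', so restricting to Milan returns 'c' followed by 'b'.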
In [36]:
# reduce dataset: keep only restaurants at or above the 95th percentile of review counts
m = data['Number of Reviews'].quantile(0.95)
CR_data = data.copy().loc[data['Number of Reviews'] >= m]
CR_data = CR_data.reset_index(drop=True)

# create matrix with descriptions
tfidf = TfidfVectorizer(stop_words='english')
CR_data['description']= CR_data['description'].fillna('')
tfidf_matrix = tfidf.fit_transform(CR_data['description'])

# calculate similarity (TF-IDF rows are L2-normalised by default, so the dot product equals cosine similarity)
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)


def get_recommendations(name, city = 'all', cosine_sim=cosine_sim):
    # map each restaurant name to its row index, keeping the first row for duplicated names
    indices = pd.Series(CR_data.index, index=CR_data['Name'])
    indices = indices[~indices.index.duplicated(keep='first')]
    
    # Get the index of the restaurant that matches the name
    idx = indices[name]

    # Get the pairwise similarity scores of all restaurants with that restaurant
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the restaurants based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    # Get the restaurant indices, leaving out the queried restaurant itself
    res_indices = [i[0] for i in sim_scores if i[0] != idx]
    
    # get name, city and description of the restaurants
    sim_res = CR_data[['Name','City','description']].iloc[res_indices]
    
    # if a specific city is requested,
    if city != 'all':
        # only show the ones from that city 
        r = sim_res.loc[sim_res['City'] == city, :].head(10)
    else:
        # else show the ones from all cities
        r = sim_res.head(10)

    # Return the top 10 most similar restaurants
    return r
In [37]:
# get recommendations for restaurants similar to a given one
name = str(input('Insert the name of the restaurant (lower case): '))#'Restaurant Gordon Ramsay'
city = str(input('Insert the city (lower case): '))

print('If you liked ', name, 'then, in ', city, ' you could try:' )
get_recommendations(name, city=city)
Insert the name of the restaurant (lower case): restaurant gordon ramsay
Insert the city (lower case): milan
If you liked  restaurant gordon ramsay then, in  milan  you could try:
Out[37]:
Name City description
3221 cracco milan 'Italian', 'European', 'Contemporary', 'Vegeta...
3354 trussardi alla scala milan 'Italian', 'Mediterranean', 'European', 'Conte...
3270 al pont de ferr milan 'Italian', 'Mediterranean', 'Contemporary', 'V...
3060 da vic - ristorante guerrini milan 'Italian', 'Seafood', 'Mediterranean', 'Contem...
3161 asola | cucina sartoriale milan 'Italian', 'Seafood', 'Mediterranean', 'Europe...
3077 ristorante berton milan 'Italian', 'Seafood', 'Mediterranean', 'Europe...
3075 il luogo di aimo e nadia milan 'Italian', 'European', 'Vegetarian Friendly', ...
3131 tartufotto milan 'Italian', 'European', 'Vegetarian Friendly', ...
3271 bice milan 'Italian', 'Mediterranean', 'European', 'Veget...
3272 carlo e camilla in segheria milan 'Italian', 'Mediterranean', 'European', 'Veget...