Recommendation System

Modeled on TripAdvisor data

Recommendation systems aim to find items that users might be interested in, given a set of characteristics. They are commonly used by online stores and websites such as Netflix (Mitra et al. 2016). The process of creating personalised recommendations for users is described in detail by Adomavicius & Tuzhilin (2005).

Leskovec et al. (2014) state that there are two main architectures for recommendation systems. The first are content-based systems, which focus on the characteristics of the items: users are recommended items that are similar to the ones they have already consumed. The second are collaborative filtering systems, which focus on the relationship between customers and items. Ansari et al. (1999) describe collaborative filtering as an algorithm that mimics word-of-mouth communication, because it suggests to customers items that people similar to them have purchased.
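To make the word-of-mouth idea concrete, here is a minimal user-based collaborative filtering sketch. It is not part of this notebook's pipeline, which only builds a simple and a content-based recommender. The ratings matrix is made up, a 0 is assumed to mean "not rated", and `recommend_for_user` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical user x restaurant rating matrix (rows: users, columns: restaurants).
# A 0 means the user has not rated that restaurant.
ratings = np.array([
    [5, 4, 0, 0],   # user 0
    [5, 5, 4, 0],   # user 1
    [0, 1, 0, 5],   # user 2
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two rating vectors (assumes neither is all zeros)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def recommend_for_user(ratings, user):
    """Rank the user's unrated items by the ratings of similar users."""
    target = ratings[user]
    # similarity of every other user to the target user (0 for the user themself)
    sims = np.array([cosine(target, other) if i != user else 0.0
                     for i, other in enumerate(ratings)])
    # each item's score is a similarity-weighted sum of the other users' ratings
    scores = sims @ ratings
    # only consider items the user has not rated yet, best first
    unrated = np.where(target == 0)[0]
    return unrated[np.argsort(scores[unrated])[::-1]]

print(recommend_for_user(ratings, 0))  # [2 3]
```

User 0 is most similar to user 1, who rated restaurant 2 highly, so restaurant 2 is suggested first: the "people similar to you liked this" logic described by Ansari et al.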

This notebook shows how to create a simple recommendation system for restaurants using TripAdvisor data. I first created a simple system that ranks all restaurants and returns the top-rated ones. Second, I created a system that gives recommendations based on a particular restaurant: if you feed the algorithm the name of a restaurant, it returns a list of similar restaurants.

# import packages
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn import model_selection
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from wordcloud import WordCloud
from math import log, sqrt
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import warnings
import nltk

nltk.download('punkt')
warnings.filterwarnings('ignore')

%matplotlib inline

[nltk_data] Downloading package punkt to /Users/gioia/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

# read data 
TAdata = pd.read_csv('./TA_restaurants_curated.csv')

# work on a copy so that the raw TAdata stays untouched
data = TAdata.copy()

# look at data
data.head()

data.shape

(125527, 11)

Pre-Processing

# keep every column except the index column ('Unnamed: 0') and 'URL_TA'
data = data.iloc[:, [1, 2, 3, 4, 5, 6, 7, 8, 10]].copy()

# replace price range values with 'cheap', 'medium' and 'expensive'
data['Price Range'] = data['Price Range'].replace(['$', '$$ - $$$', '$$$$'], ['cheap', 'medium', 'expensive'])

# make city and name of restaurant lowercase 
data['City'] = data['City'].str.lower()
data['Name'] = data['Name'].str.lower()

1. Simple Recommender

Simple recommenders are basic systems that recommend the top items based on a certain metric or score.

The following are the steps involved:

Decide on the metric or score to rate restaurants on.
Calculate the score for every restaurant.
Sort the restaurants based on the score and output the top results.
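The three steps map directly onto pandas operations. As a sketch on made-up data, using the raw average rating as the metric shows why the notebook goes on to weight ratings by the number of reviews: a restaurant with a single 5-star review tops the chart.

```python
import pandas as pd

# Hypothetical toy data: one restaurant has a single perfect review
toy = pd.DataFrame({
    'Name': ['one-hit wonder', 'local favourite', 'tourist trap'],
    'Rating': [5.0, 4.5, 3.5],
    'Number of Reviews': [1, 850, 2000],
})

# Steps 1 and 2: use the raw average rating as the score
toy['score'] = toy['Rating']

# Step 3: sort by score and output the top results
top = toy.sort_values('score', ascending=False).head(2)
print(top['Name'].tolist())  # ['one-hit wonder', 'local favourite']
```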

Step 1: Weight the ratings by the number of reviews

$$ WR = \frac{v}{v+m}R + \frac{m}{v+m}C $$ where:

$v$ = number of reviews per restaurant
$m$ = minimum number of reviews required to be listed in the chart
$R$ = average rating of the restaurant
$C$ = mean rating across all restaurants
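As a worked example of the formula, the sketch below plugs in the values computed further down in this notebook ($C \approx 3.987$, $m = 32$). A perfect 5.0 from only 10 reviews is pulled strongly towards the overall mean, while the same 5.0 backed by 1,000 reviews barely moves. The review counts are made up for illustration.

```python
def weighted_rating(v, R, m, C):
    """WR = v/(v+m) * R + m/(v+m) * C."""
    return v / (v + m) * R + m / (v + m) * C

# values from the outputs below: C is the printed (truncated) mean rating,
# m is the median number of reviews
C = 3.987
m = 32

few = weighted_rating(v=10, R=5.0, m=m, C=C)     # perfect score, few reviews
many = weighted_rating(v=1000, R=5.0, m=m, C=C)  # perfect score, many reviews
print(round(few, 2), round(many, 2))  # 4.23 4.97
```

In other words, $v/(v+m)$ acts as a confidence weight on the restaurant's own rating: about 0.24 for $v = 10$ and about 0.97 for $v = 1000$.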

# calculate C first
C = data['Rating'].mean()
print('The mean review across all restaurants is ', str(C)[0:5])

The mean review across all restaurants is  3.987

# calculate m: the minimum number of reviews a restaurant needs to be included in the chart
# (here, the median number of reviews)
m = data['Number of Reviews'].quantile(0.50)
print('The minimum number of reviews required to be listed in the chart is',m)

The minimum number of reviews required to be listed in the chart is 32.0

# get restaurants that have at least m reviews
SR_data = data.loc[data['Number of Reviews'] >= m].copy()
print(str(SR_data.shape[0]) + ' restaurants can be included in the chart')

54757 restaurants can be included in the chart

# create a function that calculates the weighted rating for each restaurant
# (m and C are bound as default arguments when the function is defined)
def weighted_review(x, m=m, C=C):
    # v is the number of reviews of a particular restaurant
    v = x['Number of Reviews']
    # R is the average rating 
    R = x['Rating']
    # weighted rating
    WR = (v/(v+m) * R) + (m/(m+v) * C)
    # return weighted rating
    return WR

# store the weighted rating in a new column called 'score'
SR_data['score'] = SR_data.apply(weighted_review, axis=1)
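`SR_data.apply(..., axis=1)` calls the Python function once per row. Because the formula only needs column arithmetic, a vectorised version should give the same scores and is typically much faster on tens of thousands of rows. A sketch, where `add_weighted_score` is a hypothetical helper and the toy rows are made up:

```python
import pandas as pd

def add_weighted_score(df, m, C):
    """Vectorised version of applying weighted_review row by row (hypothetical helper)."""
    v = df['Number of Reviews']
    R = df['Rating']
    out = df.copy()  # leave the input DataFrame unchanged
    out['score'] = v / (v + m) * R + m / (v + m) * C
    return out

# toy data; m and C as printed earlier in the notebook
toy = pd.DataFrame({'Rating': [5.0, 4.0], 'Number of Reviews': [40, 400]})
scored = add_weighted_score(toy, m=32, C=3.987)
print(scored['score'].round(3).tolist())
```

On the real data, `SR_data = add_weighted_score(SR_data, m, C)` could stand in for the `apply` call.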

# filter restaurants by city and price range, then show the top 10 by score

# input the city you want to select
city = input('Insert City (lower case please): ').strip().lower()
# input the price range
price_range = input('Insert Price Range: "cheap", "medium", "expensive" or "all" ').strip().lower()

# if the price range is 'all'
if price_range == 'all':
    # only filter the city 
    city_data = SR_data.loc[SR_data['City'] == city,:]
else:
    # otherwise filter the city and price range
    city_data = SR_data.loc[(SR_data['City'] == city) & (SR_data['Price Range'] == price_range),:]

# sort restaurants by score
city_data = city_data.sort_values('score', ascending=False)
# show the top 10 rated restaurants in that city and price range
city_data[['Name', 'Cuisine Style', 'Rating', 'Price Range']].head(10)

Insert City (lower case please): rome
Insert Price Range: "cheap", "medium", "expensive" or "all" cheap
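The cell above relies on `input()`, which makes it hard to rerun or test. One alternative is to wrap the same filtering and sorting in a function. `top_restaurants` is a hypothetical name, and the sketch assumes a DataFrame with the same columns as `SR_data`, including `score`.

```python
import pandas as pd

def top_restaurants(df, city, price_range='all', n=10):
    """Hypothetical non-interactive version of the city/price-range cell above."""
    mask = df['City'] == city.strip().lower()
    if price_range != 'all':
        mask = mask & (df['Price Range'] == price_range)
    cols = ['Name', 'Cuisine Style', 'Rating', 'Price Range']
    return df.loc[mask].sort_values('score', ascending=False)[cols].head(n)

# made-up rows with the columns of SR_data that the function uses
toy = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'City': ['rome', 'rome', 'milan', 'rome'],
    'Cuisine Style': ["['Pizza']", "['Pasta']", "['Sushi']", "['Steak']"],
    'Rating': [4.5, 5.0, 5.0, 4.0],
    'Price Range': ['cheap', 'cheap', 'cheap', 'expensive'],
    'score': [4.4, 4.6, 4.9, 4.0],
})
print(top_restaurants(toy, 'Rome', 'cheap'))
```

With the notebook's data, `top_restaurants(SR_data, 'rome', 'cheap')` should produce the same kind of table as the interactive cell.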

2. Content-Based Recommender

This system recommends restaurants that are similar to a given restaurant. More specifically, we compute pairwise similarity scores for all restaurants based on their cuisine style and price range, and recommend the restaurants with the highest similarity scores.

# make a description column by concatenating the cuisine style and the price range

# make cuisine style and price range columns strings (missing values become empty strings, not 'nan')
cols = ['Cuisine Style', 'Price Range']
for col in cols:
    data[col] = data[col].fillna('').astype(str)

new_col = []
# for each row, strip the square brackets from the cuisine style list
for row in np.arange(data.shape[0]):
    c = data['Cuisine Style'][row].replace("[", "").replace(']', '')
    d = data['Price Range'][row]
    # attach price range to the string 
    e = c + ' ' + d 
    # append string to new list new_col
    new_col.append(e)
    
# add this column on dataset and name it description
data['description'] = new_col
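The row-by-row loop can also be written with pandas string methods. This sketch additionally maps missing values to empty strings (rather than the literal `'nan'` that `astype(str)` produces) and trims surrounding spaces; `build_description` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

def build_description(df):
    """Strip the list brackets from 'Cuisine Style' and append 'Price Range'."""
    cuisine = df['Cuisine Style'].fillna('').astype(str)
    price = df['Price Range'].fillna('').astype(str)
    # regex=False so '[' and ']' are treated as plain characters
    cuisine = cuisine.str.replace('[', '', regex=False).str.replace(']', '', regex=False)
    return (cuisine + ' ' + price).str.strip()

toy = pd.DataFrame({
    'Cuisine Style': ["['Italian', 'Pizza']", None],
    'Price Range': ['cheap', None],
})
print(build_description(toy).tolist())
```

On the notebook's data, `data['description'] = build_description(data)` would replace the loop.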

I use cosine similarity to measure how similar each pair of restaurants is.
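For two vectors $a$ and $b$, cosine similarity is $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$: it depends only on the direction of the vectors, not their length. Since TF-IDF weights are non-negative, scores range from 0 (no shared terms) to 1. A from-scratch sketch on hypothetical term-count vectors:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical term-count vectors over the vocabulary [italian, pizza, seafood, cheap]
pizzeria = [1, 1, 0, 1]
trattoria = [1, 0, 1, 1]
fish_bar = [0, 0, 1, 0]

print(round(cosine(pizzeria, trattoria), 3))  # 0.667: two of three terms shared
print(cosine(pizzeria, fish_bar))             # 0.0: no terms shared
```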

Before declaring the function that returns recommendations, it is necessary to:

reduce the dataset, since the similarity matrix has one entry per pair of restaurants and therefore grows quadratically with the number of restaurants
create a matrix from the descriptions, in which each column is a word from the restaurant descriptions and each row is a restaurant
calculate the similarity between each pair of restaurants

Then I declare a function that returns recommendations. It works as follows:

the function takes as arguments the name of the restaurant, the city in which you want recommendations (or 'all') and the cosine similarity matrix
it looks up the index of the given restaurant
it gets the similarity scores of all other restaurants with that restaurant
it ranks them from most to least similar
it returns the 10 most similar restaurants in the chosen city, or in all cities

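# reduce the dataset to roughly the top 5% of restaurants by number of reviews
# (a new name, so the m used by the simple recommender is not overwritten)
min_reviews = data['Number of Reviews'].quantile(0.95)
CR_data = data.loc[data['Number of Reviews'] >= min_reviews].copy()
CR_data = CR_data.reset_index(drop=True)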

# create matrix with descriptions
tfidf = TfidfVectorizer(stop_words='english')
CR_data['description']= CR_data['description'].fillna('')
tfidf_matrix = tfidf.fit_transform(CR_data['description'])

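# calculate similarity: TfidfVectorizer L2-normalises each row by default,
# so the dot product computed by linear_kernel equals the cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)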


def get_recommendations(name, city='all', cosine_sim=cosine_sim):
    # map each restaurant name to its row index; keep the first row if a name repeats (e.g. chains)
    indices = pd.Series(CR_data.index, index=CR_data['Name'])
    indices = indices[~indices.index.duplicated(keep='first')]

    # get the index of the restaurant that matches the name
    if name not in indices:
        raise KeyError(name + ' is not in the reduced dataset (CR_data)')
    idx = indices[name]

    # get the pairwise similarity scores of all restaurants with that restaurant
    sim_scores = list(enumerate(cosine_sim[idx]))

    # sort the restaurants by similarity score, most similar first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # get the restaurant indices, skipping the restaurant itself
    res_indices = [i[0] for i in sim_scores if i[0] != idx]

    # get name, city and description of the similar restaurants
    sim_res = CR_data[['Name', 'City', 'description']].iloc[res_indices]

    # if a city is given, only show restaurants from that city
    if city != 'all':
        r = sim_res.loc[sim_res['City'] == city, :].head(10)
    else:
        # otherwise show restaurants from all cities
        r = sim_res.head(10)

    # return the top 10 most similar restaurants
    return r

# get recommendations for restaurants similar to a given one
name = input('Insert the name of the restaurant (lower case): ').strip().lower()  # e.g. 'restaurant gordon ramsay'
city = input('Insert the city (lower case): ').strip().lower()

print('If you liked ', name, 'then, in ', city, ' you could try:' )
get_recommendations(name, city=city)

Insert the name of the restaurant (lower case): restaurant gordon ramsay
Insert the city (lower case): milan
If you liked  restaurant gordon ramsay then, in  milan  you could try:

Output of data.head():
	Unnamed: 0	Name	City	Cuisine Style	Ranking	Rating	Price Range	Number of Reviews	Reviews	URL_TA	ID_TA
0	0	Martine of Martine's Table	Amsterdam	['French', 'Dutch', 'European']	1.0	5.0	$$ - $$$	136.0	[['Just like home', 'A Warm Welcome to Wintry ...	/Restaurant_Review-g188590-d11752080-Reviews-M...	d11752080
1	1	De Silveren Spiegel	Amsterdam	['Dutch', 'European', 'Vegetarian Friendly', '...	2.0	4.5	$$$$	812.0	[['Great food and staff', 'just perfect'], ['0...	/Restaurant_Review-g188590-d693419-Reviews-De_...	d693419
2	2	La Rive	Amsterdam	['Mediterranean', 'French', 'International', '...	3.0	4.5	$$$$	567.0	[['Satisfaction', 'Delicious old school restau...	/Restaurant_Review-g188590-d696959-Reviews-La_...	d696959
3	3	Vinkeles	Amsterdam	['French', 'European', 'International', 'Conte...	4.0	5.0	$$$$	564.0	[['True five star dinner', 'A superb evening o...	/Restaurant_Review-g188590-d1239229-Reviews-Vi...	d1239229
4	4	Librije's Zusje Amsterdam	Amsterdam	['Dutch', 'European', 'International', 'Vegeta...	5.0	4.5	$$$$	316.0	[['Best meal.... EVER', 'super food experience...	/Restaurant_Review-g188590-d6864170-Reviews-Li...	d6864170

Output of the simple recommender (city: rome, price range: cheap):
	Name	Cuisine Style	Rating	Price Range
109131	pane e salame	['Italian', 'Street Food', 'Vegetarian Friendly']	5.0	cheap
109150	pizza zizza caffetteria birreria desserteria	['Italian', 'Pizza', 'Fast Food', 'Vegetarian ...	5.0	cheap
109132	two sizes	['Fast Food', 'Italian', 'Cafe', 'Vegetarian F...	5.0	cheap
109133	bread-in	['Italian', 'Fast Food', 'Mediterranean', 'Eur...	5.0	cheap
109143	l'uliveto shop	['Italian', 'Fast Food', 'Mediterranean', 'Eur...	5.0	cheap
109828	passioni di pasta all'uovo	['Italian', 'Mediterranean', 'European', 'Stre...	5.0	cheap
109562	'o famo strano	['Fast Food', 'Mediterranean', 'Vegetarian Fri...	5.0	cheap
109950	pizzeria i gemelli roma	['Italian', 'Pizza', 'Vegetarian Friendly']	5.0	cheap
110676	pizza & friends	['Italian', 'Pizza', 'Mediterranean', 'Vegetar...	5.0	cheap
110069	sasa sandwich and salad	['Italian', 'Vegetarian Friendly', 'Vegan Opti...	5.0	cheap

Output of get_recommendations('restaurant gordon ramsay', city='milan'):
	Name	City	description
3221	cracco	milan	'Italian', 'European', 'Contemporary', 'Vegeta...
3354	trussardi alla scala	milan	'Italian', 'Mediterranean', 'European', 'Conte...
3270	al pont de ferr	milan	'Italian', 'Mediterranean', 'Contemporary', 'V...
3060	da vic - ristorante guerrini	milan	'Italian', 'Seafood', 'Mediterranean', 'Contem...
3161	asola | cucina sartoriale	milan	'Italian', 'Seafood', 'Mediterranean', 'Europe...
3077	ristorante berton	milan	'Italian', 'Seafood', 'Mediterranean', 'Europe...
3075	il luogo di aimo e nadia	milan	'Italian', 'European', 'Vegetarian Friendly', ...
3131	tartufotto	milan	'Italian', 'European', 'Vegetarian Friendly', ...
3271	bice	milan	'Italian', 'Mediterranean', 'European', 'Veget...
3272	carlo e camilla in segheria	milan	'Italian', 'Mediterranean', 'European', 'Veget...
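To see the whole content-based pipeline in one place, here is a self-contained sketch on made-up restaurants. `recommend` is a hypothetical helper, not part of the notebook: it follows the same steps as `get_recommendations` (TF-IDF, linear kernel, rank, optional city filter) but uses `np.argsort` instead of sorting a list of tuples, and it drops every row whose name matches the query, since a restaurant always has similarity 1 with itself.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical toy data with the columns get_recommendations uses
toy = pd.DataFrame({
    'Name': ['trattoria uno', 'pizzeria due', 'sushi tre', 'osteria quattro'],
    'City': ['milan', 'milan', 'milan', 'rome'],
    'description': [
        "'Italian', 'Pizza' cheap",
        "'Italian', 'Pizza' cheap",
        "'Japanese', 'Sushi' expensive",
        "'Italian', 'Pizza' cheap",
    ],
})

def recommend(df, name, city='all', k=10):
    """Return up to k restaurants most similar to `name` (sketch, not the notebook's function)."""
    df = df.reset_index(drop=True)                 # row labels now equal row positions
    matrix = TfidfVectorizer(stop_words='english').fit_transform(df['description'])
    sims = linear_kernel(matrix, matrix)           # cosine similarity, rows are L2-normalised
    matches = np.flatnonzero(df['Name'].to_numpy() == name)
    if matches.size == 0:
        raise KeyError(name)
    idx = matches[0]                               # first row if the name repeats
    order = np.argsort(sims[idx])[::-1]            # most similar first
    ranked = df.iloc[order]
    ranked = ranked[ranked['Name'] != name]        # drop the query restaurant itself
    if city != 'all':
        ranked = ranked[ranked['City'] == city]
    return ranked[['Name', 'City', 'description']].head(k)

print(recommend(toy, 'trattoria uno', city='milan'))
```

Restaurants with identical descriptions tie at similarity 1, so their relative order in the output is not guaranteed; a secondary sort key such as the weighted score from the simple recommender could break such ties.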