A real-time product recommender in Python (2023)


How can we optimize our online shopping experience?

Most of us who shop online have probably shopped on Amazon. Over the years it has become one of the largest online marketplaces, so when we search for something, we can trust Amazon's search algorithm to return good options for our search terms. Let's look at how we can build a simple real-time product recommender on top of Amazon's results to optimize our shopping experience.

The motivation behind the idea

The main idea is to see how robust and easy to use we can make our shopping process. Say we want to buy a pair of headphones. When we search Amazon, we get a list of roughly 25-30 products. Every buyer has preferences, and the most important ones are usually brand and price. Buyers also look at other factors such as product popularity, rating, reviews, and how well the price matches their budget. As users, we also sometimes wish we could quickly see which specs each product offers and make a more informed decision.

Needs differ from one buyer to another: some buyers focus on ratings, some on reviews, and some on price. Wouldn't it be great if we could let users choose what to focus on? Let's see how we can achieve these goals.

The idea

The idea is to give each product a score for each of the fields a user looks at while shopping. For example, we give a product one score based on its popularity, another based on its reviews, and so on. We then calculate a weighted total score based on the user's preferences.

Let's look at a concrete example. Say we want to buy a pair of headphones. We search and get a list of 25 items. We assign each product a score x1 based on its rating, x2 based on its popularity, x3 based on its reviews, and x4 based on the price constraint. Then we ask the user whether they have any preferences, for example whether they want to focus on one aspect more than the others. If the user wants to focus more on reviews, we calculate the total score as:

$$y = x_1 + x_2 + a\,x_3 + x_4$$

With a weight a > 1, this gives x3 more weight. If the user has no such preference, we simply compute:

$$y = x_1 + x_2 + x_3 + x_4$$

Here each factor is scaled to a maximum of 1, so that all factors carry balanced weight by default. We can then sort the products by their total score to get our results.
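A minimal sketch of this scoring rule in Python, assuming the four factor scores are already scaled to at most 1. The helper name `weighted_score`, the toy scores, and the boost `a = 5` (the same weight the weighted-score step later in this article uses) are illustrative assumptions.

```python
def weighted_score(x, focus=None, a=5.0):
    """Return y = x1 + x2 + x3 + x4, with the factor at index `focus` multiplied by a.

    x     -- list of normalized factor scores [x1, x2, x3, x4]
    focus -- 0-based index of the factor the user cares most about,
             or None for the unweighted sum
    a     -- boost applied to the focused factor (assumed value)
    """
    return sum(a * xi if i == focus else xi for i, xi in enumerate(x))


# Hypothetical scores per product: [rating, popularity, reviews, price]
products = {
    "A": [1.0, 1.0, 0.1, 1.0],
    "B": [0.6, 0.5, 0.8, 1.0],
}

# The user wants to focus on reviews (x3, index 2)
ranked = sorted(products, key=lambda p: weighted_score(products[p], focus=2), reverse=True)
print(ranked)  # ['B', 'A']
```

Without a focus, A ranks first (3.1 vs 2.9); boosting the review factor moves B ahead (6.1 vs 3.5).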

Alongside this, we will build a specification table of all products, so that users with specific requirements can get a feel for which product suits them best.

Let's implement it

If we search the Amazon website, our options will appear as shown below.

[Figure: Amazon search results]

We will scrape the options returned by Amazon's search algorithm, using Selenium WebDriver.

[Figure: the Amazon search bar and the resulting URL]

If we look closely at the URL after a search, we can construct the search URL directly, which takes us straight to the Amazon search results page.

import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import requests

def search_am(phrase):
    # Build the search URL directly, e.g. ...?k=wireless+headphones&ref=nb_sb_noss
    link = "https://www.amazon.in/s?k="
    l_end = "&ref=nb_sb_noss"
    phrase_w = phrase.replace(' ', '+')
    link_full = link + phrase_w + l_end
    # print(link_full)

    driver = webdriver.Chrome()

    wait = WebDriverWait(driver, 5)
    driver.get(link_full)

    # Keep only the anchors whose class marks them as product-title links
    f_names = []
    names = driver.find_elements(By.TAG_NAME, "a")
    for name in names:
        className = name.get_attribute('class')
        if className == 'a-link-normal a-text-normal':
            f_names.append(name)

    links = []
    for i in f_names:
        links.append(i.get_attribute('href'))

    driver.quit()
    return links

The function above collects the links to all products listed on the search results page and returns them.
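To see the two steps of `search_am` in isolation, here is a standard-library-only sketch: building the search URL, then keeping only the anchors whose class matches the product-title class the article filters on. It parses a hypothetical HTML snippet, not a live Amazon page (the article uses Selenium because the real page is rendered in a browser), and the names `build_search_url`, `TitleLinkCollector` and `extract_title_links` are illustrative. `quote_plus` is used instead of `replace(' ', '+')` so that characters such as `&` in the query are escaped as well.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus

BASE = "https://www.amazon.in/s?k="
SUFFIX = "&ref=nb_sb_noss"
TITLE_LINK_CLASS = "a-link-normal a-text-normal"  # class string used in search_am


def build_search_url(phrase):
    # quote_plus turns spaces into '+' and also escapes characters such as '&'
    return BASE + quote_plus(phrase) + SUFFIX


class TitleLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class attribute matches exactly."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("class") == self.wanted_class and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_title_links(html):
    parser = TitleLinkCollector(TITLE_LINK_CLASS)
    parser.feed(html)
    return parser.links


# Hypothetical markup, shaped like a search results page
sample = """
<a class="a-link-normal a-text-normal" href="/dp/PRODUCT1">Headphones 1</a>
<a class="a-link-normal" href="/gp/help">Help</a>
<a class="a-link-normal a-text-normal" href="/dp/PRODUCT2">Headphones 2</a>
"""
print(build_search_url("wireless headphones"))
print(extract_title_links(sample))  # ['/dp/PRODUCT1', '/dp/PRODUCT2']
```

Unlike Selenium's `get_attribute('href')`, which returns an absolute URL, this returns the raw attribute value, which may be relative.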

Each link takes us to a specific product page.

[Figure: an Amazon product page]

Next, let's focus on some parts of the product page.

  1. The rating section:
[Figure: the rating section of a product page]

This part shows the product's average star rating.

2. The popularity section: Here I used the number of reviews as a measure of popularity.

[Figure: the number of ratings of a product]

3. The price section: This section shows the price of the product.

[Figure: the price section of a product page]

4. The specifications section: This section lists all the specifications and details of the product.

[Figure: the specifications section of a product page]

5. The reviews section: This section lists the customer reviews of the product.

[Figure: the reviews section of a product page]

In the reviews section, each review has a title in bold that gives a high-level summary of the review. We'll take these titles, judge their sentiment, and assign a review score.

First, let's extract the necessary details from the product pages.

def get_element_dets(link):
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 2)
    driver.get(link)

    # Product title
    title_o = driver.find_elements(By.ID, "productTitle")
    title = title_o[0].text

    # Number of ratings, used as the popularity measure
    number_o = driver.find_elements(By.ID, "acrCustomerReviewText")
    try:
        popularity = number_o[0].text
    except IndexError:
        popularity = '0'

    # Average star rating; keep the first token of the text (the number)
    rate = driver.find_elements(By.CSS_SELECTOR, "#reviewsMedley > div > div.a-fixed-left-grid-col.a-col-left > div.a-section.a-spacing-none.a-spacing-top-mini.cr-widget-ACR > div.a-fixed-left-grid.AverageCustomerReviews.a-spacing-small > div > div.a-fixed-left-grid-col.aok-align-center.a-col-right > div > span > span")
    try:
        rate_o = rate[0].text.split(' ')[0]
    except IndexError:
        rate_o = '0'

    feat_f = []
    label = []
    value = []
    # features = driver.find_elements(By.CSS_SELECTOR, "#feature-bullets > ul > li > span")
    # for f in features:
    #     feat_f.append(f.text)
    price = 0

    # Specification table: <th> cells hold the labels, <td> cells the values
    try:
        tag_o = driver.find_elements(By.TAG_NAME, 'th')
        for name in tag_o:
            className = name.get_attribute('class')
            if className == 'a-color-secondary a-size-base prodDetSectionEntry':
                label.append(name.text)

        value_o = driver.find_elements(By.TAG_NAME, 'td')
        for name in value_o:
            className = name.get_attribute('class')
            if className == 'a-size-base':
                value.append(name.text)

        for i in range(len(value)):
            feat_f.append(str(label[i]) + ':' + str(value[i]))
    except Exception:
        feat_f = [':']

    # Price (stays 0 if the price block is not found)
    try:
        price_o = driver.find_elements(By.ID, "priceblock_ourprice")
        for name in price_o:
            className = name.get_attribute('class')
            if className == 'a-size-medium a-color-price priceBlockBuyingPriceString':
                price = name.text
                break
    except Exception:
        price = 0
    # price = price_o.text

    # Review titles, used later for sentiment scoring
    feedbacks = driver.find_elements(By.TAG_NAME, "a")

    feedback_f = []
    for feed in feedbacks:
        className = feed.get_attribute('class')
        if className == 'a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold':
            feedback_f.append(feed.text)

    driver.quit()
    return feedback_f, title, rate_o, popularity, feat_f, price

The snippet above scrapes the required details from a product page and returns the review titles, product title, rating, popularity, specifications, and price.

def caller(phrase):
    links = search_am(phrase)
    data = {}
    print(len(links))
    for link in links:
        data[link] = {}
        feedback_f, title, rate, popularity, feat_f, price = get_element_dets(link)
        data[link]['feedback'] = feedback_f
        data[link]['title'] = title
        data[link]['rate'] = rate
        data[link]['popularity'] = popularity
        data[link]['feats'] = feat_f
        if isinstance(price, int):
            # Price was not found on the page (default 0)
            data[link]['price'] = price
        else:
            # Assumes the text is '<currency symbol> <amount>'; keep the amount
            data[link]['price'] = price.split(' ')[1]
    # print(len(data))
    return data

The snippet above organizes all the products and their attributes in a dictionary.

[Figure: the scraped data as a nested dictionary]

Each product is keyed by its link, and all of its attributes are stored in a nested dictionary as key-value pairs.
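The scoring functions below read `products.json`, but the snippets shown do not write it. A minimal sketch, assuming the dictionary returned by `caller` is simply dumped to that file; the helpers `save_products` and `load_products` and the example record are hypothetical, with the same keys that `caller` fills in.

```python
import json
import os
import tempfile


def save_products(data, path="products.json"):
    # Link -> attribute dict, written as UTF-8 JSON so symbols such as the rupee sign stay readable
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(data, outfile, ensure_ascii=False, indent=2)


def load_products(path="products.json"):
    with open(path, "r", encoding="utf-8") as infile:
        return json.load(infile)


# Hypothetical record with the same keys that caller() fills in
example = {
    "https://www.amazon.in/dp/EXAMPLE": {
        "feedback": ["Great sound", "Stopped working"],
        "title": "Example headphones",
        "rate": "4.1",
        "popularity": "1,234 ratings",
        "feats": ["Brand:Example", "Colour:Black"],
        "price": "1,499.00",
    }
}

# Round-trip through a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "products.json")
    save_products(example, path)
    restored = load_products(path)
    print(restored == example)  # True
```

In the pipeline itself, this would be a call like `save_products(caller("headphones"))` before the scoring steps.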

Amazon pages rarely vary in their tags, but it's best to use try/except blocks to handle errors just in case.
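The try/except blocks catch missing elements, but the string parsing that follows (`split(' ')[0]`, `split(' ')[1]`) can still break when the text format shifts; for example, the thousands separator in a count like "1,234 ratings" makes `int()` fail. A more tolerant sketch, assuming each field contains its number somewhere in the text; `first_number` is a hypothetical helper, not part of the original code, and the input strings are made-up examples.

```python
import re

# A digit, then digits or thousands separators, then an optional decimal part
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def first_number(text, default=0.0):
    """Return the first number in `text` (commas removed), or `default`."""
    match = _NUMBER.search(str(text))
    if match is None:
        return default
    return float(match.group().replace(",", ""))


# Hypothetical strings shaped like the scraped fields
print(first_number("1,234 ratings"))          # 1234.0 (popularity)
print(first_number("4.3 out of 5 stars"))     # 4.3 (rating)
print(first_number("₹ 1,499.00"))             # 1499.0 (price)
print(first_number("Currently unavailable"))  # 0.0 (no number found)
```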

Now that we have scraped all the necessary data, we can start assigning scores.

System based on popularity and rating

import json

def Assign_Popularity_Rating():
    with open('products.json', 'r') as openfile:
        data = json.load(openfile)

    for k in data.keys():
        # "1,234 ratings" -> 1234
        p = int(data[k]['popularity'].split(' ')[0].replace(',', ''))
        r = float(data[k]['rate'])
        # Bucket the number of ratings into four popularity classes
        if p < 50:
            temp = 1
        elif p < 100:
            temp = 2
        elif p < 150:
            temp = 3
        else:
            temp = 4
        data[k]['Popularity_Score'] = temp
        data[k]['Rating_Score'] = r
    with open("products_mod.json", "w") as outfile:
        json.dump(data, outfile)

The code above assigns each product a score based on its popularity and rating. For popularity, I used a bucketing (class) approach that maps the number of ratings to one of four classes. The rating values come directly from the scraped data.
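The if/elif chain is a fixed-threshold bucketing. An equivalent sketch using the standard library's `bisect`, which keeps the thresholds in one list so they are easy to tune; `popularity_class` is a hypothetical name, and the thresholds are the ones from `Assign_Popularity_Rating`.

```python
from bisect import bisect_right

# Thresholds from Assign_Popularity_Rating: <50 -> 1, <100 -> 2, <150 -> 3, else 4
POPULARITY_THRESHOLDS = [50, 100, 150]


def popularity_class(num_ratings, thresholds=POPULARITY_THRESHOLDS):
    # bisect_right counts how many thresholds are <= num_ratings
    return bisect_right(thresholds, num_ratings) + 1


for p in [0, 49, 50, 120, 150, 5000]:
    print(p, popularity_class(p))
# 0 1, 49 1, 50 2, 120 3, 150 4, 5000 4
```

Dividing the class by the number of classes (4), as the weighted-score step does, yields a popularity score in (0, 1].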

Sentiment-based system

import json
from textblob import TextBlob

def Assign_Sentiment_Rating():
    with open('products_mod.json', 'r') as openfile:
        data = json.load(openfile)

    for k in data.keys():
        temp = data[k]['feedback']  # review titles for this product

        z = 0
        sm = 0
        for i in temp:
            # print(i)
            z += 1
            t = TextBlob(i).sentiment.polarity  # value in [-1, 1]
            # print(t)
            sm += t
        if z == 0:
            score = 0
        else:
            score = sm / z  # mean polarity over all review titles
        data[k]['Review_Score'] = score
    with open("products_mod_2.json", "w") as outfile:
        json.dump(data, outfile)

To detect sentiment polarity, I used the sentiment polarity from the TextBlob library. It assigns a value from -1 to +1 based on the sentiment detected in a review. A product has multiple reviews, so we get one value per review. We then add up the values for all reviews and divide by the number of reviews, which keeps the total score between -1 and +1. We repeat this process for each product to get its review score.
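The averaging step can be separated from the sentiment model by passing the scorer in as a function. A minimal sketch: `review_score` is a hypothetical helper, and `toy_polarity` is a deliberately crude keyword scorer used only so the example runs without TextBlob; it is not how TextBlob computes polarity.

```python
def review_score(reviews, polarity):
    """Mean polarity of `reviews`; 0 when there are no reviews.

    `polarity` maps one review string to a value in [-1, 1]; in the article
    this role is played by TextBlob(text).sentiment.polarity.
    """
    if not reviews:
        return 0.0
    return sum(polarity(r) for r in reviews) / len(reviews)


# Toy scorer for illustration only (NOT TextBlob): +1, -1 or 0 by keyword
def toy_polarity(text):
    text = text.lower()
    if "great" in text:
        return 1.0
    if "bad" in text:
        return -1.0
    return 0.0


titles = ["Great sound", "Bad battery", "Great value", "Okay"]
print(review_score(titles, toy_polarity))  # (1 - 1 + 1 + 0) / 4 = 0.25
print(review_score([], toy_polarity))      # 0.0
```

With TextBlob installed, the article's scorer plugs in as `review_score(titles, lambda t: TextBlob(t).sentiment.polarity)`. Because every term lies in [-1, 1], so does the mean.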

Price relevance system

import json

def check_price_relevence():
    with open('products_mod_2.json', 'r') as openfile:
        data = json.load(openfile)
    print("Enter the approximate price to tune the search")
    price = int(input())
    print("Specify a margin")
    margin = int(input())

    for k in data.keys():
        data_ref = str(data[k]['price']).replace(',', '')
        temp = float(data_ref)

        # 1 if the product's price lies within (price - margin, price + margin)
        if price - margin < temp < price + margin:
            score = 1
        else:
            score = 0

        data[k]['Price_relevence_Score'] = score
    with open("products_mod_3.json", "w") as outfile:
        json.dump(data, outfile)

This is our price relevance function. It asks for an approximate price and a margin, then checks whether each product's price falls within that range: a product inside the range gets a relevance score of 1, otherwise 0.
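The check inside the loop can be pulled out into a pure function that is easy to test without `input()`. A sketch with the hypothetical names `price_relevance` and `parse_price` and made-up prices; note that the comparison is strict, so a price exactly at target ± margin scores 0, and a product whose price was not found (stored as 0) also scores 0 unless the range includes 0.

```python
def price_relevance(product_price, target, margin):
    """1 if product_price lies strictly inside (target - margin, target + margin), else 0."""
    return 1 if target - margin < product_price < target + margin else 0


def parse_price(raw):
    # Scraped prices look like "1,499.00"; 0 means the price was not found
    return float(str(raw).replace(",", ""))


# Hypothetical scraped prices, with a target of 1500 and a margin of 300
prices = {"A": "1,299.00", "B": "1,800.00", "C": 0}
scores = {k: price_relevance(parse_price(v), 1500, 300) for k, v in prices.items()}
print(scores)  # {'A': 1, 'B': 0, 'C': 0}
```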

After all the scores are assigned, our dictionary holds the following fields for each product.

[Figure: the fields stored for each product after scoring]

Collecting specs

We then create a CSV or Excel file with the specifications of all listed products for our customers with specific requirements.

import json
import pandas as pd

def form_featureset():
    with open('products_mod_3.json', 'r') as openfile:
        data = json.load(openfile)
    feat = []
    set_c = []
    # First pass: collect every distinct spec label across all products
    for k in data.keys():
        temp = data[k]['feats']
        for i in temp:
            label = i.split(':')[0]
            if label not in feat:
                feat.append(label)
    # print(feat)
    # Second pass: one row per product, -1 where a spec is missing
    for k in data.keys():
        temp = data[k]['feats']
        temp2 = [-1] * len(feat)
        for i in temp:
            label = i.split(':')[0]
            # print(label)
            ind = feat.index(label)
            # print(ind)
            temp2[ind] = i.split(':', 1)[1]
        set_c.append(temp2)

    df = pd.DataFrame(set_c, columns=feat)
    df.to_csv('product_descriptions.csv', index=False)
    return df

This snippet builds a DataFrame with one row per product and one column per spec, giving an overview of the available specs, and saves it to product_descriptions.csv.

[Figure: the generated specification tables]

The generated specification tables are shown above. I put -1 for values that were not specified on the product pages.

These tables are intended to assist customers in their comparative search for specifications.
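The two passes in `form_featureset` (collect all labels, then fill one row per product) can be sketched in plain Python, which makes it easy to see where the -1 placeholders come from. `spec_table` and the example data are hypothetical; the helper splits on the first ':' only, so values that themselves contain ':' are kept whole.

```python
def spec_table(products, missing=-1):
    """Build a spec matrix from 'label:value' strings.

    products -- dict of product key -> list of 'label:value' strings
    Returns (columns, rows); columns keep first-seen order, and a spec a
    product does not list is filled with `missing`.
    """
    columns = []
    parsed = {}
    for key, feats in products.items():
        specs = {}
        for item in feats:
            label, _, value = item.partition(":")  # split on the first ':' only
            specs[label] = value
            if label not in columns:
                columns.append(label)
        parsed[key] = specs
    rows = [[parsed[key].get(col, missing) for col in columns] for key in products]
    return columns, rows


example = {
    "p1": ["Brand:Acme", "Colour:Black"],
    "p2": ["Brand:Zeta", "Weight:200 g"],
}
columns, rows = spec_table(example)
print(columns)  # ['Brand', 'Colour', 'Weight']
print(rows)     # [['Acme', 'Black', -1], ['Zeta', -1, '200 g']]
```

With pandas, `pd.DataFrame(rows, columns=columns)` turns the result into the same kind of table.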

Weighted score

import json

def tune_search(choice):
    with open('products_mod_3.json', 'r') as openfile:
        data = json.load(openfile)
    for k in data.keys():
        price_rel = data[k]['Price_relevence_Score']
        review_score = data[k]['Review_Score']
        pop_score = data[k]['Popularity_Score']
        pop_score_k = pop_score / 4    # popularity class 1-4 -> at most 1

        rate_score = data[k]['Rating_Score']
        rate_score_k = rate_score / 5  # star rating 0-5 -> at most 1

        # Give the chosen factor 5x weight; otherwise weight all factors equally
        if choice == 1:
            total_score = 5 * pop_score_k + rate_score_k + review_score + price_rel
        elif choice == 2:
            total_score = pop_score_k + 5 * rate_score_k + review_score + price_rel
        elif choice == 3:
            total_score = pop_score_k + rate_score_k + review_score + 5 * price_rel
        elif choice == 4:
            total_score = pop_score_k + rate_score_k + 5 * review_score + price_rel
        else:
            total_score = pop_score_k + rate_score_k + review_score + price_rel

        data[k]['Total_score'] = total_score
        # print(data[k]['Total_score'])
    links = sort_d(data)  # sort_d is defined elsewhere (not shown)

    return links

This snippet computes a total score for each product based on the user's choice and returns the product links ordered by sort_d. I used a very basic conditional approach. I divided the ratings by 5 and the popularity classes by 4 to keep those scores between 0 and 1. The weight is set to 5, which is an arbitrary choice.
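The article calls `sort_d(data)` without showing it. Below is a minimal sketch of what it presumably does (rank links by `Total_score`, highest first), plus the same weighting written as one function instead of four branches. Both function bodies, the factor-dictionary layout, and the example data are assumptions for illustration; the factor numbering follows the `choice` values in `tune_search`.

```python
def sort_d(data):
    """Hypothetical stand-in for the article's sort_d helper:
    return product links ordered by Total_score, highest first."""
    return sorted(data, key=lambda link: data[link]["Total_score"], reverse=True)


def total_score(item, choice=None, weight=5):
    """Same combination as tune_search: each factor normalized, the chosen one boosted."""
    parts = {
        1: item["Popularity_Score"] / 4,   # popularity class 1-4
        2: item["Rating_Score"] / 5,       # star rating 0-5
        3: item["Price_relevence_Score"],  # 0 or 1
        4: item["Review_Score"],           # mean polarity in [-1, 1]
    }
    return sum(weight * v if k == choice else v for k, v in parts.items())


# Hypothetical scored products
data = {
    "link-a": {"Popularity_Score": 4, "Rating_Score": 4.5, "Price_relevence_Score": 0, "Review_Score": 0.3},
    "link-b": {"Popularity_Score": 1, "Rating_Score": 3.9, "Price_relevence_Score": 1, "Review_Score": 0.1},
}
for choice in (None, 3):
    for link in data:
        data[link]["Total_score"] = total_score(data[link], choice)
    print(choice, sort_d(data))
# None ['link-a', 'link-b']
# 3 ['link-b', 'link-a']
```

With equal weights, link-a wins (about 2.2 vs 2.13); when the user prioritizes price relevance (choice 3), link-b, the product inside the price range, moves to the top.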

This is our code here.

Application

The provided video demonstrates the application.

Whatever the user chooses, I also show the links to the other options, so the user can easily explore alternatives.

In my case the Chrome window is visible: even though it is automated and closes itself, it still appears. You can prevent this by starting Chrome in headless mode through ChromeOptions passed to the Chrome driver.

Looking ahead

The application could be made more robust in two ways, but both require data that, as far as I know, is not currently available.

  1. If sentiment datasets of Amazon reviews become available, we could train our own sentiment classifier with additional classes beyond positive and negative. This would make our review-based score stronger.
  2. If spec data were available per product category, we could build our own embedding of specifications. For example, with enough laptop data we could create an embedding (encoding) space for laptops and represent any laptop as a vector of its specifications. We could then build a requirement vector from the user's needs, apply the k-nearest neighbors algorithm to find the k vectors closest to it, and order them by Euclidean distance. This would give us k laptops with specifications close to the user's needs and add spec-based relevance to the scoring, making our system more robust.
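A minimal sketch of the nearest-neighbor step, assuming specifications have already been encoded as numeric vectors. The laptops, features, and values are hypothetical, and real specs would need scaling first (here RAM dominates the distance because its numbers are larger). `k_nearest` is an illustrative name; `math.dist` requires Python 3.8+.

```python
import math


def k_nearest(specs, query, k):
    """Return the k product names whose spec vectors are closest to `query`.

    specs -- dict of name -> numeric spec vector (same length and units as query)
    Results are ordered by Euclidean distance, nearest first.
    """
    ranked = sorted(specs, key=lambda name: math.dist(specs[name], query))
    return ranked[:k]


# Hypothetical laptops encoded as [RAM in GB, storage in TB, weight in kg]
laptops = {
    "L1": [8, 0.5, 1.8],
    "L2": [16, 1.0, 1.4],
    "L3": [32, 2.0, 2.5],
    "L4": [16, 0.5, 1.3],
}
requirement = [16, 1.0, 1.5]
print(k_nearest(laptops, requirement, 2))  # ['L2', 'L4']
```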

Conclusion

We have seen how to build a real-time product recommendation engine in Python in just a few steps.

Here is the GitHub link.

I hope that helps.

Article information

Author: Corie Satterfield

Last Updated: 04/21/2023
