Machine Learning in E-Commerce: Recommendation Engines Explained

A widely cited McKinsey estimate credits recommendations with about 35% of Amazon's revenue. Here's how recommendation systems work, which ML approach to use, and how to build one for your e-commerce business.

By SpiderHunts Technologies  ·  23 May 2026  ·  12 min read

TL;DR

  • Recommendation engines use ML to predict which products a specific user is most likely to buy next
  • Three main approaches: collaborative filtering (behaviour-based), content-based filtering (attribute-based), and hybrid (both)
  • Collaborative filtering needs roughly 1,000+ users with interaction history; content-based filtering works with smaller datasets
  • Typical uplift: 8–25% higher average order value (AOV) and 15–40% better conversion on recommended products
  • You don't need to be Amazon — open-source libraries (Surprise, LightFM, implicit) make this accessible to any e-commerce business

Why Recommendation Engines Matter

Showing every customer the same product catalogue is like running a store where nothing is arranged with the individual shopper in mind. A recommendation engine turns that catalogue into a personalised storefront for every visitor.

  • 35% of Amazon's revenue from recommendations
  • 75% of Netflix views come from recommendations
  • 15–25% typical AOV uplift for e-commerce

The Three Recommendation Approaches

1. Collaborative Filtering

Collaborative filtering finds patterns across users: "people who bought what you bought also bought this." It doesn't need to understand anything about the products — it only looks at user behaviour patterns.

There are two types:

  • User-user collaborative filtering: Finds users similar to you and recommends what they liked. "Users like you also bought..." — computationally expensive at scale.
  • Item-item collaborative filtering: Finds items that are frequently bought together. "Customers who bought this also bought..." — more scalable; used by Amazon.
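
The item-item variant can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes purchases are available as a mapping from user to the set of products bought, and it scores item pairs with cosine similarity over those binary purchase sets. The function names and the toy baskets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_item_similarity(baskets):
    """Cosine similarity between items over binary purchase sets.

    baskets: dict mapping user_id -> set of product_ids bought (assumed input shape).
    Returns a dict: item -> {other_item: similarity}.
    """
    item_counts = defaultdict(int)  # number of buyers per item
    pair_counts = defaultdict(int)  # number of buyers who bought both items
    for items in baskets.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in pair_counts.items():
        score = both / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims

def also_bought(sims, product_id, n=3):
    """Top-n items most similar to product_id (ties broken by item name)."""
    ranked = sorted(sims.get(product_id, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

# Hypothetical toy data
baskets = {
    "u1": {"tent", "stove", "lamp"},
    "u2": {"tent", "stove"},
    "u3": {"tent", "lamp"},
    "u4": {"stove", "kettle"},
}
sims = item_item_similarity(baskets)
top = also_bought(sims, "tent", n=2)
```

Because similarities are computed between items rather than between users, the table can be precomputed offline and looked up at request time, which is one reason this variant tends to scale better.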

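Python: Collaborative Filtering with implicit (ALS matrix factorisation)

import implicit
import scipy.sparse as sparse

# purchase_df is assumed to have columns: user_id, product_id, purchase_count
# Map raw IDs to contiguous row/column indices
users = purchase_df['user_id'].astype('category')
items = purchase_df['product_id'].astype('category')

# Build a sparse user-item matrix directly (avoids a dense pivot table)
# rows = users, columns = items, values = purchase count
user_item_matrix = sparse.csr_matrix(
    (purchase_df['purchase_count'].astype('float32'),
     (users.cat.codes, items.cat.codes))
)

# Train ALS model
# Note: implicit >= 0.5 expects a user-item matrix; older releases expected item-user
model = implicit.als.AlternatingLeastSquares(factors=50, iterations=30)
model.fit(user_item_matrix)

# Get recommendations for a specific user (raw ID 42 must exist in purchase_df)
user_idx = users.cat.categories.get_loc(42)
item_idxs, scores = model.recommend(user_idx, user_item_matrix[user_idx], N=10)

# Translate matrix column indices back to product IDs
recommended_products = items.cat.categories[item_idxs]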

Cold start problem: Collaborative filtering fails for new users (no history) and new products (no interactions). This is its primary limitation.

2. Content-Based Filtering

Content-based filtering recommends items similar to what a user has engaged with, based on product attributes: category, brand, price range, description keywords, colour, size.

Advantages: works immediately for new users and new products; explainable ("we recommended this because you bought X, which is similar"). Disadvantage: limited serendipity, because it rarely surfaces anything that differs much from what the user has already seen.

Content-based similarity using TF-IDF + cosine similarity

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Use a clean 0..n-1 index so row labels match similarity_matrix row positions
products = products.reset_index(drop=True)

# Combine product features into a single text representation
# (missing values become empty strings instead of NaN)
products['features'] = (
    products['category'].fillna('') + ' ' +
    products['brand'].fillna('') + ' ' +
    products['description'].fillna('')
)

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(products['features'])

# Compute the full n x n similarity matrix (fine for small catalogues)
similarity_matrix = cosine_similarity(tfidf_matrix)

def get_similar_products(product_id, n=5):
    """Return the IDs of the n products most similar to product_id."""
    idx = products.index[products['id'] == product_id][0]
    # Rank by descending similarity and drop the query product itself
    ranked = [i for i in np.argsort(-similarity_matrix[idx]) if i != idx]
    return products['id'].iloc[ranked[:n]].tolist()

3. Hybrid Approaches

Most production recommendation systems are hybrid — combining collaborative and content-based signals. For a new user with no history, use content-based; as history accumulates, weight collaborative filtering more heavily. LightFM is a popular library that implements hybrid collaborative + content-based filtering in a single model.
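
The "weight collaborative filtering more heavily as history accumulates" idea can be sketched as a simple score blend. This is an illustrative assumption, not how LightFM works internally (LightFM learns a single model); the `ramp` parameter, the function names, and the example scores are all hypothetical, and both score sets are assumed to be on a comparable 0 to 1 scale.

```python
def collaborative_weight(n_interactions, ramp=20):
    """Weight on the collaborative score: 0 for a brand-new user,
    rising linearly to 1 once the user has `ramp` interactions."""
    return min(n_interactions / ramp, 1.0)

def blend_scores(cf_scores, content_scores, n_interactions, ramp=20):
    """Blend two {item: score} dicts; items missing from one side score 0 there."""
    w = collaborative_weight(n_interactions, ramp)
    all_items = set(cf_scores) | set(content_scores)
    return {
        item: w * cf_scores.get(item, 0.0) + (1 - w) * content_scores.get(item, 0.0)
        for item in all_items
    }

def top_n(scores, n=3):
    """Items sorted by descending score (ties broken by item name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

# Hypothetical scores from the two models for one user
cf = {"lamp": 0.9, "kettle": 0.4}
content = {"stove": 0.8, "lamp": 0.3}

new_user = top_n(blend_scores(cf, content, n_interactions=0))
regular = top_n(blend_scores(cf, content, n_interactions=40))
```

With no history the ranking follows the content-based scores alone; once the user passes the ramp threshold it follows the collaborative scores. In practice the ramp shape would be tuned against real conversion data.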

Recommendation Placement Strategy

Placement           | Best algorithm                | Goal                      | Typical CTR
--------------------+-------------------------------+---------------------------+------------
Homepage            | Personalised (collaborative)  | Discovery / re-engagement | 3–8%
Product page        | Similar items (content-based) | Cross-sell / browse       | 5–12%
Cart page           | Frequently bought together    | Upsell / AOV increase     | 8–18%
Post-purchase email | Complementary items           | Repeat purchase           | 2–6%
Search results      | Personalised ranking          | Conversion lift           | 10–25%

Comparing Recommendation Approaches

Factor                    | Collaborative          | Content-based         | Hybrid
--------------------------+------------------------+-----------------------+--------------------------------
Minimum data needed       | 1,000+ users           | 50+ products          | 500+ users + product attributes
Cold start (new users)    | Poor                   | Good                  | Good
Cold start (new products) | Poor                   | Excellent             | Good
Serendipity (discovery)   | High                   | Low                   | Medium–High
Explainability            | Low ("users like you") | High ("similar to X") | Medium
Complexity to build       | Medium                 | Low                   | High

Frequently Asked Questions

How much data do I need to build a recommendation engine?

For collaborative filtering, you need at least 1,000 users with meaningful interaction history. Content-based filtering can work with far less — even 100 products and 50 users — because it relies on product attributes rather than behaviour patterns. Most e-commerce businesses with 6+ months of transaction data have enough to start.
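
A quick way to check where a store stands is to count users with enough history. The sketch below assumes interactions are available as (user_id, product_id) pairs and treats "meaningful history" as at least three events per user; that cut-off and the function name are assumptions for illustration. The 1,000 and 500 user thresholds are the ones quoted in this article.

```python
from collections import Counter

def suggest_approach(interactions, min_events=3):
    """Suggest a starting approach from raw interaction pairs.

    interactions: iterable of (user_id, product_id) tuples (assumed input shape).
    min_events: events a user needs to count as having meaningful history
                (hypothetical cut-off; tune for your store).
    Returns (active_user_count, suggested_approach).
    """
    events_per_user = Counter(user for user, _ in interactions)
    active_users = sum(1 for count in events_per_user.values() if count >= min_events)

    if active_users >= 1000:
        approach = "collaborative"
    elif active_users >= 500:
        approach = "hybrid"
    else:
        approach = "content-based"
    return active_users, approach

# Hypothetical data: 600 users with three purchases each, plus one one-off buyer
interactions = [(user, product) for user in range(600) for product in range(3)]
interactions.append((9999, 1))
result = suggest_approach(interactions)
```

The output is only a starting point; sparse catalogues or very uneven purchase histories can still make collaborative filtering perform poorly above the threshold.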

What is the difference between collaborative and content-based filtering?

Collaborative filtering finds patterns in user behaviour — "users who bought X also bought Y" — without knowing anything about the products. Content-based filtering recommends items similar to what a user has engaged with based on product attributes. Collaborative filtering is more powerful at scale but suffers from the cold start problem with new users or products.

How much revenue uplift can a recommendation engine drive?

A widely cited McKinsey estimate puts recommendations at approximately 35% of Amazon's revenue. For smaller e-commerce businesses, typical uplifts are an 8–25% increase in average order value and a 15–40% improvement in conversion rate for recommended products. Results depend on data quality, recommendation placement, and model tuning.
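
Uplift figures like these are relative changes against a baseline, typically measured with an A/B test. The arithmetic can be sketched as follows; the order, revenue, and session numbers are made up for illustration, and the helper names are hypothetical.

```python
def summarise(orders, revenue, sessions):
    """Average order value and conversion rate for one test group."""
    return {"aov": revenue / orders, "conversion": orders / sessions}

def relative_uplift(control, treatment):
    """Relative change of treatment over control, e.g. 0.10 means +10%."""
    return (treatment - control) / control

# Hypothetical A/B test: control sees no recommendations, treated group does
control = summarise(orders=400, revenue=20000.0, sessions=20000)
treated = summarise(orders=460, revenue=25300.0, sessions=20000)

aov_uplift = relative_uplift(control["aov"], treated["aov"])
conv_uplift = relative_uplift(control["conversion"], treated["conversion"])

print(f"AOV uplift: {aov_uplift:.0%}, conversion uplift: {conv_uplift:.0%}")
```

Here AOV moves from 50 to 55 and conversion from 2.0% to 2.3%, so both uplifts fall inside the ranges quoted above. A real test would also need enough traffic for the difference to be statistically meaningful.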

Want a Custom Recommendation Engine for Your E-Commerce?

We build custom ML recommendation systems integrated with your product catalogue and customer data. Book a free consultation to explore what's possible for your store.
