Machine Learning in E-Commerce: Recommendation Engines Explained

Last updated: 2026-05-23

A widely cited industry estimate credits recommendations with about 35% of Amazon's sales. Here's how recommendation systems work, which ML approach to use, and how to build one for your e-commerce business.

By SpiderHunts Technologies · 23 May 2026 · 12 min read

TL;DR

Recommendation engines use ML to predict which products a specific user is most likely to buy next
Three main approaches: collaborative filtering (behaviour-based), content-based filtering (attribute-based), and hybrid (both)
Collaborative filtering requires ~1,000+ users with history; content-based works with smaller datasets
Average revenue uplift: 8–25% higher AOV; 15–40% better conversion on recommended products
You don't need to be Amazon — open-source libraries (Surprise, LightFM, implicit) make this accessible to any e-commerce business

Why Recommendation Engines Matter

Showing every customer the same product catalogue is like running a store where nothing is arranged for the person walking in. A recommendation engine turns that catalogue into a personalised storefront for every visitor.

35% of Amazon's revenue from recommendations
75% of Netflix views come from recommendations
15–25% typical AOV uplift for e-commerce

The Three Recommendation Approaches

1. Collaborative Filtering

Collaborative filtering finds patterns across users: "people who bought what you bought also bought this." It doesn't need to understand anything about the products — it only looks at user behaviour patterns.

There are two types:

User-user collaborative filtering: Finds users similar to you and recommends what they liked. "Users like you also bought..." — computationally expensive at scale.
Item-item collaborative filtering: Finds items that are frequently bought together. "Customers who bought this also bought..." — more scalable; used by Amazon.
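
The item-item idea above can be sketched without any ML library by counting how often two products share an order. This is a minimal illustration, assuming order history is available as lists of product IDs. The `baskets` data and both function names are hypothetical. Production systems usually normalise these raw counts (for example with cosine similarity) so that best-sellers do not dominate every list.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted once per basket
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, item, n=3):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(n)]


# Hypothetical order history: one list of product IDs per order
baskets = [
    ["tent", "sleeping_bag", "stove"],
    ["tent", "sleeping_bag"],
    ["tent", "stove"],
    ["sleeping_bag", "pillow"],
    ["tent", "sleeping_bag", "pillow"],
]

co_counts = build_cooccurrence(baskets)
print(also_bought(co_counts, "tent", n=2))  # ['sleeping_bag', 'stove']
```

Because the counts depend only on which items appear together, the table can be precomputed offline and looked up per product page, which is part of why the item-item variant scales better than the user-user one.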

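Python: Collaborative Filtering with Implicit (ALS matrix factorisation)

import scipy.sparse as sparse
from implicit.als import AlternatingLeastSquares

# Map raw IDs to contiguous row/column positions
# (purchase_df columns: user_id, product_id, purchase_count)
users = purchase_df['user_id'].astype('category')
items = purchase_df['product_id'].astype('category')

# Build sparse user-item interaction matrix (purchases)
# rows = users, columns = items, values = purchase count
user_item_matrix = sparse.csr_matrix(
    (purchase_df['purchase_count'].astype('float32'),
     (users.cat.codes, items.cat.codes)),
    shape=(len(users.cat.categories), len(items.cat.categories)),
)

# Train ALS model
# implicit >= 0.5 takes the user-item matrix; older releases expected item-user (.T)
model = AlternatingLeastSquares(factors=50, iterations=30)
model.fit(user_item_matrix)

# Get recommendations for a specific user
user_id = 42
user_idx = users.cat.categories.get_loc(user_id)  # matrix row for this user
item_idxs, scores = model.recommend(user_idx, user_item_matrix[user_idx], N=10)
recommended_ids = items.cat.categories[item_idxs]
# item_idxs and scores are parallel arrays, best match first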

Cold start problem: Collaborative filtering fails for new users (no history) and new products (no interactions). This is its primary limitation.
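
One common way to soften the new-user side of this problem is to fall back to best-sellers when no personalised result exists. The sketch below is an assumption about how that might look, not a prescribed design: `recommend_with_fallback`, the `personalised` lookup, and the sample purchase events are all hypothetical. New products still need a different remedy, such as attribute-based similarity.

```python
from collections import Counter


def recommend_with_fallback(user_id, personalised, purchases, n=3):
    """Use personalised results when available, otherwise overall best-sellers."""
    recs = personalised.get(user_id, [])
    if recs:
        return recs[:n]
    # Cold start: no history for this user, so rank products by purchase count
    popularity = Counter(product_id for _, product_id in purchases)
    return [product_id for product_id, _ in popularity.most_common(n)]


# Hypothetical (user_id, product_id) purchase events
purchases = [
    (1, "A"), (1, "B"), (2, "A"), (3, "A"),
    (3, "C"), (2, "C"), (4, "C"), (5, "C"),
]
# Hypothetical output of a collaborative model, keyed by user ID
personalised = {1: ["C", "D"]}

print(recommend_with_fallback(1, personalised, purchases, n=2))   # ['C', 'D']
print(recommend_with_fallback(99, personalised, purchases, n=2))  # ['C', 'A']
```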

2. Content-Based Filtering

Content-based filtering recommends items similar to what a user has engaged with, based on product attributes: category, brand, price range, description keywords, colour, size.

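Advantages: works immediately for new products, needs only one or two interactions to start recommending to a new user, and is explainable ("we recommended this because you bought X which is similar"). Disadvantage: limited serendipity, because it rarely recommends something genuinely different from what the user has already engaged with.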

Content-based similarity using TF-IDF + cosine similarity

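import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Row positions must line up with the similarity matrix
products = products.reset_index(drop=True)

# Combine product features into a single text representation
# (missing values become empty strings so the concatenation never yields NaN)
products['features'] = (
    products['category'].fillna('') + ' ' +
    products['brand'].fillna('') + ' ' +
    products['description'].fillna('')
)

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(products['features'])

# Compute similarity matrix (dense n_products x n_products, fine for small catalogues)
similarity_matrix = cosine_similarity(tfidf_matrix)

def get_similar_products(product_id, n=5):
    # Row position of the query product
    idx = np.flatnonzero(products['id'].to_numpy() == product_id)[0]
    # Rank all products by similarity, highest first, then drop the query itself
    ranked = np.argsort(-similarity_matrix[idx], kind='stable')
    ranked = ranked[ranked != idx][:n]
    return products['id'].iloc[ranked].tolist()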
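
The function above finds neighbours of a single product. To recommend for a user who has engaged with several products, one simple option is to average their attribute vectors into a profile and rank unseen items against it. The sketch below assumes a small dense NumPy array of item vectors (a small TF-IDF matrix converted with `.toarray()` would fit). The function name, item IDs, and three-dimensional vectors are hypothetical.

```python
import numpy as np


def recommend_for_user(item_vectors, item_ids, engaged_ids, n=2):
    """Rank unseen items by cosine similarity to the mean vector of engaged items."""
    engaged = [item_ids.index(i) for i in engaged_ids]
    profile = item_vectors[engaged].mean(axis=0)

    # Cosine similarity between the profile and every item (guarding against zero norms)
    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(profile)
    scores = item_vectors @ profile / np.where(norms == 0, 1.0, norms)

    # Highest score first, skipping items the user has already engaged with
    ranked = [i for i in np.argsort(-scores) if i not in engaged]
    return [item_ids[i] for i in ranked[:n]]


# Hypothetical attribute vectors: [outdoor, kitchen, electronics]
item_ids = ["tent", "stove", "blender", "headphones", "lantern"]
item_vectors = np.array([
    [1.0, 0.0, 0.0],  # tent
    [0.8, 0.6, 0.0],  # stove
    [0.0, 1.0, 0.2],  # blender
    [0.0, 0.0, 1.0],  # headphones
    [0.9, 0.0, 0.3],  # lantern
])

print(recommend_for_user(item_vectors, item_ids, ["tent", "stove"]))
# ['lantern', 'blender']
```

The result also shows the serendipity limit in miniature: a user who bought camping gear is steered towards more outdoor items and never sees the headphones.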

3. Hybrid Approaches

Most production recommendation systems are hybrid, combining collaborative and content-based signals. For a new user with little or no history, lean on content-based signals. As history accumulates, weight collaborative filtering more heavily. LightFM is a popular library that implements hybrid collaborative + content-based filtering in a single model.
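
The "weight collaborative filtering more heavily as history accumulates" idea can be illustrated with a simple linear blend. This is a sketch under stated assumptions, not how LightFM works internally (LightFM learns one joint model rather than mixing two score lists). The function names, the `full_trust_at=20` threshold, and the sample scores are hypothetical, and both score sets are assumed to be on a comparable 0 to 1 scale.

```python
def blend_weight(n_interactions, full_trust_at=20):
    """Share of the final score given to collaborative filtering (0.0 to 1.0)."""
    return min(n_interactions / full_trust_at, 1.0)


def hybrid_scores(collab_scores, content_scores, n_interactions, full_trust_at=20):
    """Blend two {product_id: score} dicts; a missing score counts as 0."""
    w = blend_weight(n_interactions, full_trust_at)
    products = set(collab_scores) | set(content_scores)
    return {
        p: w * collab_scores.get(p, 0.0) + (1 - w) * content_scores.get(p, 0.0)
        for p in products
    }


# Hypothetical scores from each model for the same user
collab = {"A": 0.9, "B": 0.2}
content = {"B": 0.8, "C": 0.6}

new_user = hybrid_scores(collab, content, n_interactions=0)
regular = hybrid_scores(collab, content, n_interactions=40)

print(max(new_user, key=new_user.get))  # B (content-based favourite)
print(max(regular, key=regular.get))    # A (collaborative favourite)
```

The threshold is a tuning knob: a lower value trusts behavioural data sooner, which suits stores where a few purchases already say a lot about a customer.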

Recommendation Placement Strategy

Placement	Best Algorithm	Goal	Typical CTR
Homepage	Personalised (collaborative)	Discovery / re-engagement	3–8%
Product page	Similar items (content-based)	Cross-sell / browse	5–12%
Cart page	Frequently bought together	Upsell / AOV increase	8–18%
Post-purchase email	Complementary items	Repeat purchase	2–6%
Search results	Personalised ranking	Conversion lift	10–25%

Comparing Recommendation Approaches

Factor	Collaborative	Content-Based	Hybrid
Minimum data needed	1,000+ users	50+ products	500+ users + product attributes
Cold start (new users)	Poor	Good	Good
Cold start (new products)	Poor	Excellent	Good
Serendipity (discovery)	High	Low	Medium–High
Explainability	Low ("users like you")	High ("similar to X")	Medium
Complexity to build	Medium	Low	High

Frequently Asked Questions

How much data do I need to build a recommendation engine?

For collaborative filtering, you need at least 1,000 users with meaningful interaction history. Content-based filtering can work with far less — even 100 products and 50 users — because it relies on product attributes rather than behaviour patterns. Most e-commerce businesses with 6+ months of transaction data have enough to start.

What is the difference between collaborative and content-based filtering?

Collaborative filtering finds patterns in user behaviour — "users who bought X also bought Y" — without knowing anything about the products. Content-based filtering recommends items similar to what a user has engaged with based on product attributes. Collaborative filtering is more powerful at scale but suffers from the cold start problem with new users or products.

How much revenue uplift can a recommendation engine drive?

A widely cited industry estimate credits recommendations with approximately 35% of Amazon's sales. For smaller e-commerce businesses, typical uplifts are an 8–25% increase in average order value and a 15–40% improvement in conversion rate for recommended products. Results depend on data quality, recommendation placement, and model tuning.

Want a Custom Recommendation Engine for Your E-Commerce?

We build custom ML recommendation systems integrated with your product catalogue and customer data. Book a free consultation to explore what's possible for your store.
