Machine Learning in E-Commerce: Recommendation Engines Explained
Amazon's recommendation engine is widely reported to drive around 35% of its revenue. Here's how recommendation systems work, which ML approach to use, and how to build one for your e-commerce business.
TL;DR
- Recommendation engines use ML to predict which products a specific user is most likely to buy next
- Three main approaches: collaborative filtering (behaviour-based), content-based filtering (attribute-based), and hybrid (both)
- Collaborative filtering typically needs 1,000+ users with interaction history; content-based filtering works with smaller datasets
- Typical uplift: 8–25% higher AOV and 15–40% better conversion on recommended products
- You don't need to be Amazon — open-source libraries (Surprise, LightFM, implicit) make this accessible to any e-commerce business
Why Recommendation Engines Matter
Showing every customer the same product catalogue is like inviting every shopper into a store where nothing is arranged for them. A recommendation engine turns your product catalogue into a personalised storefront for every visitor.
- 35%: share of Amazon's revenue widely attributed to recommendations
- 75%: share of Netflix views that come from recommendations
- 8–25%: typical AOV uplift for e-commerce
The Three Recommendation Approaches
1. Collaborative Filtering
Collaborative filtering finds patterns across users: "people who bought what you bought also bought this." It doesn't need to understand anything about the products — it only looks at user behaviour patterns.
There are two types:
- User-user collaborative filtering: Finds users similar to you and recommends what they liked. "Users like you also bought..." — computationally expensive at scale.
- Item-item collaborative filtering: Finds items that tend to be bought by the same customers. "Customers who bought this also bought..." More scalable, because item similarities can be precomputed offline; used by Amazon.
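To make the two variants concrete, here is a minimal NumPy sketch of both on a toy purchase matrix. The matrix `R` and the helper names are hypothetical, and plain cosine similarity is used as one common choice among several. Item-item compares the columns of `R` (which customers bought each item), user-user compares the rows (what each customer bought), and both hide items the user already owns.

```python
import numpy as np

# Toy user-item purchase matrix (hypothetical): rows = users, columns = items, 1 = bought
R = np.array([
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 1, 1, 0],  # user 2
    [0, 0, 1, 1],  # user 3
], dtype=float)

def cosine_sim(M):
    """Cosine similarity between every pair of rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = M / norms
    return unit @ unit.T

def item_item_scores(R, user):
    """Score items by their similarity to the items this user already bought."""
    S = cosine_sim(R.T)            # item x item similarity (compares columns of R)
    np.fill_diagonal(S, 0.0)       # exclude self-similarity
    scores = S @ R[user]           # sum of similarities to the user's purchases
    scores[R[user] > 0] = -np.inf  # hide items already bought
    return scores

def user_user_scores(R, user):
    """Score items by what the most similar users bought."""
    S = cosine_sim(R)              # user x user similarity (compares rows of R)
    S[user, user] = 0.0            # exclude the user's own row
    scores = S[user] @ R           # similarity-weighted votes from other users
    scores[R[user] > 0] = -np.inf  # hide items already bought
    return scores

def top_n(scores, n=2):
    """Indices of the n highest positive scores."""
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:n] if scores[i] > 0]

print(top_n(item_item_scores(R, user=0)))  # [2]
print(top_n(user_user_scores(R, user=0)))  # [2]
```

On this toy data both variants suggest item 2 for user 0. In production, item-item similarities are usually computed offline and only each item's top neighbours are kept, which is where the scalability advantage comes from.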
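Python: Collaborative Filtering (ALS Matrix Factorisation) with implicit
import scipy.sparse as sparse
from implicit.als import AlternatingLeastSquares

# purchase_df has one row per record: user_id, product_id, purchase_count.
# Map raw IDs to contiguous 0..N-1 row/column positions.
users = purchase_df['user_id'].astype('category')
items = purchase_df['product_id'].astype('category')

# Build a sparse user-item matrix (rows = users, columns = items,
# values = purchase count; duplicate user/item pairs are summed)
user_item_matrix = sparse.csr_matrix(
    (purchase_df['purchase_count'].astype('float32'),
     (users.cat.codes, items.cat.codes)),
    shape=(len(users.cat.categories), len(items.cat.categories)),
)

# Train an ALS model (implicit >= 0.5 expects a user-item matrix in fit())
model = AlternatingLeastSquares(factors=50, iterations=30)
model.fit(user_item_matrix)

# Top 10 products for user_id 42: look up its row and pass that user's interactions
row = users.cat.categories.get_loc(42)
ids, scores = model.recommend(row, user_item_matrix[row], N=10)
recommended_products = items.cat.categories[ids]  # map column positions back to product_ids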
Cold start problem: Collaborative filtering fails for new users (no history) and new products (no interactions). This is its primary limitation.
2. Content-Based Filtering
Content-based filtering recommends items similar to what a user has engaged with, based on product attributes: category, brand, price range, description keywords, colour, size.
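Advantages: works for new products as soon as their attributes are entered, and for new users from their first viewed or purchased item; explainable ("we recommended this because you bought X, which is similar"). Disadvantage: limited serendipity, because it rarely recommends anything genuinely different from what the user has already seen.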
Content-based similarity using TF-IDF + cosine similarity
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Reset to a 0..N-1 index so DataFrame positions match similarity-matrix rows
products = products.reset_index(drop=True)

# Combine product features into a single text representation (missing values become '')
products['features'] = (
    products['category'].fillna('') + ' ' +
    products['brand'].fillna('') + ' ' +
    products['description'].fillna('')
)

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(products['features'])

# Compute the N x N similarity matrix (memory grows with the square of catalogue size)
similarity_matrix = cosine_similarity(tfidf_matrix)

def get_similar_products(product_id, n=5):
    # Row position of the query product
    idx = products.index[products['id'] == product_id][0]
    scores = similarity_matrix[idx].copy()
    scores[idx] = -1.0  # never recommend the product itself
    top = np.argsort(scores)[::-1][:n]  # positions of the n highest scores
    return products.loc[top, 'id'].tolist()
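The code above compares one product with another. To recommend from a user's whole history, as the description of content-based filtering suggests, a common approach is to average the feature vectors of the products a user engaged with into a profile and rank unseen products against it. Below is a minimal NumPy sketch under that assumption; the catalogue, the one-hot attribute encoding and `recommend_for_user` are hypothetical illustrations, not part of any library.

```python
import numpy as np

# Hypothetical mini-catalogue: each product is described by a few attributes
catalogue = {
    "p1": {"category": "shoes", "brand": "acme", "colour": "black"},
    "p2": {"category": "shoes", "brand": "acme", "colour": "white"},
    "p3": {"category": "shoes", "brand": "zenith", "colour": "black"},
    "p4": {"category": "bags", "brand": "zenith", "colour": "brown"},
}
ids = list(catalogue)

# One binary feature per (attribute, value) pair, e.g. "brand=acme"
features = sorted({f"{k}={v}" for attrs in catalogue.values() for k, v in attrs.items()})
col = {f: j for j, f in enumerate(features)}

X = np.zeros((len(ids), len(features)))
for i, pid in enumerate(ids):
    for k, v in catalogue[pid].items():
        X[i, col[f"{k}={v}"]] = 1.0
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows, so dot product = cosine

def recommend_for_user(history, n=2):
    """Rank products the user hasn't seen by cosine similarity to their profile."""
    rows = [ids.index(pid) for pid in history]
    profile = X[rows].mean(axis=0)  # average of the engaged products' vectors
    profile /= np.linalg.norm(profile)
    scores = X @ profile
    scores[rows] = -np.inf          # exclude products already seen
    order = np.argsort(scores)[::-1]
    return [ids[i] for i in order[:n] if scores[i] > 0]

print(recommend_for_user(["p1", "p3"]))  # ['p2', 'p4']
```

A newly added product only needs its attribute row in `X` to become recommendable, which is why content-based filtering copes well with new products. TF-IDF vectors like those produced above can be averaged into a profile the same way.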
3. Hybrid Approaches
Most production recommendation systems are hybrid, combining collaborative and content-based signals. For a new user with little or no history, rely on content-based filtering; as history accumulates, weight collaborative filtering more heavily. LightFM is a popular library that implements hybrid collaborative and content-based filtering in a single model.
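One simple way to implement "shift the weight as history accumulates" is a weighted blend of two score lists. The sketch below assumes you already have per-product scores from a collaborative model and a content-based model for the same user; the `n / (n + k)` schedule, the min-max rescaling and the default `k=10` are illustrative choices, not LightFM's method (LightFM instead learns one model from interactions plus user and item features).

```python
import numpy as np

def minmax(x):
    """Rescale scores to [0, 1] so the two signals are on a comparable scale."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def hybrid_scores(cf_scores, content_scores, n_interactions, k=10):
    """Blend collaborative and content-based scores for one user over the same products.

    The collaborative weight w = n / (n + k) is 0 for a user with no history and
    approaches 1 as interactions accumulate; k (a hypothetical tuning knob) is the
    number of interactions at which both signals get equal weight.
    """
    w = n_interactions / (n_interactions + k)
    return w * minmax(cf_scores) + (1 - w) * minmax(content_scores)

cf = [0.9, 0.1, 0.5]       # hypothetical collaborative scores for products 0, 1, 2
content = [0.2, 0.8, 0.4]  # hypothetical content-based scores for the same products
print(int(np.argmax(hybrid_scores(cf, content, n_interactions=0))))   # 1
print(int(np.argmax(hybrid_scores(cf, content, n_interactions=90))))  # 0
```

A brand-new user gets the content-based favourite (product 1); after 90 interactions the collaborative favourite (product 0) wins. In practice `k` would be tuned against held-out purchases.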
Recommendation Placement Strategy
| Placement | Best Algorithm | Goal | Typical CTR |
|---|---|---|---|
| Homepage | Personalised (collaborative) | Discovery / re-engagement | 3–8% |
| Product page | Similar items (content-based) | Cross-sell / browse | 5–12% |
| Cart page | Frequently bought together | Upsell / AOV increase | 8–18% |
| Post-purchase email | Complementary items | Repeat purchase | 2–6% |
| Search results | Personalised ranking | Conversion lift | 10–25% |
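The "frequently bought together" slot on the cart page can be driven by simple co-occurrence counts over past orders rather than a trained model. The plain-Python sketch below illustrates that idea; the function names, the `min_support` threshold and the sample orders are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequently_bought_together(orders, min_support=2):
    """Count how often each pair of products appears in the same order."""
    pair_counts = Counter()
    for basket in orders:
        # set() ignores repeated lines; sorted() gives each pair a single canonical key
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    # Drop pairs seen fewer than min_support times (too rare to be reliable)
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}

def cart_suggestions(cart, pair_counts, n=3):
    """Rank products not in the cart by their co-purchase counts with cart items."""
    scores = Counter()
    for (a, b), c in pair_counts.items():
        if a in cart and b not in cart:
            scores[b] += c
        elif b in cart and a not in cart:
            scores[a] += c
    return [item for item, _ in scores.most_common(n)]

# Hypothetical order history: one list of product IDs per order
orders = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case", "screen_protector"],
]
pairs = frequently_bought_together(orders)
print(cart_suggestions({"phone"}, pairs))  # ['case', 'charger']
```

Raw counts favour best-sellers, which co-occur with almost everything; production systems often normalise them, for example with association-rule measures such as confidence or lift.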
Comparing Recommendation Approaches
| Factor | Collaborative | Content-Based | Hybrid |
|---|---|---|---|
| Minimum data needed | 1,000+ users | 50+ products | 500+ users + product attributes |
| Cold start (new users) | Poor | Good | Good |
| Cold start (new products) | Poor | Excellent | Good |
| Serendipity (discovery) | High | Low | Medium–High |
| Explainability | Low ("users like you") | High ("similar to X") | Medium |
| Complexity to build | Medium | Low | High |
Frequently Asked Questions
How much data do I need to build a recommendation engine?
For collaborative filtering, you need at least 1,000 users with meaningful interaction history. Content-based filtering can work with far less — even 100 products and 50 users — because it relies on product attributes rather than behaviour patterns. Most e-commerce businesses with 6+ months of transaction data have enough to start.
What is the difference between collaborative and content-based filtering?
Collaborative filtering finds patterns in user behaviour — "users who bought X also bought Y" — without knowing anything about the products. Content-based filtering recommends items similar to what a user has engaged with based on product attributes. Collaborative filtering is more powerful at scale but suffers from the cold start problem with new users or products.
How much revenue uplift can a recommendation engine drive?
Recommendations are widely reported to drive around 35% of Amazon's revenue. For smaller e-commerce businesses, typical uplifts are an 8–25% increase in average order value and a 15–40% improvement in conversion rate on recommended products. Results depend on data quality, recommendation placement, and model tuning.
Want a Custom Recommendation Engine for Your E-Commerce Store?
We build custom ML recommendation systems integrated with your product catalogue and customer data. Book a free consultation to explore what's possible for your store.