Leveraging Collaborative Filtering for Movie Recommendations with TensorFlow

Paregi Aanchal
5 min read · Apr 8, 2024

Introduction:
Recommendation systems shape how users discover and consume content. Collaborative filtering is a widely used technique for building them: it uses past user interactions, here movie ratings, to make personalized recommendations. In this post we walk through collaborative filtering for movie recommendations with TensorFlow, from the notation and the cost function to training the model and generating recommendations.

1. Notation:
Before looking at the algorithm, it helps to fix the notation. Throughout the implementation, X holds one feature vector per movie, W holds one parameter vector per user, and b holds one bias per user. Y holds the ratings, with one row per movie and one column per user, and R is an indicator matrix of the same shape in which a 1 marks a rating that is present. The snippet below builds randomly filled arrays with these shapes.

Code:

# Sample notation explanation
import numpy as np

# Define variables
num_users = 100
num_movies = 5000
num_features = 10

# Matrix representations (random placeholder values)
X = np.random.rand(num_movies, num_features)           # movie feature vectors
W = np.random.rand(num_users, num_features)            # user parameter vectors
b = np.random.rand(1, num_users)                       # user biases
Y = np.random.randint(0, 6, (num_movies, num_users))   # ratings, random integers 0..5
R = np.random.randint(0, 2, (num_movies, num_users))   # 1 where a rating exists, else 0
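To make the shapes concrete, here is a minimal sketch with small made-up sizes (not the sizes of the real dataset). It shows that the full prediction matrix and the prediction for a single movie-user pair are the same computation, assuming the usual linear model in which a rating is predicted as the dot product of a user's parameters and a movie's features plus the user's bias.

```python
import numpy as np

rng = np.random.default_rng(0)
num_movies, num_users, num_features = 4, 3, 2  # toy sizes, chosen for illustration

X = rng.random((num_movies, num_features))  # one row of features per movie
W = rng.random((num_users, num_features))   # one row of parameters per user
b = rng.random((1, num_users))              # one bias per user

# All predictions at once: entry (i, j) is user j's predicted rating for movie i.
predictions = X @ W.T + b

# The same value for a single (movie, user) pair, written out explicitly.
i, j = 2, 1
single = np.dot(W[j], X[i]) + b[0, j]

print(predictions.shape)                      # (4, 3)
print(np.isclose(predictions[i, j], single))  # True
```

Note that b has shape (1, num_users), so NumPy broadcasting adds each user's bias to every row of that user's column.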

2. Recommender Systems:
Recommender systems help users navigate large collections of items and find the ones that best match their tastes. Collaborative filtering does this by analyzing user-item interactions: it learns a vector for each user and a vector for each item at the same time, and refines both iteratively so that together they reproduce the ratings users have already given.

Code:

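# Sample collaborative filtering algorithm implementation
import numpy as np

def collaborative_filtering(X, W, b, Y, R, lambda_):
    # Sketch only (the body is not shown here): regularized squared-error cost
    err = (np.matmul(X, W.T) + b - Y) * R   # errors on rated entries only
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Usage example
lambda_ = 1
cost = collaborative_filtering(X, W, b, Y, R, lambda_)
print("Cost:", cost)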

3. Movie Ratings Dataset:
Our experiments use a dataset derived from the MovieLens “ml-latest-small” dataset. Ratings range from 0.5 to 5 in steps of 0.5, and the dataset has been reduced to movies released after 2000. The ratings are stored in the matrix Y, and the indicator matrix R marks which movie-user pairs actually have a rating.

Code:

# Sample movie ratings dataset loading
# (load_movie_ratings_dataset is a helper defined outside this snippet)
X, W, b, num_movies, num_features, num_users = load_movie_ratings_dataset()
print("Number of movies:", num_movies)
print("Number of users:", num_users)
print("Number of features:", num_features)
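To illustrate how a ratings matrix and its indicator matrix relate, here is a small sketch that builds both from a handful of rating records. The records, sizes, and variable names are made up for illustration and are not taken from the MovieLens files.

```python
import numpy as np

# Hypothetical (movie_index, user_index, rating) records standing in for the real data.
ratings = [(0, 0, 4.5), (0, 2, 3.0), (1, 1, 0.5), (2, 0, 5.0), (2, 1, 4.0)]
num_movies, num_users = 3, 3

Y = np.zeros((num_movies, num_users))             # ratings, 0 where nothing was rated
R = np.zeros((num_movies, num_users), dtype=int)  # 1 where a rating exists
for movie, user, rating in ratings:
    Y[movie, user] = rating
    R[movie, user] = 1

# Average rating of the first movie, counting only the users who rated it.
mean_first = Y[0, R[0] == 1].mean()
print(R)
print(mean_first)  # 3.75
```

Keeping R separate matters because a 0 in Y is ambiguous on its own: it has to be read as "no rating", not as a rating of zero.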

4. Collaborative Filtering Learning Algorithm:
At the core of collaborative filtering is a cost function that measures how well the learned user parameters and movie features reproduce the known ratings. It sums the squared prediction error over the movie-user pairs that have a rating and adds a regularization term, weighted by lambda_, that penalizes large values in X and W. The cost can be computed with nested for loops or, much faster, in vectorized form.
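Written out, the regularized cost usually used for this kind of model looks as follows. The formula is not printed in this post, so treat it as the standard form that matches the arguments X, W, b, Y, R and lambda_ used in the code. Here n_m and n_u are the numbers of movies and users, n is the number of features, and r(i,j) = 1 means user j has rated movie i.

```latex
J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1}
      \left( \mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)} \right)^2
  + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w^{(j)}_k \right)^2
  + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x^{(i)}_k \right)^2
```

The first term only counts rated pairs, and the biases are not regularized.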

Code:

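# Sample collaborative filtering cost function implementation
import numpy as np

def collaborative_filtering_cost(X, W, b, Y, R, lambda_):
    # Sketch of the vectorized form: squared error where R == 1, plus L2 penalty
    err = (np.matmul(X, W.T) + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Usage example
cost = collaborative_filtering_cost(X, W, b, Y, R, lambda_=1)
print("Cost:", cost)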
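To show how the for loop and vectorized implementations relate, here is a minimal sketch of both, checked against each other on random toy data. The function names cost_loop and cost_vectorized are ours, and the cost is assumed to be the usual regularized squared error over rated entries.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Regularized squared-error cost, one (movie, user) pair at a time."""
    num_movies, num_users = Y.shape
    total = 0.0
    for j in range(num_users):
        for i in range(num_movies):
            if R[i, j] == 1:  # skip movies that user j has not rated
                error = np.dot(W[j], X[i]) + b[0, j] - Y[i, j]
                total += 0.5 * error ** 2
    return total + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def cost_vectorized(X, W, b, Y, R, lambda_):
    """Same cost with matrix operations; multiplying by R zeroes unrated entries."""
    errors = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(errors ** 2) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

rng = np.random.default_rng(1)
X = rng.random((5, 3))
W = rng.random((4, 3))
b = rng.random((1, 4))
Y = rng.integers(0, 6, size=(5, 4)).astype(float)
R = rng.integers(0, 2, size=(5, 4))

same = np.isclose(cost_loop(X, W, b, Y, R, 1.5), cost_vectorized(X, W, b, Y, R, 1.5))
print(same)  # True
```

The loop version is easier to read against the formula, while the vectorized version is the one you would train with, since it avoids Python-level iteration over every movie-user pair.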

5. Learning Movie Recommendations:
Two steps remain: training the model and generating personalized recommendations. The loop below trains X, W and b with TensorFlow’s GradientTape, which records the operations used to compute the cost so that the gradients can be derived automatically. To get recommendations tailored to you, add your own movie ratings to the dataset before training.
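The training and prediction code in this post uses Ynorm, Ymean, my_ratings and my_rated without showing how they are prepared. The sketch below is one plausible way to build them, on a toy matrix with made-up ratings: your ratings are prepended as the first user, and each movie's mean rating is subtracted from its rated entries. Treat it as an assumption about the preparation step, not as the exact code behind the results.

```python
import numpy as np

# Toy ratings matrix: 4 movies x 2 existing users (0 means "not rated").
Y = np.array([[5.0, 0.0],
              [0.0, 3.0],
              [4.0, 2.0],
              [0.0, 0.0]])
R = (Y != 0).astype(int)

# Your own ratings, one slot per movie; the indices and values are made up.
my_ratings = np.zeros(Y.shape[0])
my_ratings[0] = 4.0
my_ratings[2] = 3.0
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

# Prepend yourself as user 0 so that your predictions end up in column 0.
Y = np.c_[my_ratings, Y]
R = np.c_[(my_ratings != 0).astype(int), R]

# Mean normalization: subtract each movie's average over the users who rated it.
# The small constant avoids dividing by zero for movies nobody has rated.
Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
Ynorm = Y - Ymean * R

print(my_rated)  # [0, 2]
print(Ymean.ravel())
```

Normalizing this way means that a user with no ratings is predicted the movie's mean once the mean is added back, which is a more sensible default than zero.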

Code:

import tensorflow as tf

# X, W and b must be tf.Variable objects so the tape can track them.
# The optimizer is not defined in this post; Adam with this learning
# rate is an assumption.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

iterations = 200
lambda_ = 1
for iteration in range(iterations):
    # Use TensorFlow's GradientTape
    # to record the operations used to compute the cost
    with tf.GradientTape() as tape:
        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient(cost_value, [X, W, b])

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, [X, W, b]))

    # Log periodically.
    if iteration % 20 == 0:
        print(f"Training loss at iteration {iteration}: {cost_value:0.1f}")

print("W:", W)
print("b:", b)
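To see what the gradient tape is doing for us, here is a NumPy sketch that derives the gradients of the regularized cost by hand and runs plain gradient descent on random toy data. It uses a fixed learning rate rather than the optimizer from the TensorFlow loop, and the function names, sizes, and hyperparameters are chosen for illustration only.

```python
import numpy as np

def cost(X, W, b, Y, R, lambda_):
    errors = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(errors ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

def gradients(X, W, b, Y, R, lambda_):
    """Hand-derived gradients of the cost with respect to X, W and b."""
    errors = (X @ W.T + b - Y) * R              # (num_movies, num_users)
    grad_X = errors @ W + lambda_ * X           # (num_movies, num_features)
    grad_W = errors.T @ X + lambda_ * W         # (num_users, num_features)
    grad_b = errors.sum(axis=0, keepdims=True)  # (1, num_users), no regularization
    return grad_X, grad_W, grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
W = rng.normal(size=(4, 3))
b = np.zeros((1, 4))
Y = rng.integers(1, 6, size=(6, 4)).astype(float)
R = rng.integers(0, 2, size=(6, 4))

learning_rate, lambda_ = 0.01, 1.0
history = []
for step in range(200):
    history.append(cost(X, W, b, Y, R, lambda_))
    grad_X, grad_W, grad_b = gradients(X, W, b, Y, R, lambda_)
    X = X - learning_rate * grad_X
    W = W - learning_rate * grad_W
    b = b - learning_rate * grad_b

print(f"cost before: {history[0]:.2f}, after: {history[-1]:.2f}")
```

With GradientTape none of the gradient formulas have to be written down, which is the main reason to use it: changing the cost function does not require re-deriving anything.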

6. Recommendations:
Once the model is trained, predictions for every movie come from multiplying the learned movie features by the learned user parameters, adding the biases, and restoring the per-movie mean that was subtracted before training. The code below computes these predictions, lists the top-ranked movies you have not rated yet, and compares the predictions with your original ratings. The recommendations can then be refined with additional information from the dataset, such as each movie’s mean rating and the number of ratings it has received.


Code:

# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# Restore the mean
pm = p + Ymean
my_predictions = pm[:, 0]

# Sort predictions from highest to lowest
ix = tf.argsort(my_predictions, direction='DESCENDING')

# Show the top-ranked movies that have not been rated yet
for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

Additional information can be used to improve these recommendations. The predicted ratings for the first few hundred movies lie in a narrow range, so the ranking among them is not very informative on its own. We can refine it by taking the top 300 predictions, keeping only movies with more than 20 ratings, and sorting those by mean rating. This step uses a pandas DataFrame, which has convenient filtering and sorting methods.

Code:

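# Keep only movies with more than 20 ratings
rating_filter = movieList_df["number of ratings"] > 20
movieList_df["pred"] = my_predictions
movieList_df = movieList_df.reindex(columns=["pred", "mean rating", "number of ratings", "title"])
# From the 300 highest predictions, apply the filter and sort by mean rating
movieList_df.loc[ix[:300]].loc[rating_filter].sort_values("mean rating", ascending=False)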
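Here is the same filtering idea on a tiny, made-up DataFrame so that the effect of each step is visible. The column names follow the snippet above, while the titles, values, and the cutoff of four movies are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pred": [4.8, 4.7, 4.6, 4.5, 3.0],
    "mean rating": [4.9, 3.5, 4.2, 4.0, 4.5],
    "number of ratings": [3, 150, 40, 25, 300],
    "title": ["A", "B", "C", "D", "E"],
})

# Step 1: take the movies with the highest predicted ratings.
top_k = 4
ix = np.argsort(-df["pred"].to_numpy())[:top_k]
top = df.iloc[ix]

# Step 2: drop movies with too few ratings, then sort by mean rating.
shortlist = top[top["number of ratings"] > 20].sort_values("mean rating", ascending=False)

print(shortlist["title"].tolist())  # ['C', 'D', 'B']
```

Movie A has the highest prediction but only three ratings, so it is dropped, and movie E has many ratings but never makes the top predictions. What remains is ordered by how well other viewers rated it.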

Conclusion:
Collaborative filtering lets a recommender learn user tastes and movie characteristics directly from ratings. In this post we covered the notation, the regularized cost function, training with TensorFlow’s GradientTape, and how to turn the trained model into a ranked list of recommendations refined by mean rating and number of ratings. With these pieces in place, you can adapt the approach to your own ratings and explore personalized recommendations further.

The complete code for this article is available on GitHub. Feel free to explore the codebase and contribute if you find it useful:

https://github.com/aanchalparegi/Collaborative-Filtering-for-Movie-Recommendations

Happy coding!
