Book Recommendation System
This project uses unsupervised machine learning to build a simple but robust book recommendation system that delivers
personalized book recommendations. A recommendation system identifies the preferences of a given user and offers relevant suggestions or
related content in return. Here, the recommender takes the name of a book as input from the user and returns tailored book
recommendations. It leverages both content-based and genre-based similarities in producing the final recommendations. Built on a large
dataset of books (taken from the Goodreads books database) covering many different books, authors, genres, reviews, plot summaries and
descriptions, it measures the similarity between the input book (given by the user) and the other books in the database across these
dimensions, then selects and returns the most similar ones. The system also filters, preprocesses, and parses text to enable better
matching and comparison. In addition, it ensures author variety and can be easily customized to increase or decrease the number of
recommendations, or to control the degree to which the recommendations are content-based, genre-based, or a mixture of both. The
result is a book recommender that can be used to search for and explore new books based on one's prior preferences and favorite books.
The dataset used here was taken from Kaggle. It consists of thousands of books collected from Goodreads, a popular platform for
discovering, reviewing, and discussing books. In total, it provides a collection of more than 16,000 books, covering many different
authors, genres, and literary eras, from ancient times up to May 2024. Each book, represented by one data row, comes with important
details and descriptions, including the book title, author, genre classification, publication date, format, and average rating score.
As such, the data can support a variety of purposes, from data analysis and studying user preferences to performing sentiment analysis
and building recommendation systems, as in the current case. The dataset has been released under the MIT license, which permits free use
for commercial and noncommercial purposes.
You can view each column and its description in the table below:
| Variable | Description |
|---|---|
| book_id | Unique identifier for each book in the data |
| cover_image_uri | URI or URL pointing to the cover image of the book |
| book_title | Title of the book |
| book_details | Details about the book, including summary, plot, synopsis or other descriptive information |
| format | Details about the format of the book such as whether it's a hardcover, paperback, or audiobook |
| publication_info | Information about the publication of the book including the publisher, publication date, or any other relevant details |
| authorlink | URI or URL pointing to more information about the author (if available) |
| author | Name of the book author(s) |
| num_pages | Number of pages |
| genres | Genre labels applying to the book |
| num_ratings | Total number of ratings |
| num_reviews | Total number of reviews |
| average_rating | Overall average rating score |
| rating_distribution | Number of ratings per rating star (for a 5-point rating system) |
To develop the book recommendation system, the dataset is first inspected, cleaned, filtered, and updated in preparation for analysis and
model development. Once the data has been prepared and analyzed, a Term Frequency - Inverse Document Frequency (TF-IDF) vectorizer
is used for text vectorization, converting each book's important attributes (author, genres, and plot summary or book description) into
a numeric vector of TF-IDF scores that represents the book relative to all others in the dataset. These TF-IDF vectors are then compared
using cosine similarity, which yields a large matrix holding the overall similarity between every pair of books. In addition, a separate
matrix is built from the book genres alone to capture the genre similarity between books (using Jaccard similarity).
With the analysis and modeling complete, a book recommendation function is developed that uses these similarity matrices to deliver
tailored book recommendations. As mentioned, this function also offers options to control the nature of the recommendations, such as
whether to recommend by genre in particular or by overall similarity more generally, and how many books to recommend. Finally, the book
recommender is put to the test: first with well-known books (e.g., Shakespeare's 'Macbeth'), then with book titles sampled at random from
the database, and lastly with user input, in which the user can pass any book they would like similar recommendations for and the
recommendation function takes care of the rest. You can try the recommender yourself.
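The pipeline described above can be illustrated on a toy scale. The following is a minimal sketch, not the notebook's actual implementation: the three books, their texts and genre sets, the `recommend` helper, and the `genre_weight` parameter are all hypothetical and exist only to show how TF-IDF cosine similarity and Jaccard genre similarity might be computed and blended.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the combined text features and genre labels (not from the dataset)
titles = ['Book A', 'Book B', 'Book C']
texts = [
    'fantasy magic young wizard attends a school of magic',
    'fantasy magic the wizard returns to school for a new year',
    'history war a study of famous battles',
]
genres = [
    {'Fantasy', 'Magic', 'Fiction'},
    {'Fantasy', 'Magic', 'Young Adult'},
    {'History', 'War', 'Nonfiction'},
]

# Content similarity: TF-IDF vectors compared with cosine similarity
tfidf_matrix = TfidfVectorizer(stop_words='english').fit_transform(texts)
content_sim = cosine_similarity(tfidf_matrix)

# Genre similarity: one-hot encode the genre sets, then 1 - Jaccard distance
all_genres = sorted(set().union(*genres))
genre_onehot = np.array([[g in book for g in all_genres] for book in genres], dtype=bool)
genre_sim = 1 - squareform(pdist(genre_onehot, metric='jaccard'))

def recommend(title, n=1, genre_weight=0.5):
    """Return the n most similar titles, blending content and genre similarity.

    genre_weight is a hypothetical knob: 0 means content only, 1 means genre only.
    """
    idx = titles.index(title)
    blended = (1 - genre_weight) * content_sim[idx] + genre_weight * genre_sim[idx]
    ranked = np.argsort(-blended)  # indices sorted from most to least similar
    return [titles[j] for j in ranked if j != idx][:n]

print(recommend('Book A'))
```

In this sketch, 'Book A' and 'Book B' share two of four distinct genre labels, so their Jaccard similarity is 0.5, while 'Book C' shares neither words nor genres with 'Book A'.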
Overall, the project is broken down into 7 sections:
1) Reading and Inspecting the Data
2) Cleaning and Updating the Data
3) Exploratory Data Analysis
4) Feature Engineering: Combining features
5) Text Vectorization and Processing
6) Building a Book Recommendation Function
7) Testing the Recommendation System
In [ ]: #If you're using the executable notebook version, please run this cell first
# to install the necessary Python libraries for the task
# (requests and pillow are included because the notebook imports requests and PIL)
!pip install numpy pandas matplotlib seaborn scipy scikit-learn requests pillow
In [ ]: #Importing the modules for use
import re
import math
import requests
import textwrap
import numpy as np
import pandas as pd
from PIL import Image
from io import BytesIO
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.sparse import csr_matrix
from scipy.spatial.distance import squareform, pdist, jaccard
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.simplefilter("ignore")
sns.set_context('paper')
%matplotlib inline
Defining Custom Functions
In [ ]: #Define function to display books by their covers
def get_covers(books_df: pd.DataFrame):
    n_books = len(books_df.index)
    n_cols = ((n_books + 1) // 2) if n_books > 5 else n_books
    n_rows = math.ceil(n_books / n_cols)
    #create figure and specify subplot characteristics
    plt.figure(figsize=(4.2*n_cols, 6.4*n_rows), facecolor='whitesmoke')
    plt.subplots_adjust(bottom=.1, top=.9, left=.02, right=.88, hspace=.32)
    #adjust font type
    plt.rcParams.update({'font.family': 'Palatino Linotype'})
    #request, access and plot each book cover
    for i in range(n_books):
        try:
            response = requests.get(books_df['cover_image_uri'].iloc[i])
        except requests.exceptions.RequestException:
            print('\nCouldn\'t retrieve book cover. Check your internet connection and try again...\n\n', flush=True)
            return
        #access and resize image
        img = Image.open(BytesIO(response.content))
        img = img.resize((600, 900))
        #shorten and wrap book title
        full_title = books_df['book_title'].iloc[i]
        short_title = re.sub(r'[:?!].*', '', full_title)
        title_wrapped = "\n".join(textwrap.wrap(short_title, width=26))
        #plot book cover
        plt.subplot(n_rows, n_cols, i+1)
        plt.imshow(img)
        plt.title(title_wrapped, fontsize=21, pad=15)
        plt.axis('off')
    plt.show()
Part One: Reading and Inspecting the Data
Loading and reading the dataset
In [ ]: #Access and read data into dataframe
df = pd.read_csv('Book_Details.csv', index_col='Unnamed: 0')
#drop unnecessary columns
df = df.drop(['book_id', 'format', 'authorlink', 'num_pages'], axis=1)
Inspecting the data
In [ ]: #report the shape of the dataframe
shape = df.shape
print('Number of columns:', shape[1])
print('Number of rows:', shape[0])
Number of columns: 10
Number of rows: 16225
In [ ]: #Preview first 5 entries
df.head()
Out[ ]:
| | cover_image_uri | book_title | book_details | publication_info | author | genres | num_ratings | num_reviews | average_rating | rating_distribution |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://images-na.ssl-images-amazon.com/images... | Harry Potter and the Half-Blood Prince | It is the middle of the summer, but there is a... | ['First published July 16, 2005'] | J.K. Rowling | ['Fantasy', 'Young Adult', 'Fiction', 'Magic',... | - | 58398 | 4.58 | {'5': '2,244,154... '775,028... '21... |
| 1 | https://images-na.ssl-images-amazon.com/images... | Harry Potter and the Order of the Phoenix | Harry Potter is about to start his fifth year ... | ['First published June 21, 2003'] | J.K. Rowling | ['Young Adult', 'Fiction', 'Magic', 'Childrens... | - | 64300 | 4.50 | {'5': '2,178,760... '856,178... '29... |
| 2 | https://images-na.ssl-images-amazon.com/images... | Harry Potter and the Sorcerer's Stone | Harry Potter has no idea how famous he is. Tha... | ['First published June 26, 1997'] | J.K. Rowling | ['Fantasy', 'Fiction', 'Young Adult', 'Magic',... | - | 163493 | 4.47 | {'5': '6,544,542... '2,348,390... '8... |
| 3 | https://images-na.ssl-images-amazon.com/images... | Harry Potter and the Prisoner of Azkaban | Harry Potter, along with his best friends, Ron... | ['First published July 8, 1999'] | J.K. Rowling | ['Fantasy', 'Fiction', 'Young Adult', 'Magic',... | - | 84959 | 4.58 | {'5': '2,892,322... '970,190... '28... |
| 4 | https://images-na.ssl-images-amazon.com/images... | Harry Potter and the Goblet of Fire | It is the summer holidays and soon Harry Potte... | ['First published July 8, 2000'] | J.K. Rowling | ['Fantasy', 'Young Adult', 'Fiction', 'Magic',... | - | 69961 | 4.57 | {'5': '2,500,070... '899,496... '25... |
Checking number of entries and data type per column
In [ ]: #Inspect column headers, data type, and number of entries
df.info()
Index: 16225 entries, 0 to 16224
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   cover_image_uri      16225 non-null  object
 1   book_title           16225 non-null  object
 2   book_details         16177 non-null  object
 3   publication_info     16225 non-null  object
 4   author               16225 non-null  object
 5   genres               16225 non-null  object
 6   num_ratings          16225 non-null  int64
 7   num_reviews          16225 non-null  int64
 8   average_rating       16225 non-null  float64
 9   rating_distribution  16225 non-null  object
dtypes: float64(1), int64(2), object(7)
memory usage: 1.4+ MB
Descriptive Statistics
In [ ]: #get overall description of object columns
display(df.describe(include='object').T)
print('\n'+ 80*'_' +'\n')
#get statistical summary of the numerical data
display(df.describe().drop(['25%', '50%', '75%']).apply(lambda x: round(x)))
| | count | unique | top | freq |
|---|---|---|---|---|
| cover_image_uri | 16225 | 16120 | https://dryofg8nmyqjw.cloudfront.net/images/no... | 38 |
| book_title | 16225 | 15491 | The Cheat Code | 7 |
| book_details | 16177 | 16018 | Libro usado en buenas condiciones, por su anti... | 6 |
| publication_info | 16225 | 5369 | ['First published January 1, 2008'] | 360 |
| author | 16225 | 7615 | Stephen King | 79 |
| genres | 16225 | 13773 | [] | 325 |
| rating_distribution | 16225 | 16093 | {'5': '0', '4': '0', '3': '0', '2': '0', '1': ... | 12 |

________________________________________________________________________________

| | num_ratings | num_reviews | average_rating |
|---|---|---|---|
| count | 16225.0 | 16225.0 | 16225.0 |
| mean | 85785.0 | 5156.0 | 4.0 |
| std | - | 15776.0 | 0.0 |
| min | 0.0 | 0.0 | 0.0 |
| max | - | - | 5.0 |
Based on the above descriptions, we can first see that some books are duplicated, since the total count of book titles (16,225) does not match
the number of unique book titles (15,491). Second, some books have no description or details, since the number of non-null entries in the
'book_details' column (16,177) is lower than in all the other columns. Finally, we can see that many books have no specified genre: 325 of the
books have an empty list in the genres column.
Consistent with these findings, I will now clean and update the data to deal with each of these issues. First, I will drop the duplicated books.
Next, I will deal with the books lacking details or descriptions. Then I will deal with the issue of genre, either by assigning a book the genre
labels common to its author, provided that the author is featured more than twice in the dataset, or, failing that, by removing the books for which
no appropriate genre labels could be found. This matters because genre is a critical factor in deciding book similarity and recommendation, as the
recommender system to be built will leverage genre similarity and not just book content. Finally, I will add a new column for the year of
publication, extracted from the 'publication_info' column, after which 'publication_info' can be dropped as it will no longer be informative.
Part Two: Cleaning and Updating the Data
In this section, I will engage in data cleaning and updating based on the observations and insights reported above in order to prepare the data and
render it usable for further analysis and model development.
Removing duplicate books
In [ ]: #first, normalize book titles by removing punctuation
df['normalized_title'] = df['book_title'].apply(lambda title: re.sub(r'[^\w\s]', '', title))
#drop duplicate book titles and reset dataframe index
df = df.drop_duplicates(subset='normalized_title', ignore_index=True)
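To make the effect of this normalization concrete, here is a small sketch on made-up titles (the toy DataFrame and the `normalize_strict` helper are hypothetical and not part of the notebook). It shows that stripping punctuation merges titles that differ only in punctuation, while titles that differ in letter case are still treated as distinct; a stricter variant that also lowercases is shown purely for comparison.

```python
import re
import pandas as pd

# Made-up titles for illustration only
toy = pd.DataFrame({'book_title': ['The Cheat Code', 'The Cheat Code!', 'the cheat code', 'Dispatches']})

# Same normalization as in the cell above: remove everything except word characters and whitespace
toy['normalized_title'] = toy['book_title'].apply(lambda title: re.sub(r'[^\w\s]', '', title))
deduped = toy.drop_duplicates(subset='normalized_title', ignore_index=True)

# Hypothetical stricter variant: also lowercase and collapse repeated whitespace
def normalize_strict(title: str) -> str:
    no_punct = re.sub(r'[^\w\s]', '', title)
    return re.sub(r'\s+', ' ', no_punct).strip().lower()

toy['strict_title'] = toy['book_title'].apply(normalize_strict)
deduped_strict = toy.drop_duplicates(subset='strict_title', ignore_index=True)

print(deduped['book_title'].tolist())
print(deduped_strict['book_title'].tolist())
```

Whether case-insensitive matching is desirable depends on the data; it is an assumption here, not something the notebook does.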
Dealing with missing or inappropriate book details
In [ ]: #check the number of books with inappropriate book description or NaN (not a number) values
print('Number of entries with NaN values in the book details column (before): ', df['book_details'].isna().sum())
#fill NaN book details with empty strings
df['book_details'] = df['book_details'].fillna('')
#check the number of entries after
print('\nNumber of entries with NaN values in the book details column (after): ', df['book_details'].isna().sum())
Number of entries with NaN values in the book details column (before):  48

Number of entries with NaN values in the book details column (after):  0
Cleaning and updating the genres column
After turning the genres into a normal string, I will check the number of empty strings and then assign the closest genre labels by author; if
no genre labels are found for a book, I will delete it.
In [ ]: #Changing string list to list then to string with the genres of books
#(ast.literal_eval parses the list literal without executing arbitrary code, unlike eval)
import ast
df['genres'] = df['genres'].apply(lambda x: ', '.join(ast.literal_eval(x)))
In [ ]: #Updating rows with no genre
#get indices of books with no genre labels
no_genre_before = df[df['genres'].str.len() == 0].index
#we can preview the books identified
df.iloc[no_genre_before, 1:8].head(3)
Out[ ]:
| | book_title | book_details | publication_info | author | genres | num_ratings | num_reviews |
|---|---|---|---|---|---|---|---|
| 570 | Angels & Guides Healing Meditations | You’ll find a new level of comfort, safety, an... | ['First published September 1, 2006'] | Sylvia Browne | | 53 | 1 |
| 2749 | La Santa Muerte | Narcotraficantes, políticos, delincuentes, emp... | ['First published January 31, 2004'] | Homero Aridjis | | 29 | 5 |
| 4399 | Rush Hudson Limbaugh and His Times: Reflection... | This series of interviews with Rush H. Limbaug... | ['First published November 1, 2003'] | Rush Limbaugh | | 6 | 0 |
In [ ]: #Get total number of books with no genre before the update
print('Total number of entries with missing genre (before): ', len(no_genre_before))
#replace empty genre strings with the first non-empty genre string found for the same author
to_drop = []
for i in no_genre_before:
    author_genres = df.loc[df['author'] == df.at[i, 'author'], 'genres']
    author_genres = author_genres[author_genres.str.len() > 0]
    if len(author_genres) > 0:
        df.at[i, 'genres'] = author_genres.iloc[0]
    else:
        #collect rows to drop after the loop so that row labels stay valid while iterating
        to_drop.append(i)
#drop books with no genre labels found and reset dataframe index
df = df.drop(index=to_drop).reset_index(drop=True)
#check number of books with no genre after the update
no_genre_after = df[df['genres'].str.len() == 0].index
print('\nTotal number of entries with missing genre (after): ', len(no_genre_after))
Total number of entries with missing genre (before):  319

Total number of entries with missing genre (after):  0
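The author-based genre fill can also be written without an explicit loop. The sketch below runs on a made-up five-row DataFrame (the authors 'A', 'B', 'C' and their genres are hypothetical) and is offered only as one possible formulation of the idea, not as the notebook's method: each empty genre string takes the first non-empty genre string found for the same author, and books whose author has no genre labels at all are removed.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame({
    'author': ['A', 'A', 'B', 'C', 'C'],
    'genres': ['Fantasy, Fiction', '', '', '', 'History, Nonfiction'],
})

# First non-empty genre string per author (authors with no labels at all are absent)
has_genre = toy['genres'].str.len() > 0
fallback_by_author = toy[has_genre].groupby('author')['genres'].first()

# Map each row's author to its fallback; authors without one map to NaN, replaced by ''
fallback = toy['author'].map(fallback_by_author).fillna('')

# Keep existing labels and use the fallback only where the genre string is empty
toy['genres'] = toy['genres'].where(has_genre, fallback)

# Drop books that still have no genre labels and reset the index
toy = toy[toy['genres'].str.len() > 0].reset_index(drop=True)
print(toy)
```

Here the single book by author 'B' is removed because no other book by that author supplies genre labels.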
Finally, in dealing with genre, I will make sure that genre labels do not conflict with one another. In particular, a book that has a 'Fiction'
label among its genres should not simultaneously be classified as 'Nonfiction', as this would mix up some of the recommendations. First, let's
preview the books that suffer from this issue.
Dealing with conflicting book genres
In [ ]: #create empty list for storing indices of books with conflicting genres and set count to zero
indices=[]
count=0
#loop over and return all books with conflicting genres
for genre_string, title in zip(df['genres'], df['book_title']):
    if 'Fiction' in genre_string and 'Nonfiction' in genre_string:
        count += 1
        indices.append(df[df['book_title']==title].index)
        print(f'{count}. {title} // {genre_string}')
1. If I Die in a Combat Zone, Box Me Up and Ship Me Home // Nonfiction, War, History, Memoir, Military Fiction, Biography, Biography Memoir
2. Dispatches // Nonfiction, History, War, Memoir, Journalism, Military Fiction, Military History
3. The Last Stand of the Tin Can Sailors: The Extraordinary World War II Story of the U.S. Navy's Finest Hour // History, Nonfiction, Military Fiction, World War II, War, Military History, Naval History
4. Jesus Freaks: Stories of Those Who Stood for Jesus, the Ultimate Jesus Freaks // Christian, Nonfiction, Biography, Christianity, Religion, Faith, Christian Non Fiction
5. Flags of Our Fathers // History, Nonfiction, Military Fiction, War, World War II, Biography, Military History
6. The March of Folly // History, Nonfiction, Politics, War, World History, Military History, Military Fiction
7. The Art of War // Nonfiction, Philosophy, History, War, Business, Classics, Military Fiction
8. In Pharaoh's Army: Memories of the Lost War // Memoir, Nonfiction, War, History, Biography, Military Fiction, Biography Memoir
9. Imperial Life in the Emerald City: Inside Iraq's Green Zone // Nonfiction, History, Politics, War, Military Fiction, Journalism, Military History
10. State of Denial // Politics, History, Nonfiction, War, American History, Presidents, Military Fiction
11. Charlie Wilson's War: The Extraordinary Story of How the Wildest Man in Congress and a Rogue CIA Agent Changed the History of our Times // History, Nonfiction, Politics, War, Biography, Military Fiction, American History
12. Band of Brothers: E Company, 506th Regiment, 101st Airborne from Normandy to Hitler's Eagle's Nest // History, Nonfiction, War, Military Fiction, World War II, Military History, Historical
13. In Harm's Way: The Sinking of the USS Indianapolis and the Extraordinary Story of Its Survivors // History, Nonfiction, Military Fiction, World War II, War, Survival, Military History
14. We Were Soldiers Once... and Young: Ia Drang - The Battle that Changed the War in Vietnam // History, Nonfiction, Military Fiction, War, Military History, American History, Biography
15. The Fall of Berlin 1945 // History, Nonfiction, World War II, War, Military History, Germany, Military Fiction
16. The Civil War, Vol. 1: Fort Sumter to Perryville // History, Civil War, Nonfiction, American History, American Civil War, War, Military Fiction
17. The Mask of Command // History, Military History, Military Fiction, Nonfiction, Leadership, War, Biography
18. Black Hawk Down: A Story of Modern War // History, Nonfiction, Military Fiction, War, Military History, Africa, Historical
19. Ghost Wars: The Secret History of the CIA, Afghanistan, and Bin Laden from the Soviet Invasion to September 10, 2001 // History, Nonfiction, Politics, War, Military Fiction, Terrorism, Espionage
20. Jarhead : A Marine's Chronicle of the Gulf War and Other Battles // Nonfiction, War, Military Fiction, Memoir, History, Biography, Military History
21. Fiasco: The American Military Adventure in Iraq // History, Nonfiction, Politics, War, Military Fiction, Military History, American History
22. Ghost Soldiers: The Epic Account of World War II's Greatest Rescue Mission // History, Nonfiction, World War II, War, Military Fiction, Military History, American History
23. Vietnam: A History // History, Nonfiction, War, Military Fiction, Military History, American History, Politics
24. A World Undone: The Story of the Great War, 1914 to 1918 // History, Nonfiction, World War I, War, Military History, Military Fiction, Audiobook
25. The First Day on the Somme // History, World War I, Nonfiction, War, Military History, Military Fiction, 20th Century
26. The Forgotten Soldier // History, Nonfiction, War, Military Fiction, World War II, Biography, Military History
27. This Kind of War: A Study in Unpreparedness // History, Military Fiction, Nonfiction, War, Military History, American History, Asia
28. Henry James: A Life in Letters // Biography, Nonfiction, Classics, Literary Fiction, American
29. Company Commander: The Classic Infantry Memoir of World War II // History, Military Fiction, Military History, Nonfiction, World War II, War, Biography
30. Flyboys: A True Story of Courage // History, Nonfiction, World War II, War, Military Fiction, Military History, Biography
31. Hitler's War // History, World War II, Nonfiction, War, Biography, Politics, Military Fiction
32. Leadership Secrets of Attila the Hun // Leadership, Business, Nonfiction, History, Management, Self Help, Military Fiction
33. The New Dare to Discipline // Parenting, Nonfiction, Christian, Family, Self Help, Psychology, Christian Non Fiction
34. Life Application Study Bible: NIV // Christian, Religion, Nonfiction, Christianity, Reference, Spirituality, Christian Non Fiction
35. The Face of Battle: A Study of Agincourt, Waterloo and the Somme // History, Nonfiction, Military History, Military Fiction, War, European History, World War I
36. To Hell and Back // History, Nonfiction, Biography, Military Fiction, War, World War II, Military History
37. Strategy // History, Nonfiction, Military Fiction, War, Military History, Business, Politics
38. The Troubles: Ireland's Ordeal- and the Search for Peace // History, Ireland, Nonfiction, Politics, Irish Literature, Military Fiction, European History
39. Against All Enemies: Inside America's War on Terror // Politics, Nonfiction, History, War, Terrorism, Military Fiction, American History
40. The Best and the Brightest // History, Nonfiction, Politics, War, American History, International Relations, Military Fiction
41. A Bright Shining Lie: John Paul Vann and America in Vietnam // History, Nonfiction, War, Biography, American History, Military Fiction, Military History
42. Killing Pablo: The Hunt for the World's Greatest Outlaw // Nonfiction, History, True Crime, Crime, Biography, Military Fiction, Politics
43. Dereliction of Duty: Lyndon Johnson, Robert McNamara, the Joint Chiefs of Staff, and the Lies That Led to Vietnam // History, Politics, Nonfiction, Military Fiction, War, Military History, American History
44. Enemy at the Gates: The Battle for Stalingrad // History, Nonfiction, War, World War II, Military History, Military Fiction, Russia
45. The Coldest Winter: America and the Korean War // History, Nonfiction, War, Military History, Military Fiction, American History, Politics
46. The War: An Intimate History,- // History, Nonfiction, World War II, War, Military Fiction, American History, Military History
47. An Army at Dawn: The War in North Africa,- // History, Nonfiction, World War II, Military History, War, Military Fiction, Africa
48. Quartered Safe Out Here: A Harrowing Tale of World War II // History, Nonfiction, War, Memoir, World War II, Military History, Military Fiction
49. Stalingrad: The Fateful Siege,- // History, Nonfiction, War, World War II, Russia, Military History, Military Fiction
50. Mind Siege: The Battle for the Truth // Christian, Religion, Nonfiction, Christianity, Faith, Christian Non Fiction, Spirituality
51. Lectures on Faith // Religion, Lds, Nonfiction, Church, Spirituality, Lds Non Fiction, Theology
52. The Price of Admiralty: The Evolution of Naval Warfare from Trafalgar to Midway // History, Military History, Military Fiction, Nonfiction, War, Naval History, European History
53. Lone Survivor: The Eyewitness Account of Operation Redwing and the Lost Heroes of SEAL Team 10 // Nonfiction, Military Fiction, History, War, Biography, Memoir, Military History
54. With the Old Breed: At Peleliu and Okinawa // History, Nonfiction, War, Military Fiction, World War II, Biography, Memoir
55. The Puzzle Palace: Inside the National Security Agency, America's Most Secret Intelligence Organization // History, Nonfiction, Espionage, Politics, Military Fiction, Technology, Government
56. The Late Great Planet Earth // Religion, Christian, Nonfiction, Christianity, Theology, Christian Non Fiction, Spirituality
57. Great Escape // History, Nonfiction, War, World War II, Military Fiction, Historical, Military History
58. Platoon Leader: A Memoir of Command in Combat // Military Fiction, History, War, Military History, Leadership, Nonfiction, Biography
59. The Butterfly Dreams // Memoir, Nonfiction, War, History, Biography, Military Fiction, Biography Memoir
60. Supplying War: Logistics from Wallenstein to Patton // History, Military History, Military Fiction, War, Nonfiction, Economics, Academic
61. Comrade J: Untold Secrets Of Russia's Master Spy In America After The End Of The Cold War // Nonfiction, History, Espionage, Russia, Biography, Military Fiction, True Crime
62. The Monster Loves His Labyrinth // Poetry, Nonfiction, Literature, Literary Fiction, Essays
63. A Question of Honor: The Kosciuszko Squadron: Forgotten Heroes of World War II // History, Nonfiction, War, World War II, Poland, Aviation, Military Fiction
64. Human rights and legal defense in Northern Ireland: The intimidation of defense lawyers : the murder of Patrick Finucane // Christian, Prayer, Nonfiction, Spirituality, Christian Non Fiction, Faith, Christian Living
65. The Power of Praying Through the Bible // Christian, Prayer, Nonfiction, Spirituality, Christian Non Fiction, Faith, Christian Living
66. Soldiers Of Reason: The RAND Corporation And The Rise Of The American Empire // History, Nonfiction, Military Fiction, Politics, Science, American History, American
67. 1001 Books for Every Mood // Nonfiction, Books About Books, Reference, Writing, Literary Criticism, Literature, Literary Fiction
68. Lydia // History, Nonfiction, Politics, American History, War, Russia, Military Fiction
69. One Minute to Midnight: Kennedy, Khrushchev and Castro on the Brink of Nuclear War // History, Nonfiction, Politics, American History, War, Russia, Military Fiction
70. The War Path: Hitler's Germany,- // History, World War II, Nonfiction, Germany, Military Fiction
71. The Angel of Grozny: Orphans of a Forgotten War // Nonfiction, Russia, History, War, Journalism, Military Fiction, Islam
72. The Apostle: A Life of Paul // Biography, Christian, Religion, Nonfiction, History, Christianity, Christian Non Fiction
73. Kill Bin Laden: A Delta Force Commander's Account of the Hunt for the World's Most Wanted Man // Military Fiction, Nonfiction, History, War, Military History, Terrorism, Historical
74. The Bitter Road to Freedom: A New History of the Liberation of Europe // History, Nonfiction, World War II, War, European History, Military History, Military Fiction
75. The Battle of the Bulge // History, Nonfiction, World War II, War, Military History, Military Fiction, Audiobook
76. The Dark Side: The Inside Story of How the War on Terror Turned Into a War on American Ideals // Nonfiction, Politics, History, War, Terrorism, American History, Military Fiction
77. Camille Saint-Saëns: On Music and Musicians // History, Africa, Military Fiction, South Africa, War, Nonfiction, Military History
78. Commando: A Boer Journal Of The Boer War // History, Africa, Military Fiction, South Africa, War, Nonfiction, Military History
79. Sledge Patrol: A WWII Epic Of Escape, Survival, And Victory // History, Nonfiction, World War II, Survival, Adventure, War, Military Fiction
80. Radical Womanhood: Feminine Faith in a Feminist World // Christian, Nonfiction, Christianity, Christian Living, Faith, Christian Non Fiction, Theology
81. The Good Soldiers // Nonfiction, War, History, Military Fiction, Military History, Politics, Journalism
82. The Long Gray Line: The American Journey of West Point's Class of 1966 // History, Nonfiction, Military Fiction, Military History, American History, Biography, War
83. Lost in Shangri-la: A True Story of Survival, Adventure, and the Most Incredible Rescue Mission of World War II // Nonfiction, History, World War II, War, Adventure, Survival, Military Fiction
84. Give Me Tomorrow: The Korean War's Greatest Untold Story // History, Nonfiction, Military Fiction, Military History, War, Biography, Audiobook
85. Red Eagles: Americas Secret MiGs // Aviation, History, Military Fiction, Nonfiction, Military History, Aircraft, War
86. What It is Like to Go to War // Nonfiction, History, War, Military Fiction, Memoir, Biography, Psychology
87. American Sniper: The Autobiography of the Most Lethal Sniper in U.S. Military History // Nonfiction, Biography, Military Fiction, History, War, Memoir, Autobiography
88. Shot Down: The True Story of Pilot Howard Snyder and the Crew of the B-17 Susan Ruth // History, Nonfiction, Military Fiction, Adult, Biography, Aviation, Adventure
89. Extreme Ownership: How U.S. Navy SEALs Lead and Win // Leadership, Business, Nonfiction, Self Help, Personal Development, Management, Military Fiction
90. Defeating Jihad: The Winnable War // Politics, Nonfiction, History, Military Fiction, Terrorism, Military History, War
91. Real Friends // Graphic Novels, Middle Grade, Memoir, Comics, Childrens, Realistic Fiction, Nonfiction
92. Grunt: The Curious Science of Humans at War // Nonfiction, Science, War, History, Military Fiction, Humor, Audiobook
93. Is Goat Beef? // Nonfiction, Humor, War, True Story, Military Fiction, History, Adult
94. Huế 1968: A Turning Point of the American War in Vietnam // History, Nonfiction, War, Military History, Military Fiction, American History, Asia
95. Vietnam: An Epic Tragedy,- // History, Nonfiction, War, Military History, Military Fiction, American History, Politics
96. The Guns of August // History, Nonfiction, War, World War I, Military Fiction, Military History, Politics
97. Whispers In The Tall Grass // History, Military Fiction, Nonfiction, War, Biography, Military History, Memoir
98. Operation Pedestal: The Fleet That Battled to Malta, 1942 // History, Nonfiction, World War II, Military History, War, Military Fiction, Historical
99. The Bomber Mafia: A Dream, a Temptation, and the Longest Night of the Second World War // History, Nonfiction, Audiobook, War, World War II, Military Fiction, Historical
100. The Mosquito Bowl: A Game of Life and Death in World War II // Nonfiction, History, Sports, World War II, Military Fiction, War, Football
101. Prisoners of the Castle: An Epic Story of Survival and Escape from Colditz, the Nazis' Fortress Prison // History, Nonfiction, World War II, War, Historical, Biography, Military Fiction
102. Diplomats & Admirals: From Failed Negotiations and Tragic Misjudgments to Powerful Leaders and Heroic Deeds, the Untold Story of the Pacific War from Pearl Harbor to Midway // History, Nonfiction, War, Military Fiction, World War II, Japan, Politics
As demonstrated, most of the books listed here are books about historical wars, presumably with an element of fiction, hence they are
classified as 'Nonfiction' and simultaneously as 'Military Fiction'. A few books are classified as both 'Nonfiction' and 'Literary Fiction', and
at least one book is classified as both 'Nonfiction' and 'Realistic Fiction'; these do seem to be works that mix the two. Finally, a few other
books are classified as 'Nonfiction' and 'Christian Non Fiction' (or a similar 'Non Fiction' label). To deal with this, I will simply replace
'Military Fiction' with 'Military', 'Literary Fiction' with 'Literary', and 'Realistic Fiction' with 'Realistic'. Finally, for the purposes of
accurate text processing, I will change labels such as 'Christian Non Fiction' to 'Christian Nonfiction', joining the last two words together.
In [ ]: #create dictionary with sub-strings to be replaced or removed
replacements_dict = {'Military Fiction': 'Military',
                     'Literary Fiction': 'Literary',
                     'Realistic Fiction': 'Realistic',
                     'Non Fiction': 'Nonfiction'}
#replace substrings according to specified values
df['genres'] = df['genres'].replace(replacements_dict, regex=True)
#Now we can check again
count = 0
for genre_string, title in zip(df['genres'], df['book_title']):
    if 'Fiction' in genre_string and 'Nonfiction' in genre_string:
        count += 1
print(f'Number of books with conflicting genres: {count}')
Number of books with conflicting genres: 0
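To make the substring replacement easier to follow, here is a minimal sketch on three made-up genre strings (they are illustrative and not rows from the dataset). It assumes the same behaviour used above: with regex=True, pandas treats each dictionary key as a pattern and replaces it wherever it occurs inside a string.

```python
import pandas as pd

# Made-up genre strings for illustration only (not taken from the dataset)
toy_genres = pd.Series([
    'History, Nonfiction, War, Military Fiction',
    'Nonfiction, Christian Non Fiction, Theology',
    'Fiction, Literary Fiction, Classics',
])

replacements = {
    'Military Fiction': 'Military',
    'Literary Fiction': 'Literary',
    'Non Fiction': 'Nonfiction',
}

def count_conflicts(genre_strings):
    # Same substring check as in the notebook: 'Fiction' and 'Nonfiction' both present
    return sum(('Fiction' in g) and ('Nonfiction' in g) for g in genre_strings)

conflicts_before = count_conflicts(toy_genres)

# regex=True makes each key a pattern that is replaced inside the strings
cleaned = toy_genres.replace(replacements, regex=True)
conflicts_after = count_conflicts(cleaned)

print(cleaned.tolist())
print(conflicts_before, conflicts_after)
```

Note that the check is case-sensitive: 'Fiction' with a capital F is not a substring of 'Nonfiction', which is why joining 'Non Fiction' into 'Nonfiction' removes the conflict.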
Creating a column with publication year
In [ ]: #Changing string list in publication info column to normal string
df['publication_info'] = df['publication_info'].apply(lambda x: eval(x)[0] if len(eval(x)) > 0 else 'n.d.')
#extract year of publication from publication info column and assign it to a new data column, 'publication_year' (if 'n.d.' assign an
df['publication_year'] = df['publication_info'].str.extract(r'(\d{1,4}$)').fillna('')
#preview changes and new publication year column
df[['publication_info', 'publication_year']].sample(5)
Out[ ]:
      publication_info                   publication_year
547   First published December 1, 1980   1980
7195  First published January 1, 1937    1937
8293  First published January 1, 1798    1798
1428  First published June 7, 1926       1926
6407  First published January 28, 2003   2003
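The two steps above (unpacking the stringified list, then pulling the trailing year) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper names are hypothetical, the raw strings are made up to mirror the format shown in the table, and ast.literal_eval is swapped in for eval as a more cautious way of parsing a list literal.

```python
import ast
import re

def first_publication_entry(raw):
    """Parse a stringified list and return its first entry, or 'n.d.' if it is empty."""
    entries = ast.literal_eval(raw)  # only evaluates Python literals, unlike eval
    return entries[0] if entries else 'n.d.'

def extract_year(info):
    """Return the run of 1 to 4 digits at the very end of the string, or '' if absent."""
    match = re.search(r'(\d{1,4})$', info)
    return match.group(1) if match else ''

# Made-up inputs in the same shape as the publication_info column
raw_values = ["['First published June 7, 1926']", "[]"]

for raw in raw_values:
    info = first_publication_entry(raw)
    print(repr(info), '->', repr(extract_year(info)))
```

A missing entry becomes 'n.d.', which has no trailing digits, so its year is left as an empty string.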
Part Three: Exploratory Data Analysis
In this section, I will explore the dataset in more detail, performing some further data analysis and visualization to get familiar with the data and
delineate some of the underlying relationships. I will examine the most common book genres in the data, the top-rated books, the rating
distribution, and the relationship between the number of ratings and the average rating score.
Top 20 book genres featured in the data
In [ ]: #Create one-hot encoded dataframe with all unique genres in the data
genres_df = df['genres'].str.get_dummies(', ').astype(int)
#preview genres dataframe
genres_df.head()
Out[ ]:
   12th Century  13th Century  15th Century  16th Century  17th Century  18th Century  19th Century  1st Grade  20th Century  21st Century  ...  X Men  Yaoi  Young Adult  Young Adult Contemporary  Young Adult Fantasy
0             0             0             0             0             0             0             0          0             0             0  ...      0     0            1                         0                    0
1             0             0             0             0             0             0             0          0             0             0  ...      0     0            1                         0                    0
2             0             0             0             0             0             0             0          0             0             0  ...      0     0            1                         0                    0
3             0             0             0             0             0             0             0          0             0             0  ...      0     0            1                         0                    0
4             0             0             0             0             0             0             0          0             0             0  ...      0     0            1                         0                    0
5 rows × 727 columns
We can see here that we have a total of 727 unique genre classifications! Now, I will identify and present the top 20 most featured book genres.
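As a small illustration of what the one-hot encoding produces, here is a sketch on three made-up genre strings. It uses the same str.get_dummies call with ', ' as the separator; the toy data and variable names are assumptions for illustration only.

```python
import pandas as pd

# Made-up genre strings (not taken from the dataset)
toy_genres = pd.Series([
    'Fantasy, Young Adult',
    'History, War',
    'Fantasy, War',
])

# One column per distinct genre, 1 where the book carries that genre
toy_dummies = toy_genres.str.get_dummies(', ')

# Summing each column gives the number of books per genre
genre_counts = toy_dummies.sum().sort_values(ascending=False)

print(toy_dummies)
print(genre_counts)
```

Because the split happens on the full separator ', ', a multi-word genre such as 'Young Adult' stays in one column.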
In [ ]: #Extract top 20 genres by genre frequency
top20_genres = genres_df.sum().sort_values(ascending=False)[:20]
#Visualize top 20 genres using bar chart
top20_genres.plot(kind='bar', color='#24799e', width=.8,
                  linewidth=.8, edgecolor='k', rot=90)
Out[ ]:
Top 10 books on Goodreads
In [ ]: #Assign appropriate data type to the rating distribution column
df['rating_distribution'] = df['rating_distribution'].apply(lambda x: eval(x))
#get total number of five star ratings per book from the rating distribution column
df['total_5star_ratings'] = [int(dic['5'].replace(',','')) for dic in df['rating_distribution']]
#sort data by books with highest frequency of 5 star ratings
top10_books = df.sort_values(by='total_5star_ratings', ascending=False).iloc[:10][['book_title', 'author', 'genres', 'cover_image_uri
#report the results table
top10_books.iloc[:,:3]
Out[ ]:
   book_title                                 author            genres
0  Harry Potter and the Sorcerer's Stone      J.K. Rowling      Fantasy, Fiction, Young Adult, Magic, Children...
1  The Hunger Games                           Suzanne Collins   Young Adult, Fiction, Fantasy, Science Fiction...
2  To Kill a Mockingbird                      Harper Lee        Classics, Fiction, Historical Fiction, School,...
3  Harry Potter and the Prisoner of Azkaban   J.K. Rowling      Fantasy, Fiction, Young Adult, Magic, Children...
4  Harry Potter and the Deathly Hallows       J.K. Rowling      Fantasy, Young Adult, Fiction, Magic, Children...
5  Harry Potter and the Goblet of Fire        J.K. Rowling      Fantasy, Young Adult, Fiction, Magic, Children...
6  The Fault in Our Stars                     John Green        Young Adult, Fiction, Contemporary, Realistic,...
7  Twilight                                   Stephenie Meyer   Fantasy, Young Adult, Romance, Fiction, Vampir...
8  Pride and Prejudice                        Jane Austen       Fiction, Historical Fiction, Historical, Liter...
9  Harry Potter and the Chamber of Secrets    J.K. Rowling      Fantasy, Fiction, Young Adult, Magic, Children...
In [ ]: #get and display books by cover
get_covers(top10_books)
Distribution of rating scores
In [ ]: #Aggregate ratings by rating star
rating_counts = {'5':0, '4':0, '3':0, '2':0, '1':0}
for ratings in df['rating_distribution']:
    for key, value in ratings.items():
        rating_counts[key] += int(value.replace(',',''))
#plot the ratings frequency distribution
plt.figure(figsize=(7,5))
plt.bar(rating_counts.keys(), rating_counts.values(), color='#24799e', width=.7, linewidth=.8, edgecolor='k')
plt.title('Frequency Distribution of Star Ratings', fontsize=11)
plt.xlabel('Star Rating', fontsize=10)
plt.ylabel('Frequency of Rating', fontsize=10)
plt.grid(axis='y', linestyle='-', alpha=.7)
plt.show()
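The aggregation step can be checked by hand on a couple of made-up rating dictionaries. The sketch below assumes the same structure used above (keys '5' to '1', counts stored as strings with thousands separators); the numbers are invented for illustration.

```python
from collections import Counter

# Made-up rating distributions in the same shape as the rating_distribution column
toy_distributions = [
    {'5': '1,200', '4': '800', '3': '150', '2': '40', '1': '10'},
    {'5': '30', '4': '45', '3': '20', '2': '5', '1': '2'},
]

star_totals = Counter()
for distribution in toy_distributions:
    for star, count in distribution.items():
        # Strip the thousands separator before converting to an integer
        star_totals[star] += int(count.replace(',', ''))

total_ratings = sum(star_totals.values())
for star in ['5', '4', '3', '2', '1']:
    print(star, star_totals[star], f'{star_totals[star] / total_ratings:.1%}')
```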
Relationship between number of ratings and average rating score
In [ ]: #Visualize the relationship between the number of ratings and the average rating
# score for a given book using scatter plot
plt.figure(figsize=(9,5))
sns.scatterplot(data=df, x='num_ratings', y='average_rating')
plt.gcf().axes[0].xaxis.get_major_formatter().set_scientific(False)
plt.xticks(rotation=-30)
plt.title('Relationship between Number of Ratings and Average Rating', fontsize=13)
plt.xlabel('Number of Ratings', fontsize=11.5)
plt.ylabel('Average Book Rating', fontsize=11.5)
plt.show()
As depicted by the above plot, there appears to be a positive relationship between the number of ratings and the average rating score of a given book.
Users generally tend to give more ratings if they find the book favorable and deserving of a high rating score. Now that I have gathered an overview of
the data, I will move on to an important feature engineering step to prepare the data for modeling and text processing. In particular, I will create a new
column, 'combined_features', that combines all the important book features together, which is crucial for the subsequent text vectorization and
processing.
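A scatter plot only suggests the direction of a relationship, so one way to put a number on it is a rank correlation. The sketch below is an assumption-laden illustration: the five rows are made up (they are not values from the dataset), and it computes Spearman's rho as the Pearson correlation of the ranks, which suits rating counts that span several orders of magnitude.

```python
import pandas as pd

# Made-up values for illustration only; column names mirror those used in the plot above
toy = pd.DataFrame({
    'num_ratings': [10, 200, 3000, 45000, 500000],
    'average_rating': [3.6, 3.9, 3.8, 4.1, 4.3],
})

# Spearman's rho equals the Pearson correlation computed on the ranks
rho = toy['num_ratings'].rank().corr(toy['average_rating'].rank())

print(round(rho, 2))
```

A value close to 1 would support a positive monotonic relationship, a value near 0 would not; the same two lines could be applied to the real columns to check the claim.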
Part Four: Feature Engineering—Combining Features
In [ ]: #Combine features for overall text processing
df['combined_features'] = (df['book_title'] + ' / ' + df['author'] + ' / ' + df['publication_year'] + ' / ' + df['genres'] + ' / ' +
#Preview a sample of the combined features column
for row in df['combined_features'].sample(5):
    print(row[:200],'\n')
Inkspell / Cornelia Funke / 2005 / Fantasy, Young Adult, Fiction, Middle Grade, Childrens, Adventure, Magic / 3.94 / The captivating sequel to INKHEART, the critically acclaimed, international bestsel
The Stone Raft / José Saramago / 1986 / Fiction, Portugal, Magical Realism, Literature, Portuguese Literature, Nobel Prize, Novels / 3.82 / When the Iberian Peninsula breaks free of Europe and begins
The Gods of Mars / Edgar Rice Burroughs / 1913 / Science Fiction, Fantasy, Fiction, Classics, Adventure, Pulp, Science Fiction Fantasy / 3.88 / After the long exile on Earth, John Carter finally retur
The Second Korean War / Ted Halstead / 2018 / Fiction, War, Military / 4.15 / "This book was like Tom Clancy reincarnated. Ted Halstead really knows how to write a thriller. Can't wait for more!"Two R
All Dreamers Go to America / Ana Ingham / 2009 / Drama, Novels, Fiction, Contemporary / 4.23 / Ana Ingham's delightful novel, All Dreamers Go to America, is a wonderful tale about a young man who has
Part Five: Text Vectorization and Processing
In this section, I will employ TF-IDF to perform text vectorization, weighting the importance of each term in relation to the description of a single book
and relative to the descriptions of other books. The TF-IDF vectorizer will also be supplied with different checks and measures to process text
better, standardizing terms, filtering out the most common terms and meaningless terms or typos, and so on. The vectorizer then returns a data
matrix with all the terms and term weights per book, which is utilized for further analysis. In particular, the next step is to measure
and quantify the similarities between books based on the obtained data matrix. To identify the similarities between the books, I will
use the cosine similarity metric to measure term similarities across the different books in the data. A cosine similarity of 1
indicates identity, or, in this context, maximum overall similarity, whilst a cosine similarity of 0 indicates zero commonality. The resulting similarity
matrix represents the overall similarities between the different books in the dataset. This covers all the important features of a book,
including the author, book genres, and plot summary or description, as captured by the 'combined_features' column defined above. Finally, a
separate similarity matrix will be created for genre alone. To identify genre similarities, I will employ the Jaccard distance, which quantifies how many
genres two books share relative to all the genres they carry between them. This enables us to improve and better tailor book recommendations by
leveraging genre similarity along with overall similarity.
In [ ]: #Define custom tokenizer to process text better
def my_tokenizer(text):
    #Remove punctuation and standardize text (all in lowercase, no whitespace)
    tokens = re.findall(r'\b\w+\b', text.lower().strip())
    return tokens
Identifying overall similarity: Text vectorization with TF-IDF
In [ ]: #Create TF-IDF object and set text vectorization characteristics
tfidf_vectorizer = TfidfVectorizer(stop_words='english',     #remove common english words (e.g., the, then)
                                   tokenizer=my_tokenizer,   #specify text tokenizer (to process and standardize terms)
                                   ngram_range=(1,2),        #specify n-gram range
                                   min_df=2)                 #specify min_df to filter out uncommon terms
#fit and transform the data to get a TF-IDF matrix
tfidf_mtrx = tfidf_vectorizer.fit_transform(df['combined_features'])
#Now computing cosine similarity
#calculate cosine similarity to obtain similarity matrix
similarity_mtrx = cosine_similarity(tfidf_mtrx, tfidf_mtrx)
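To see why identical term profiles score 1 and books with no shared terms score 0, here is a minimal NumPy sketch on hand-made term-weight vectors. The weights are invented for illustration and are not real TF-IDF output; the computation is the standard cosine formula (dot product of unit-length rows), not the library call used above.

```python
import numpy as np

# Made-up term-weight vectors (rows = books, columns = terms)
weights = np.array([
    [1.0, 1.0, 0.0, 0.0],   # book A
    [2.0, 2.0, 0.0, 0.0],   # book B: same terms as A, larger weights
    [0.0, 0.0, 3.0, 1.0],   # book C: shares no terms with A or B
])

# Scale each row to unit length so that a dot product of two rows is their cosine
unit_rows = weights / np.linalg.norm(weights, axis=1, keepdims=True)
cosine = unit_rows @ unit_rows.T

print(np.round(cosine, 2))
```

Books A and B point in the same direction, so their similarity is 1 even though their weights differ in magnitude, while book C has a similarity of 0 with both.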
Identifying genre similarity
Now I will create a similarity matrix for genre alone, using Jaccard similarity. First, I will convert my genres dataframe into a boolean matrix,
and then compute the pairwise Jaccard distances and subtract them from 1 to obtain a similarity matrix for genre alone.
In [ ]: #Convert genres_df to a boolean array (built via a CSR matrix, then made dense again because pdist expects a dense array)
genres_csr_mtrx = csr_matrix(genres_df.values).astype(bool).toarray()
#Compute pairwise jaccard distances and convert them to a jaccard similarity matrix
genre_sim_mtrx = 1 - squareform(pdist(genres_csr_mtrx, metric='jaccard'))
#normalize jaccard similarity scores
genre_sim_mtrx = genre_sim_mtrx / np.max(genre_sim_mtrx) if np.max(genre_sim_mtrx) > 0 else genre_sim_mtrx
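For intuition, Jaccard similarity between two books is simply the number of genres they share divided by the number of distinct genres across both. A minimal pure-Python sketch follows; the function name and the genre lists are hypothetical, and treating two empty genre sets as 0.0 is an assumption of this sketch.

```python
def jaccard_similarity(genres_a, genres_b):
    """Shared genres divided by all distinct genres across the two books."""
    a, b = set(genres_a), set(genres_b)
    union = a | b
    # Assumption for this sketch: two books with no genres at all score 0.0
    return len(a & b) / len(union) if union else 0.0

# Made-up genre lists for illustration
book_a = ['Fantasy', 'Young Adult', 'Magic']
book_b = ['Fantasy', 'Magic', 'Adventure']
book_c = ['History', 'War']

print(jaccard_similarity(book_a, book_b))  # 2 shared out of 4 distinct genres
print(jaccard_similarity(book_a, book_c))  # nothing shared
```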
Now with all the data processed and analyzed thoroughly, I will build the main function for tailoring and delivering book recommendations.
Part Six: Building a Book Recommendation Function
In this section, I will develop a custom function for delivering personalized book recommendations. This function constitutes the heart of the book
recommendation system. It takes a book title as input and returns the most relevant book recommendations based on that book, utilizing and
balancing the similarity matrices obtained, leveraging overall similarity as well as genre similarity. It is also supplied with a special parameter,
alpha, which specifies the exact balance between the two matrices: alpha = 1 tailors the recommendations by overall similarity alone, alpha = 0 by
genre similarity alone, and values in between by a weighted mixture of both. It also features another parameter, N, which specifies the exact number of
book recommendations to return. The output is a data table rendering the recommendation results, followed by a display of each book by its
cover in sequential order. You can read the function's documentation for more details.
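The effect of alpha can be sketched on a single row of each similarity matrix. The numbers below are made up, and rank_by_blend is a hypothetical helper written for this illustration; it only mirrors the weighted sum alpha * overall + (1 - alpha) * genre and leaves out the fiction/nonfiction filter and the author cap.

```python
import numpy as np

def rank_by_blend(overall_row, genre_row, alpha, query_index):
    """Return candidate indices sorted by the blended score, best first."""
    blended = alpha * np.asarray(overall_row) + (1 - alpha) * np.asarray(genre_row)
    order = np.argsort(-blended)  # descending order of blended score
    return [int(i) for i in order if i != query_index]

# Made-up similarity rows for the query book at index 0
overall_row = [1.0, 0.8, 0.2, 0.5]   # similarity from the combined text features
genre_row = [1.0, 0.1, 0.9, 0.5]     # similarity from genres alone

print(rank_by_blend(overall_row, genre_row, alpha=0.7, query_index=0))
print(rank_by_blend(overall_row, genre_row, alpha=0.0, query_index=0))
```

With alpha=0.7 the book at index 1 ranks first because its text is most similar; with alpha=0.0 only genres count and the book at index 2 takes the lead.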
In [ ]: #Define helper function to return book recommendations
def Get_Recommendations(title: str, sim_mtrx: np.ndarray, genre_sim_mtrx: np.ndarray, alpha=0.5, N=10):
    """
    This function takes a book title and recommends similar books that cover similar themes
    or fall within the same genre categories.

    Parameters:
    - title (str): The title of the book for which recommendations are sought.
    - sim_mtrx (ndarray): A similarity matrix based on book overall similarities, where each
      row corresponds to a book and each column corresponds to its cosine similarity score
      with other books.
    - genre_sim_mtrx (ndarray): A similarity matrix based on book genres, where each row
      corresponds to a book and each column corresponds to its jaccard similarity score with
      other books based on genre.
    - alpha (float, optional): Weighting factor for combining overall similarity and genre
      similarity. Defaults to 0.5, balancing overall similarity and genre similarity together.
    - N (int, optional): Number of recommendations to return. Defaults to 10.

    Returns:
    - None. Displays a data table with the recommended books and a plot of each book with its cover.
      Returns False if the title is not found in the dataset.

    Raises:
    - TypeError: If the title provided cannot be converted to a string.

    Notes:
    - This function filters, preprocesses and standardizes the book title given and identifies its genre
      categories, importantly, identifying whether it's a Fiction or Nonfiction work to prevent genre
      overlap while looking for recommendations.
    - It looks for book recommendations by combining similarity scores from two matrices: sim_mtrx
      (based on overall similarities) and genre_sim_mtrx (based on genres).
    - It prioritizes books with similar genre categories; otherwise, it recommends books based on
      overall book similarity.
    - Finally, recommendations are filtered to include books by a variety of authors, limiting
      the number of recommendations to only 5 books per author.
    - The number of book recommendations can be adjusted using the 'N' parameter. Default is 10 book recommendations.
    """
    #check if title provided can be handled as a string
    try:
        curr_title = str(title)
    except Exception:
        raise TypeError('Book title entered is not string.')
    #standardize titles for accurate comparisons
    title = curr_title.lower().strip()
    full_titles = df['book_title'].apply(lambda title: title.lower().strip())
    partial_titles = full_titles.str.extract(r'^(.*?):')[0].dropna()
    #check if provided title matches book title in the dataset and get index if found
    if title in full_titles.values:
        indx = df[full_titles == title].index[0]
    elif title in set(partial_titles.values):
        indx_partial = partial_titles[partial_titles == title].index[0]
        indx = df[df['book_title'] == df['book_title'].iloc[indx_partial]].index[0]
    else:
        #try normalizing book titles across the board by removing punctuations and removing 'the' if the book starts with it for bett
        normalized_title = re.sub(r'(^\s*(the|a)\s+|[^\w\s])', '', title, flags=re.IGNORECASE)
        normalized_full_titles = full_titles.apply(lambda title: re.sub(r'(^\s*(the|a)\s+|[^\w\s])', '', title, flags=re.IGNORECASE))
        normalized_partial_titles = partial_titles.apply(lambda title: re.sub(r'(^\s*(the|a)\s+|[^\w\s])', '', title, flags=re.IGNORE
        if normalized_title in normalized_full_titles.values:
            indx = df[normalized_full_titles == normalized_title].index[0]
        elif normalized_title in set(normalized_partial_titles.values):
            indx_partial = normalized_partial_titles[normalized_partial_titles==normalized_title].index[0]
            indx = df[df['book_title'] == df['book_title'].iloc[indx_partial]].index[0]
        else:
            print(f'\nBook with title \'{curr_title}\' is not found. Please try a different book.\n', flush=True)
            return False
    #Check if 'Fiction' is in the genre of the selected book
    is_fiction = 'Fiction' in df['genres'].iloc[indx]
    #Find books with the same genre category
    if is_fiction:
        book_indices_ByGenre = [i for i in df.index if ('Fiction' in df['genres'].iloc[i]) and (i != indx)]
    else:
        book_indices_ByGenre = [i for i in df.index if ('Fiction' not in df['genres'].iloc[i] or 'Nonfiction' in df['genres'].iloc[i]
    #Combine the two similarity matrices using weighted sum
    weighed_similarity = (alpha * sim_mtrx[indx]) + ((1 - alpha) * genre_sim_mtrx[indx])
    #Get weighted similarity scores for books with the same genre
    similarity_scores = [(i, weighed_similarity[i]) for i in book_indices_ByGenre]
    #Filter scores to only include books with the same genre category
    similarity_scores = [score for score in similarity_scores if score[0] in book_indices_ByGenre]
    #Sort the books based on the weighted similarity scores
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    #If less than N books are found in the same genre category, add books by closest weighted similarity
    if len(similarity_scores) < N:
        #weighed_similarity is already the 1-D row for the selected book, so enumerate it directly
        cos_scores = list(enumerate(weighed_similarity))
        cos_scores = sorted(cos_scores, key=lambda x: x[1], reverse=True)
        cos_scores = [score for score in cos_scores if score[0] != indx and score[0] not in [x[0] for x in similarity_scores]]
        similarity_scores += [score for score in cos_scores if score not in similarity_scores][:N - len(similarity_scores)]
    #Excl
    #Limit recommendations to 5 books per author
    author_counts = {}
    similarity_scores_filtered = []
    for score in similarity_scores:
        author = df['author'].iloc[score[0]]
        if author not in author_counts or author_counts[author] < 5:
            similarity_scores_filtered.append(score)
            author_counts[author] = author_counts.get(author, 0) + 1
    #Get the scores of the N most similar books
    most_similar_books = similarity_scores_filtered[:N]
    #Get the indices of the books selected
    most_similar_books_indices = [i[0] for i in most_similar_books]
    #Prepare DataFrame with recommended books and their details
    recommended_books = df.iloc[most_similar_books_indices][['book_title', 'author', 'cover_image_uri']]
    recommended_books['Recommendation'] = recommended_books.apply(lambda row: f"{row['book_title']} (by {row['author']})", axis=1)
    recommended_books.reset_index(drop=True, inplace=True)
    #Return book recommendations
    print(f"\nRecommendations for '{curr_title.title()}' (by {df['author'].iloc[indx]}):", flush=True)
    display(recommended_books['Recommendation'].to_frame().rename(lambda x:x+1))
    print('\n', flush=True)
    get_covers(recommended_books)
    return
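The author-variety step can be isolated into a small standalone sketch. The helper name cap_per_author and the toy ranking are hypothetical; the logic walks a list that is already sorted by similarity and keeps at most a fixed number of titles per author, which is why a well-represented author never fills more than five slots in a recommendation table.

```python
def cap_per_author(ranked_books, max_per_author=5, n=10):
    """Keep the ranking order but allow at most max_per_author titles per author."""
    author_counts = {}
    kept = []
    for title, author in ranked_books:
        if author_counts.get(author, 0) < max_per_author:
            kept.append(title)
            author_counts[author] = author_counts.get(author, 0) + 1
        if len(kept) == n:
            break
    return kept

# Made-up ranking, best match first: (title, author)
ranked = [('A1', 'X'), ('A2', 'X'), ('A3', 'X'), ('B1', 'Y'), ('A4', 'X'), ('C1', 'Z')]

print(cap_per_author(ranked, max_per_author=2, n=4))
```

Here the third and fourth titles by author X are skipped, so lower-ranked books by other authors move up into the final list.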
Part Seven: Testing the Recommendation System
Finally, in this last section I will test out the recommendation system. This will unfold in three steps. First, I will test the recommender by passing a
popular book title to the function, such as Shakespeare's Macbeth, and execute it to obtain the relevant book recommendations. Second, I will obtain
a sample of books picked at random from the dataset and pass them to the function to obtain and evaluate the recommendations for each. And
finally, I will develop a function that takes book titles from the user as input and returns the relevant book recommendations when available.
In [ ]: #Adjust pandas display settings to display entire column
pd.set_option('display.max_colwidth', None)
Generating Book Recommendation for Famous Title
In [ ]: #Get 10 book recommendations for 'Macbeth' (by Shakespeare)
book_title = 'Macbeth'
Get_Recommendations(book_title, similarity_mtrx, genre_sim_mtrx, alpha=0.7, N=10)
Recommendations for 'Macbeth' (by William Shakespeare):
    Recommendation
1   Hamlet (by William Shakespeare)
2   Othello (by William Shakespeare)
3   King Lear (by William Shakespeare)
4   Romeo and Juliet (by William Shakespeare)
5   The Merchant of Venice (by William Shakespeare)
6   Hamlet: Screenplay, Introduction And Film Diary (by Kenneth Branagh)
7   Doubt, a Parable (by John Patrick Shanley)
8   The Oedipus Cycle: Oedipus Rex, Oedipus at Colonus, Antigone (by Sophocles)
9   Oedipus Rex (by Sophocles)
10  Antigone (by Sophocles)
Generating Book Recommendations from Random Titles
In [ ]: #Get recommendations for titles chosen at random
random_titles = df.sample(5)[['book_title','author']]
#get recommendations for the selected titles
for title,author in zip(random_titles.iloc[:,0],random_titles.iloc[:,1]):
    Get_Recommendations(title, similarity_mtrx, genre_sim_mtrx, alpha=0.7, N=10)
    print('\n', 150*'_' + '\n')
Recommendations for 'Persuader' (by Lee Child):
    Recommendation
1   One Shot (by Lee Child)
2   Nothing to Lose (by Lee Child)
3   Bad Luck and Trouble (by Lee Child)
4   Worth Dying For (by Lee Child)
5   Tripwire (by Lee Child)
6   Dragan Radelscu & The Vampires Of Paris (by Shamus Sherwood)
7   Face of a Killer (by Robin Burcell)
8   The Nowhere Man (by Gregg Andrew Hurwitz)
9   The Lion's Game (by Nelson DeMille)
10  Saving Faith (by David Baldacci)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'Silent Night 2' (by R.L. Stine):
    Recommendation
1   Silent Night (by R.L. Stine)
2   All-Night Party (by R.L. Stine)
3   The Face (by R.L. Stine)
4   The Secret Bedroom (by R.L. Stine)
5   Night of the Living Dummy (by R.L. Stine)
6   After the First Death (by Robert Cormier)
7   Greenglass House (by Kate Milford)
8   Frozen Charlotte (by Alex Bell)
9   172 Hours on the Moon (by Johan Harstad)
10  Full Tilt (by Neal Shusterman)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'The Postman' (by David Brin):
    Recommendation
1   This Is The Way The World Ends (by James K. Morrow)
2   The Gone-Away World (by Nick Harkaway)
3   Earth Abides (by George R. Stewart)
4   Alas, Babylon (by Pat Frank)
5   Farnham's Freehold (by Robert A. Heinlein)
6   Swan Song (by Robert McCammon)
7   The White Plague (by Frank Herbert)
8   Wool (by Hugh Howey)
9   Go-Go Girls of the Apocalypse (by Victor Gischler)
10  Dies the Fire (by S.M. Stirling)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'The Black Circle' (by Patrick Carman):
    Recommendation
1   Beyond the Grave (by Jude Watson)
2   One False Note (by Gordon Korman)
3   The Viper's Nest (by Peter Lerangis)
4   Into the Gauntlet (by Margaret Peterson Haddix)
5   The Sword Thief (by Peter Lerangis)
6   In Too Deep (by Jude Watson)
7   The Emperor's Code (by Gordon Korman)
8   Storm Warning (by Linda Sue Park)
9   The Maze of Bones (by Rick Riordan)
10  The Ersatz Elevator (by Lemony Snicket)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'Chase The Dark' (by Annette Marie):
    Recommendation
1   Demons at Deadnight (by A. Kirk)
2   Stone Cold Touch (by Jennifer L. Armentrout)
3   Immortal Beloved (by Cate Tiernan)
4   Dead Man Rising (by Lilith Saintcrow)
5   The Gathering Darkness (by Lisa Collicutt)
6   If I Die (by Rachel Vincent)
7   White Hot Kiss (by Jennifer L. Armentrout)
8   Carrier of the Mark (by Leigh Fallon)
9   Every Last Breath (by Jennifer L. Armentrout)
10  The Indigo Spell (by Richelle Mead)
______________________________________________________________________________________________________________________________________________________
Generating Book Recommendations from User Input
In [ ]: #Defining custom function that requests a book title from the user and returns relevant book recommendations
def Get_Recommendations_fromUser():
    while True:
        book_title = input('\nEnter book title: ')
        recommendations = Get_Recommendations(book_title, similarity_mtrx, genre_sim_mtrx, alpha=0.7, N=10)
        print('\n', 150*'_' + '\n', flush=True)
        if recommendations is not False:
            response = str(input('\n\nWould you like to get recommendations for more books? [Yes/no]\n')).lower().strip()
            if response in ['yes', 'y']:
                continue
            elif response in ['no', 'n']:
                print('\nThank you for trying the recommender.\nExiting...')
                break
            else:
                print('\nResponse invalid.\nProcess terminating...')
                break
In [ ]: #Execute the user recommender function
Get_Recommendations_fromUser() # The Great Gatsby; Return of the king; Atomic Habit; Atomic Habits; a brief history of time; Critiqu
Recommendations for 'The Great Gatsby' (by F. Scott Fitzgerald):
    Recommendation
1   F. Scott Fitzgerald: The Great Gatsby (by Nicolas Tredell)
2   This Side of Paradise (by F. Scott Fitzgerald)
3   The Beautiful and Damned (by F. Scott Fitzgerald)
4   The Love of the Last Tycoon (by F. Scott Fitzgerald)
5   Great Expectations (by Charles Dickens)
6   The Pearl (by John Steinbeck)
7   Ethan Frome (by Edith Wharton)
8   The Portrait of a Lady (by Henry James)
9   Old School (by Tobias Wolff)
10  A Tale of Two Cities (by Charles Dickens)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'Return Of The King' (by J.R.R. Tolkien):
    Recommendation
1   The Two Towers (by J.R.R. Tolkien)
2   J.R.R. Tolkien 4-Book Boxed Set: The Hobbit and The Lord of the Rings (by J.R.R. Tolkien)
3   The Fellowship of the Ring (by J.R.R. Tolkien)
4   The Lord of the Rings (by J.R.R. Tolkien)
5   Orcs (by Stan Nicholls)
6   The Children of Húrin (by J.R.R. Tolkien)
7   New Spring (by Robert Jordan)
8   The Great Hunt (by Robert Jordan)
9   The Shadow Rising (by Robert Jordan)
10  The Eye of the World (by Robert Jordan)
______________________________________________________________________________________________________________________________________________________
Book with title 'Atomic Habit' is not found. Please try a different book.
______________________________________________________________________________________________________________________________________________________
Recommendations for 'Atomic Habits' (by James Clear):
    Recommendation
1   The Power of Habit: Why We Do What We Do in Life and Business (by Charles Duhigg)
2   Better Than Before: Mastering the Habits of Our Everyday Lives (by Gretchen Rubin)
3   Deep Work: Rules for Focused Success in a Distracted World (by Cal Newport)
4   Eat That Frog! 21 Great Ways to Stop Procrastinating and Get More Done in Less Time (by Brian Tracy)
5   13 Things Mentally Strong People Don't Do: Take Back Your Power, Embrace Change, Face Your Fears, and Train Your Brain for Happiness and Success (by Amy Morin)
6   How Women Rise: Break the 12 Habits Holding You Back from Your Next Raise, Promotion, or Job (by Sally Helgesen)
7   The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change (by Stephen R. Covey)
8   Getting Things Done: The Art of Stress-Free Productivity (by David Allen)
9   You Are a Badass: How to Stop Doubting Your Greatness and Start Living an Awesome Life (by Jen Sincero)
10  Building a Second Brain: A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential (by Tiago Forte)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'A Brief History Of Time' (by Stephen Hawking):
    Recommendation
1   A Briefer History of Time (by Stephen Hawking)
2   The Universe in a Nutshell (by Stephen Hawking)
3   Black Holes & Time Warps: Einstein's Outrageous Legacy (by Kip S. Thorne)
4   The Grand Design (by Stephen Hawking)
5   Wrinkles in Time (by George Smoot)
6   Parallel Worlds: A Journey through Creation, Higher Dimensions, and the Future of the Cosmos (by Michio Kaku)
7   The Principia : Mathematical Principles of Natural Philosophy (by Isaac Newton)
8   The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory (by Brian Greene)
9   Billions & Billions: Thoughts on Life and Death at the Brink of the Millennium (by Carl Sagan)
10  The Fabric of the Cosmos: Space, Time, and the Texture of Reality (by Brian Greene)
______________________________________________________________________________________________________________________________________________________
Recommendations for 'Critique Of Pure Reason' (by Immanuel Kant):
    Recommendation
1   Groundwork of the Metaphysics of Morals (by Immanuel Kant)
2   Phenomenology of Spirit (by Georg Wilhelm Friedrich Hegel)
3   Being and Time (by Martin Heidegger)
4   Individuals: An Essay in Descriptive Metaphysics (by Peter Frederick Strawson)
5   An Enquiry Concerning Human Understanding (by David Hume)
6   100 Questions About God (by J. Edwin Orr)
7   Our Knowledge of the External World (by Bertrand Russell)
8   Difference and Repetition (by Gilles Deleuze)
9   Beyond Good and Evil: Prelude to a Philosophy of the Future (by Friedrich Nietzsche)
10  On the Genealogy of Morals / Ecce Homo (by Friedrich Nietzsche)
______________________________________________________________________________________________________________________________________________________
Thank you for trying the recommender.
Exiting...
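The session above shows that 'Return of the king' was matched while 'Atomic Habit' was not. A plausible explanation lies in the title normalization used inside the function: it strips a leading 'the' or 'a' and all punctuation, but it performs no fuzzy matching, so a missing letter still fails. The sketch below reuses the same regular expression on a tiny made-up catalog; the catalog titles and helper names are assumptions for illustration, not the stored dataset titles.

```python
import re

def normalize_title(title):
    """Lowercase the title, then drop a leading 'the'/'a' and all punctuation."""
    lowered = title.lower().strip()
    return re.sub(r'(^\s*(the|a)\s+|[^\w\s])', '', lowered, flags=re.IGNORECASE)

# Hypothetical catalog entries for illustration
catalog = ['The Return of the King', 'Atomic Habits', "Harry Potter and the Sorcerer's Stone"]
normalized_catalog = {normalize_title(t): t for t in catalog}

def find_title(query):
    """Return the catalog title whose normalized form equals the normalized query, if any."""
    return normalized_catalog.get(normalize_title(query))

print(find_title('Return of the king'))                    # leading article is ignored
print(find_title('harry potter and the sorcerers stone'))  # punctuation is ignored
print(find_title('Atomic Habit'))                          # no fuzzy matching, so no match
```

If tolerance for small typos were wanted, a fuzzy matcher such as difflib.get_close_matches from the standard library could be layered on top, though that goes beyond what the recommender does here.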