Book Recommendation Engine

In the current computing landscape, recommendation systems are an integral part of intelligent information systems. Search engines (Google, Bing, Yahoo), Netflix, Amazon, YouTube, and similar services recommend articles or items that might interest their users. To be intelligent, a system needs informative data about the user and about the items the user was interested in. In this project, we developed a stand-alone book recommendation engine that recommends books using user profile and user rating details. The engine is designed around two recommendation approaches: (1) collaborative filtering and (2) the user's demographic profile (location and age).

Project Team


Lakshmanan Ramu Menal

Prasanna Kumar Rajendran

Senthil Kumar Karthikeyan

Approach

Collaborative Recommendations: The collaborative recommendation engine is built on the combined contributions of different users for a book. The parameter taken into consideration is each user's rating.

• We grouped all the books each user has rated, for all users, and sorted them in descending order of rating. We ignored each user's low-rated books.
• This gives ordered pairs of books of interest for each user, so we then dropped the user information from the pairing.
• Then we calculated the similarity between every pair of books rated by the same user.
• Finally, we combined the similarity scores calculated in the previous step for each book with every other book.
• We implemented the collaborative recommendation engine in Java using Hadoop MapReduce.
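
The pairing steps above can be sketched in plain Java without the Hadoop API. This is a minimal illustration, not the project's MapReduce code: the class name `CoRatedPairs`, the `minRating` threshold, and the in-memory maps are assumptions made for the example. Books are ordered by ISBN here (rather than by rating) only so that each pair gets one canonical key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the pairing steps; names and threshold are assumptions.
class CoRatedPairs {

    // For each book pair "bookA|bookB" (bookA < bookB), collect the rating pairs
    // {ratingA, ratingB} given by every user who rated both books.
    static Map<String, List<double[]>> build(Map<String, Map<String, Double>> ratingsByUser,
                                             double minRating) {
        Map<String, List<double[]>> pairs = new TreeMap<>();
        for (Map<String, Double> userRatings : ratingsByUser.values()) {
            // "Map" side: keep this user's books at or above the threshold, ordered by ISBN.
            List<String> books = new ArrayList<>();
            for (Map.Entry<String, Double> e : new TreeMap<>(userRatings).entrySet()) {
                if (e.getValue() >= minRating) {
                    books.add(e.getKey());
                }
            }
            // Emit every unordered pair; the user ID is no longer needed from here on.
            for (int i = 0; i < books.size(); i++) {
                for (int j = i + 1; j < books.size(); j++) {
                    String key = books.get(i) + "|" + books.get(j);
                    double[] ratingPair = {
                        userRatings.get(books.get(i)),
                        userRatings.get(books.get(j))
                    };
                    // "Reduce" side: group rating pairs under the book-pair key.
                    pairs.computeIfAbsent(key, k -> new ArrayList<>()).add(ratingPair);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new TreeMap<>();
        ratings.put("u1", Map.of("A", 8.0, "B", 9.0, "C", 2.0));
        ratings.put("u2", Map.of("A", 7.0, "B", 6.0));
        for (Map.Entry<String, List<double[]>> e : build(ratings, 5.0).entrySet()) {
            System.out.println(e.getKey() + " co-rated by " + e.getValue().size() + " users");
        }
    }
}
```

The grouped rating pairs are the input a similarity measure needs: one rating vector per book, aligned by user.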

Recommendation filtering based on Demographic data:


• We grouped the data so that we could cluster it by the location or country each user belongs to, and used clustering algorithms to cluster the data by country.
• The user's age can also be included when clustering. The engine then shows recommendations that fall close to the user's country cluster. This reduced the amount of data sent to the second stage of the engine, where the collaborative algorithm resides, and so improved the overall computation speed.
• We implemented this in Java using Hadoop MapReduce, since it is a one-time computation.
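
The demographic partitioning can be illustrated with a simple group-by in plain Java. This sketch is a stand-in for the clustering step, not the algorithm the project used: the class `DemographicGroups`, the `User` fields, the 10-year age bands, and the "unknown" label for a missing age are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups user IDs by a demographic key (country, optionally age band).
class DemographicGroups {

    // Illustrative user record; field names are assumptions, not the project's schema.
    static final class User {
        final String id;
        final String country;
        final Integer age; // null when the age is missing

        User(String id, String country, Integer age) {
            this.id = id;
            this.country = country;
            this.age = age;
        }
    }

    // Builds the grouping key, e.g. "usa" or "usa/20-29".
    static String key(User u, boolean useAge) {
        String country = u.country.trim().toLowerCase(Locale.ROOT);
        if (!useAge) {
            return country;
        }
        if (u.age == null) {
            return country + "/unknown";
        }
        int lower = (u.age / 10) * 10; // assumed 10-year bands
        return country + "/" + lower + "-" + (lower + 9);
    }

    // Collects user IDs under their demographic key, in input order.
    static Map<String, List<String>> group(List<User> users, boolean useAge) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (User u : users) {
            groups.computeIfAbsent(key(u, useAge), k -> new ArrayList<>()).add(u.id);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("10003", "usa", 20),
                new User("100030", "USA", 15),
                new User("100027", "canada", null));
        System.out.println(group(users, false));
        System.out.println(group(users, true));
    }
}
```

Each group can then be passed to the collaborative stage on its own, which is how partitioning reduces the amount of data that stage has to compare.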

System Implementation

Components:


• Data Preprocessing - Rating normalization and scaling; grouping each user's age, books, country, and ratings into one file; and removing nulls
• Core System - Finds the item-item similarity using three measures
      1. Correlation
      2. Cosine similarity
      3. Jaccard index

• Consolidating the Results - The output from the core system is country based and age based, and it scores every book against every other book. To produce user recommendations, we process the core system output together with the books the user has read, and recommend books the user has not read.
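
The consolidation step can be sketched as follows. This is an illustration under assumptions, not the project's code: the class `Consolidator`, the choice to sum similarity scores across the books a user has read, and the choice to keep only positive totals are all made for the example, since this report does not specify how the scores are combined.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of consolidation: rank unread books by summed similarity.
class Consolidator {

    // similarBooks maps a book to its neighbours and their similarity scores
    // (the shape of the core system output for one country or age group).
    static List<String> recommend(Set<String> readBooks,
                                  Map<String, Map<String, Double>> similarBooks,
                                  int limit) {
        Map<String, Double> totals = new HashMap<>();
        for (String read : readBooks) {
            Map<String, Double> neighbours = similarBooks.getOrDefault(read, Map.of());
            for (Map.Entry<String, Double> e : neighbours.entrySet()) {
                // Skip books the user has already read.
                if (!readBooks.contains(e.getKey())) {
                    totals.merge(e.getKey(), e.getValue(), Double::sum);
                }
            }
        }
        // Keep only books with a positive total score (an assumption of this sketch).
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            if (e.getValue() > 0) {
                ranked.add(e.getKey());
            }
        }
        // Highest score first; ties broken by ISBN for a stable order.
        ranked.sort(Comparator.comparingDouble((String isbn) -> totals.get(isbn))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        // Placeholder book IDs "A".."E" stand in for ISBNs.
        Map<String, Map<String, Double>> similar = new HashMap<>();
        similar.put("A", Map.of("B", 1.0, "C", 0.9, "D", 0.4));
        similar.put("B", Map.of("C", 0.5, "D", 0.7, "E", -1.0));
        System.out.println(recommend(Set.of("A", "B"), similar, 10)); // prints [C, D]
    }
}
```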

System Architecture:

Motivation:

To check how the recommendation engine works in a real-world scenario, we added one of our friends to the dataset as a new user profile, along with his ratings for books from the dataset that he had already read. The engine suggested books that aligned with his interests, based on the ratings he provided. We found this one of the most useful and interesting outcomes of the project.


Friend's Data:

19930711&&&&& &&&&&"067003097x"######0 "080652121X"######1 "0060802162"######0 "052341899X"######0 "0140035206"######1 "344275500X"######1 "0671741896"######-3&&&&&usa""~~~~~""45"""
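
The record above uses a delimiter-based layout, which also appears in the preprocessed file shown under Data Preprocessing. A minimal parser sketch follows. The layout is inferred from the samples in this report only: the class `PreprocessedRecord` and its field names are hypothetical, the meaning of the blank second field is unknown and is ignored, and rating entries are assumed to be separated by either whitespace or a literal `\s`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical parser for the record layout inferred from the samples in this report.
class PreprocessedRecord {
    final String userId;
    final Map<String, Integer> ratings = new LinkedHashMap<>(); // ISBN -> normalized rating
    final String country;
    final String age; // kept as text because the samples use "unknown" for a missing age

    private static final Pattern FIELD_SEP = Pattern.compile(Pattern.quote("&&&&&"));
    // Rating entries are separated by a literal "\s" or by real whitespace.
    private static final Pattern RATING_SEP = Pattern.compile("(?:\\\\s|\\s)+");

    PreprocessedRecord(String line) {
        String[] fields = FIELD_SEP.split(line.trim());
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        userId = unquote(fields[0]);
        // fields[1] is blank in every sample; its purpose is not documented.
        for (String entry : RATING_SEP.split(fields[2].trim())) {
            if (entry.isEmpty()) {
                continue;
            }
            String[] isbnAndRating = entry.split("######");
            ratings.put(unquote(isbnAndRating[0]), Integer.parseInt(isbnAndRating[1].trim()));
        }
        String[] demographics = fields[3].split("~~~~~");
        country = unquote(demographics[0]);
        age = unquote(demographics[1]);
    }

    // Removes all double quotes; the samples quote fields inconsistently.
    private static String unquote(String s) {
        return s.replace("\"", "").trim();
    }

    public static void main(String[] args) {
        PreprocessedRecord r = new PreprocessedRecord(
                "\"10003\"&&&&& &&&&&\"068483068X\"######3\\s\"0743446593\"######3\\s&&&&&usa\"~~~~~\"20\"");
        System.out.println(r.userId + " " + r.ratings + " " + r.country + " " + r.age);
    }
}
```
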
Friend's recommendation output:

"19930711" "0393320359","0395925037","037550771X","0898868300","0394713818","0440224764","0449223345","0061000043","0425147363","0345339711","037321801X","0446517984","1570629048","0399501487","0440235596","0316782262","0812520475"

Dataset:

We use the Book-Crossing dataset, which was mined by Cai-Nicolas Ziegler (DBIS Freiburg) from the Book-Crossing community. The dataset contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit and implicit) about 271,379 books. It is freely available for research purposes.

Data Description: The Book-Crossing dataset comprises three tables in comma-separated values (CSV) files.
BX-Users: Contains the users. Note that user IDs (‘User-ID’) have been anonymized and map to integers. Demographic data (‘Location’, ‘Age’) is provided if available; otherwise, these fields contain NULL values.
BX-Books: Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (‘Book-Title’, ‘Book-Author’, ‘Year-Of-Publication’, ‘Publisher’), obtained from Amazon Web Services. Note that in the case of several authors, only the first is provided. URLs linking to cover images are also given in three sizes (‘Image-URL-S’, ‘Image-URL-M’, ‘Image-URL-L’), i.e., small, medium, and large. These URLs point to the Amazon web site.
BX-Book-Ratings: Contains the book rating information. Ratings (‘Book-Rating’) are either explicit, expressed on a scale from 1 to 10 (higher values denoting higher appreciation), or implicit, expressed by 0.

Framework:

The entire recommendation engine in this project was built in Java on the Hadoop MapReduce framework. Environments: for development, we used Cloudera; for testing, the UNCC Hadoop DSBA cluster; for implementation and demo, an Amazon AWS cluster.

Illustrative results and Examples:


Data Preprocessing


File1- After Preprocessing

"100027"&&&&& &&&&&"0448425831"######-7\s&&&&&canada"~~~~~unknown
"100028"&&&&& &&&&&"185967318X"######3\s&&&&&jersey,"~~~~~unknown
"100029"&&&&& &&&&&"0140219854"######3\s"3498020862"######-5\s"0446523747"######-7\s"0312983263"######-7\s"0312954468"######-7\s&&&&&germany"~~~~~unknown
"10003"&&&&& &&&&&"068483068X"######3\s"0743446593"######3\s&&&&&usa"~~~~~"20"
"100030"&&&&& &&&&&"0744552192"######-7\s"0679735909"######-7\s&&&&&usa"~~~~~"15"


Core System


File2- After Core system from Correlation based on Country

australia","0006476007" "0330243829" 1.0,"0749309423" 1.0,"0733613675" 1.0,"0140031499" 1.0,"0749316063" 1.0,"0552103721" 1.0,"0552125695" 1.0,"0099245027" 1.0,
australia","0061030015" "0345466810" 1.0,"0446608815" 0.8,"0312421184" 0.7,"0224018256" 0.0,"1585672939" 0.0,"0099771519" -0.7,
australia","0142003581" "0060930535" 1.0,"0061015725" 0.7,"0224018256" -0.7,"0385319452" -1.0,"0786868716" -1.0,"0767903862" -1.0,"0671024248" -1.0,
australia","0312924801" "0446610801" 1.0,"0553258877" 1.0,"0553280341" 1.0,"0886777712" 0.6,
australia","0340613696" "000649983X" 1.0,"0450411435" 1.0,"0007140676" 1.0,"0099281082" 1.0,"0449007251" -0.9,"0860074382" -1.0,"0099312514" -1.0,
australia","0385720106" "0060934417" 1.0,"033031582" -0.7,
australia","0399131493" "014043223X" -1.0,
australia","0452285011" "0330361163" 1.0,"0140289690" 1.0,"061328125X" 1.0,"0452282152" 1.0,"1565122178" 1.0,"0790008696" 1.0,"0571206484" 0.9,"0091882087" -1.0,
australia","0575049804" "0887307876" -1.0,"0330349678" -1.0,"067976402X" -1.0,"0971880107" -1.0,


File2- After Core system from correlation based on Age

15","0440413141" "014131088X" 1.0,
"16","0312990456" "0590962736" 1.0,
"18","0553264613" "0515126772" 1.0,"0446310786" 1.0,"0380776839" 1.0,"0452281458" 1.0,
"18","0671870114" "0671737821" 1.0,"0671871005" 1.0,"0671744216" 1.0,"0671776800" 0.9,
"18","0671871005" "0515120006" 1.0,"0671737821" 1.0,"0671870114" 1.0,"0671737791" 1.0,"0671776800" 0.9,"0671744216" 0.9,
"18","0684856093" "0312195516" 1.0,"0441005993" 1.0,"0064400581" 1.0,"084233226X" 0.7,"0061057894" -1.0,"0842329250" -1.0,"0842329277" -1.0,"0553274295" -1.0,"0842329269" -1.0,
"19","0345413350" "044651652X" 1.0,
"20","0671534734" "0689813953" 1.0,"0679781587" 1.0,
"20","1558746161" "1558747613" 1.0,
"21","0060930535" "0684801523" 0.8,"0804114986" -1.0,"0316777730" -1.0,
"21","0375826688" "0971880107" 1.0,"1840720050" 0.0,
"21","0380789019" "0679781587" 1.0,
"21","0399149325" "0060931418" -1.0,"0451526341" -1.0,
"22","0099771519" "0385504209" 1.0,


Consolidating Result


File2- After Core system from Correlation

"100046" "0394584112","5557121005","0312195516","0440407028","0375759069","0805038221","0805036504","0375761381","0140177396","0831700203","0380703122","0805059555","0590406205","0515130389","0064405176","0449907481","0671021001","0140384510","0684853507","0312291639","0345447840","1931561648","0553268880"
"100181" "0449006344","0316769487","0842342702","0380709244","0553289691","0451526341","0425173534","0679445358","0451201744","0060512822","0671026682","0345354648","0451200861","0425192032","0345428811","0064402053","0449216411","0399145885","0802132758","0425052478","0425162486","0380703890","0345348036","0553575953","0679735666","0425165566","0609600672","0316788228","0743418220"


Performance Evaluation:

The table below shows the similarity score produced by each of the three measures. The most appropriate recommendations come from the correlation measure and, to some extent, from cosine similarity. The Jaccard index does not take the actual ratings into account; it uses only the number of ratings two items have in common and the number of elements in the union of the two items' rating sets. Moreover, the computation graph shows that the correlation measure runs faster than the other two methods. We also ran a recommendation experiment on one user, a friend of ours, and found that the correlation measure recommended books closer to his expectations. We therefore conclude that correlation is the best of the three methods for computing similarity. As an enhancement, we are trying to merge the three scores to build a better system.

Book         Jaccard   Cosine   Correlation
743456335    0.3       0.4      0.7
515128546    0.6       -0.2     0.4
385314388    0.7       0.9      0.9
380792028    0.6       0.9      0.9
312995423    0.4       1        1
380762595    1         1        1
553569910    1         1        1
684818868    0.6       1        1
006098824X   0.3       1        1
044023722X   0.5       1        1
439136350    0.3       0.5      0.6
006098824X   0.2       0.4      0.9
385479565    0.2       0.7      0.9
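
For reference, the three measures compared above can be written in their textbook forms as below. This is a sketch, not the project's implementation, whose exact variants are not shown in this report. The class `ItemSimilarity` is hypothetical, and the Jaccard index is assumed to operate on the sets of users who rated each book, which is one reading of the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Textbook forms of the three measures; the project's exact variants may differ.
class ItemSimilarity {

    // Pearson correlation of two equal-length rating vectors, where element i holds
    // the ratings one user gave to book A and book B.
    // Returns 0 when either vector has zero variance (the correlation is undefined there).
    static double correlation(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? 0 : cov / denom;
    }

    // Cosine of the angle between the two rating vectors; 0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0 ? 0 : dot / denom;
    }

    // Jaccard index over the sets of users who rated each book: |A ∩ B| / |A ∪ B|.
    // The rating values themselves never enter the computation.
    static double jaccard(Set<String> usersA, Set<String> usersB) {
        Set<String> common = new HashSet<>(usersA);
        common.retainAll(usersB);
        int union = usersA.size() + usersB.size() - common.size();
        return union == 0 ? 0 : (double) common.size() / union;
    }

    public static void main(String[] args) {
        double[] bookA = {1, 2, 3};
        double[] bookB = {3, 2, 1};
        System.out.println("correlation = " + correlation(bookA, bookB));
        System.out.println("cosine      = " + cosine(bookA, bookB));
        System.out.println("jaccard     = "
                + jaccard(Set.of("u1", "u2", "u3"), Set.of("u2", "u3", "u4")));
    }
}
```

In the `main` example the two books receive opposite rating trends, so correlation is negative while cosine stays positive, and Jaccard depends only on which users overlap.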






Accomplished:


• Definitely will accomplish: As proposed, we successfully implemented the book recommendation engine from scratch using the collaborative + demographic recommendation model.
• Likely to accomplish: As mentioned, we also tried to implement a content-based recommendation engine, but due to time constraints we were not able to complete its design.
• Would ideally like to accomplish (in future): A responsive UI that shows the recommendations on a web page and stores each new user's data in a database, so that the user is added to the existing dataset and included in further analysis and recommendation computation.