Recommendation Model Based on Expiry Date of Product Using Big Data Analytics

Abhisekh Kumar Singh and Maheswari Raja

School of Computer Science and Engineering, VIT Chennai

Azath Hussain

School of Computing Science and Engineering, VIT Bhopal University


Due to the expansion of the internet, e-commerce has grown rapidly and has attracted a significant number of online customers around the globe. On an e-commerce site, customers can search for, compare, and choose the products they like and can also recommend popular products to others. In e-commerce companies (e.g., Amazon, eBay, and Flipkart), a variety of products are selected and recommended for purchase on the basis of customer reviews and comments. This results in the accumulation of a large amount of data that can be processed and analyzed. Large amounts of information are available on the internet in the form of ratings, rankings, reviews, suggestions, and comments about products. Nowadays, product recommendations also spread through social media channels (Facebook, Twitter, Instagram, etc.) in addition to e-commerce sites. Big data comprises large and complex datasets that are challenging to process using traditional database tools. Big data was originally characterized by three properties: volume, variety, and velocity; two more dimensions, value and veracity, were added later. In this context, variety indicates the range of the data, volume denotes the size of the data, velocity describes the speed at which data are generated in an e-commerce organization, value refers to the worth of the data, and veracity reflects the reliability of the customer’s feedback. The remainder of the chapter is structured as follows: a basic description of the proposed methodology, an analysis of the implementation, and the results.

The key contribution of the proposed system is to deliver specific and accurate recommendations and to meet the real-time requirements of a recommendation system, which influence its overall performance. Without adequate data, and in the absence of big data, a traditional recommendation system cannot do its job efficiently. A recommendation system is built by considering a large amount of user data, including purchase history, ratings, and user feedback, together with statistical analysis: descriptive statistics, which summarize a data set through measures such as the mean, median, and mode, and inferential statistics, such as correlation, regression, and hypothesis testing. Filtering techniques play a vital role in the recommendation system, as illustrated by the flowchart in Figure 5.1: content-based (CB) filtering retrieves information from the product, user-based filtering recommends according to the user’s choices, and collaborative filtering (CF) relies on people who share the same taste in products.

Statement and Objective

The objective of this work is to provide product recommendations based on expiry date using CF and CB filtering. It also recommends unsold products with an offer or discount to balance the profit and loss between the seller and the customer. It aims to minimize unsold expired products using data analytics, with an estimated improvement of 60%-70%.

Literature Survey

Table 5.1 shows the literature survey supporting the proposed work, which includes the analysis required for the development of the proposed system [1-5]. The inferences, drawbacks, and conclusions are tabulated for techniques ranging over big data, data mining, review analysis, recommendation systems, data quality, and data processing through e-commerce and social media platforms.

FIGURE 5.1 Flowchart of product recommendation.

TABLE 5.1 Literature Survey

| Title | Inference | Drawback | Conclusion |
|---|---|---|---|
| Recommendation for the social network using big data analytics | Performed analysis via popular friendship mining in a social network. | Sometimes the recommendation may be wrong, leading to a wrong friendship. | Discovering popular users and recommending friends. |
| Recommendation system for e-commerce sector | Introduced the Apriori algorithm. | Sometimes difficult to set up for running. | Suitable for products showing a repetitive purchase pattern. |
| Recommendation of a system with review analysis | Hadoop framework | Less efficient | CF methods to measure the quality of the prediction. |
| Data quality in big data processing: issues, solutions, and open problems | Data quality | Less scalable | Analyzes the solutions in big data processing. |
| Big data analytics using Hadoop framing | Hadoop framework | Sometimes very sensitive due to large data sets. | Used CB filtering, CF, and hybrid filtering for the recommendation system. |

FIGURE 5.2 Product recommendation system.

Product Recommendation System

The ratings and reviews from customers are processed using standard statistical analysis such as linear regression, testing analysis, and the modes of statistics. The output of the statistical analysis is then passed through filtering techniques such as collaborative and CB filtering. Based on the outcome of these filtering methods, an appropriate recommendation derived from existing customers is provided to the new user. The complete flow of the proposed product recommendation system is illustrated in Figure 5.2.

Existing systems provide three types of recommendation, classified as CB filtering, CF, and hybrid filtering [6]. With CB filtering, a product is recommended based on the user’s likes and dislikes. With collaborative filtering, the customer receives suggestions from customers in the same categories, which are further classified into separate user-based sections [7]. In CF, the predicted rating relies either on the same customer’s ratings of other similar products or, in user-based systems, on the ratings given to comparable products by various customers [8].
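The user-based prediction step described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the `ratings` data, the user and product names, and the choice of cosine similarity over co-rated products are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {product: rating on a 1-5 scale}.
ratings = {
    "alice": {"milk": 5, "bread": 4, "eggs": 1},
    "bob": {"milk": 4, "bread": 5, "yogurt": 4},
    "carol": {"milk": 1, "eggs": 5, "yogurt": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the products both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[p] * b[p] for p in common)
    norm_a = sqrt(sum(a[p] ** 2 for p in common))
    norm_b = sqrt(sum(b[p] ** 2 for p in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, product, ratings):
    """Similarity-weighted average of other users' ratings for `product`.

    Returns None when no other user has rated the product.
    """
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or product not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        num += sim * theirs[product]
        den += abs(sim)
    return num / den if den else None


predicted = predict_rating("alice", "yogurt", ratings)
```

Here the prediction for `alice` leans toward `bob`'s rating because their co-rated products agree more closely than those of `alice` and `carol`.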

FIGURE 5.3 Attributes of the test data set.

The attributes of the test data set are shown in Figure 5.3. In this method, the products most frequently searched by the customer are taken as keywords, and these keywords are used to represent both the user’s preferences and the quality of the products. The purpose of the proposed system is to calculate a customized rating of items for a user and then to present a personalized list recommending the most appropriate products to that user. Figure 5.4 illustrates the various modes of statistics over a test data set.

User’s Preferences/Choices

In this module, the preferences of active users and former users are formalized as corresponding keyword sets. An active user in this method is a current user who needs a recommendation.

i. Active user preferences: An active user can set preferences by selecting and recording a grocery product of any category. The system matches the active user with previous users based on the similarity of their preferences, that is, when a previous user and the active user share the same taste.

ii. Preferences of a former customer: A previous customer can express a preference for any product and share an assessment of the product.

Keyword Classification

After the product reviews are analyzed, each item is categorized as positive or negative based on its comments. If the customer’s review comments for a product are positive, the product can be recommended through social media such as Instagram, Facebook, Twitter, and other advertising links [9,10]. If a particular product has received negative feedback or comments, an affordable offer or discount can be applied to that product to balance its profit and loss.
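The classification rule above can be illustrated with a small sketch. The keyword sets, the function names, and the simple keyword-count scoring are assumptions chosen for illustration; the chapter does not specify how comments are scored.

```python
# Hypothetical keyword sets; a real system would derive these from the data.
POSITIVE = {"good", "great", "fresh", "tasty", "excellent"}
NEGATIVE = {"bad", "stale", "poor", "expired", "awful"}


def classify_review(text):
    """Label a review by counting positive and negative keywords."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def action_for(label):
    """Map a review label to the follow-up action described in the text."""
    actions = {
        "positive": "promote on social media",
        "negative": "offer discount",
    }
    return actions.get(label, "no action")


label = classify_review("Stale bread, bad taste.")
print(label, "->", action_for(label))
```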

Implementation of Statistical Analysis for Products

FIGURE 5.4 Illustration of various modes in statistics.

This section illustrates the methods used to implement the statistical analysis of the products. Figure 5.5 shows the correlation analysis, namely Pearson’s product-moment correlation, which compares the customer’s rating with the helpfulness of the product. Here df$score holds the rating from the customer, and df$HelpfullnessNumerator holds the helpfulness factor of the product to the customer. The p-value is the probability, under the null hypothesis of zero correlation, of observing a correlation at least as extreme as the one measured, while the underlying test statistic can range from -∞ to +∞. The analysis reports a 95% confidence interval over the test data set.
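As a hedged illustration of the correlation analysis, the sketch below computes Pearson's coefficient and an approximate 95% confidence interval using the Fisher z-transform. The chapter performs this analysis in R; the Python functions, the variable names `score` and `helpfulness`, and the sample values are assumptions for the example.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transform.

    Requires n > 3 and |r| < 1.
    """
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    zcrit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - zcrit * se), tanh(z + zcrit * se)


# Hypothetical ratings and helpfulness counts for eight reviews.
score = [5, 4, 3, 5, 1, 2, 4, 5]
helpfulness = [3, 2, 1, 4, 0, 1, 2, 5]

r = pearson_r(score, helpfulness)
low, high = fisher_ci(r, len(score))
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```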

One-Sided and Two-Sided T-Test of Data Sets

Two types of T-test are available: the one-sided T-test and the two-sided T-test [11,12]. Figure 5.6 shows the helpfulness of the product under the one-sided test of the collected data, which helps to identify the overall mean of the test data. Figure 5.7 shows the ratings of the product under the one-sided test of the collected data, which likewise helps to identify the overall mean of the test data.

Figure 5.8 shows the helpfulness of the product under the two-sided test of the collected data, which helps to identify the overall mean of the test data (one lakh, i.e., 100,000, records) over the range -∞ to +∞. Figure 5.9 shows the ratings of the product under the two-sided test of the collected data, which helps to identify the overall mean of the test data over the range -∞ to +∞.
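The difference between the one-sided and two-sided tests can be sketched as below. This is an illustrative one-sample test, not the chapter's R output: the sample values and the hypothesized mean are assumptions, and the p-values use a normal approximation to the t distribution, which is reasonable only for large samples such as the one-lakh data set mentioned above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def p_values(t):
    """One-sided and two-sided p-values (normal approximation, large n)."""
    upper = 1 - NormalDist().cdf(t)
    return {
        "greater": upper,          # alternative: true mean > mu0
        "less": 1 - upper,         # alternative: true mean < mu0
        "two_sided": 2 * (1 - NormalDist().cdf(abs(t))),
    }


# Hypothetical ratings on a 1-5 scale (100 observations).
sample = [4, 5, 3, 4, 5, 4, 4, 5, 3, 4] * 10

t = one_sample_t(sample, mu0=4.0)
p = p_values(t)
print(f"t = {t:.3f}, one-sided p = {p['greater']:.4f}, two-sided p = {p['two_sided']:.4f}")
```

For a positive t statistic the two-sided p-value is twice the one-sided "greater" p-value, which is why the same data can be significant under one test and not the other.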

FIGURE 5.6 T-test analysis with helpfulness.

FIGURE 5.8 Two-sided T-test for helpfulness.

FIGURE 5.9 Two-sided T-test for score.

Linear Regression Model

A linear regression model is a statistical method that helps to determine the association between two variables, a dependent variable and an independent variable [13,14]. Figure 5.10 shows the outcome of a linear regression model for two variables, product ID and rating, including the first quartile, median, minimum, and maximum values.
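A minimal sketch of a simple linear regression and the five-number summary of its residuals is given below. It uses ordinary least squares with one independent variable; the function names and sample data are assumptions for illustration and do not reproduce the model in Figure 5.10.

```python
from statistics import mean, quantiles


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return slope, intercept


def residual_summary(x, y, slope, intercept):
    """Minimum, quartiles, and maximum of the residuals."""
    res = sorted(b - (intercept + slope * a) for a, b in zip(x, y))
    q1, med, q3 = quantiles(res, n=4, method="inclusive")
    return {"min": res[0], "q1": q1, "median": med, "q3": q3, "max": res[-1]}


# Hypothetical data: helpfulness votes (independent) vs. rating (dependent).
votes = [0, 1, 2, 3, 4, 5, 6]
rating = [1.2, 1.9, 2.4, 3.1, 3.4, 4.2, 4.6]

slope, intercept = fit_line(votes, rating)
summary = residual_summary(votes, rating, slope, intercept)
print(f"rating = {intercept:.3f} + {slope:.3f} * votes")
print(summary)
```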

Experimental Assessment

The experimental data are real sales data from a company covering January 2018 to November 2019, with approximately 4,000 food products, approximately 500,000 customers, and nearly 10,000,000 orders; the data were analyzed using RStudio. Table 5.2 presents the results of the principal component analysis (PCA), which converts linearly correlated variables into linearly uncorrelated variables. The table reports precision, recall, and coverage, which are statistical measures computed over the data sets. The graphical representation of the PCA is shown in Figure 5.11.
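The three measures reported in Table 5.2 can be computed as in the sketch below. The definitions used here (precision and recall per recommendation list, coverage as the share of the catalog that appears in at least one list) are common conventions assumed for illustration; the chapter does not state its exact formulas, and the product names are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of one recommendation list.

    `recommended` is the list shown to the user; `relevant` is the set of
    products the user actually bought or liked.
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def catalog_coverage(all_recommendations, catalog):
    """Fraction of catalog items that appear in at least one list."""
    recommended_items = set()
    for rec_list in all_recommendations:
        recommended_items.update(rec_list)
    return len(recommended_items & set(catalog)) / len(catalog)


precision, recall = precision_recall(
    ["milk", "bread", "eggs", "yogurt"], ["milk", "eggs", "cheese"]
)
coverage = catalog_coverage(
    [["milk", "bread"], ["bread", "eggs"]],
    ["milk", "bread", "eggs", "yogurt", "cheese"],
)
print(f"precision={precision:.2f} recall={recall:.2f} coverage={coverage:.2f}")
```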

Effects of Recommendation System

CF and CB systems are the two typical recommendation engines used in existing systems. CF systems recommend goods by comparing a user’s preferences with those of other users: CF matches the customer with peers whose data are highly correlated and then suggests the most popular items among those peers to the customer. CB, on the other hand, recommends products based on the characteristics of the product or service instead of the purchase history or reviews of any user [15]. A CB system aims to match the user profile with the features of the product and suggests products that are closely compatible with the user profile. Thus, the data collected for product recommendation through the various analyses and filtering steps yield a recommendation value between customer $i$ and customer $j$, denoted $r_{ij}$ in Equation (5.1), and the peer-to-peer sale (i.e., recommendation by point-user similarity) is denoted $S_{ij}$ in Equation (5.2).

FIGURE 5.10 Outcome of linear regression model.

TABLE 5.2 PCA-Result Analysis

| Method | Precision (%) | Recall (%) | Coverage (%) |
|---|---|---|---|
| Baseline user-CF | | | |
| User-based CF of [3] | | | |
| User-based CF (2) | | | |
| Item-based CF (2) | | | |
| Associate prod (3) | | | |

FIGURE 5.11 Graphical representation of PCA.

TABLE 5.3 Comparison Between User Ratings and Review Score

| Product Id | User Id | Review Score |
|---|---|---|

Generally, customers prefer products that are recommended on the basis of ratings. The comparison between user rating and review score is tabulated in Table 5.3.

Recommendation for Ratings and Reviews of the Customer of Products

The customer satisfaction rating is often a significant indicator of the success of a product’s customer management program. The customer satisfaction rating is evaluated on a five-point scale (1 being “poor” and 5 being “happy”) in all the following figures. The graphical analysis of recommendation based on customer ratings is shown in Figure 5.12.

A recommendation system or engine is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to a product. To customize the recommendations for new users, reviews and ratings from existing customers are made mandatory. Figure 5.13 shows the graphical representation of the user’s recommendation based on studies. After the product has been reviewed through the different forms of statistical analysis, if the product is close to its expiry date, the proposed methodology recommends it with an offer or discount to balance the profit and loss between the seller and the customer.
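The expiry-based rule above might be sketched as follows. The day thresholds, the discount percentages, and the extra reduction for poorly rated products are hypothetical values chosen only to make the idea concrete; the chapter does not specify them.

```python
from datetime import date


def discount_percent(expiry, today, avg_rating):
    """Suggested discount (in percent) for a product, or None if expired.

    All thresholds and percentages are illustrative assumptions.
    """
    days_left = (expiry - today).days
    if days_left < 0:
        return None  # already expired: do not recommend at all
    if days_left <= 3:
        pct = 50
    elif days_left <= 7:
        pct = 30
    elif days_left <= 14:
        pct = 15
    else:
        pct = 0
    if avg_rating < 3.0:
        pct += 10  # negative feedback earns an additional offer
    return pct


today = date(2019, 11, 1)
print(discount_percent(date(2019, 11, 3), today, avg_rating=4.5))
print(discount_percent(date(2019, 11, 6), today, avg_rating=2.0))
```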

Advantages of the Recommendation System

• Personalized recommendation: The system takes the current user’s preferences and generates recommendations on that basis. Each user is provided with a separate list of items according to their requirements.

FIGURE 5.13 Recommendation for customers.

• Calculation time reduction: With the help of Hadoop and the MapReduce algorithm, the system becomes more efficient in terms of computation time than the existing system.
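The MapReduce pattern mentioned in the last point can be mimicked in plain Python to show its three phases. This sketch runs in a single process and does not use the Hadoop API; the record layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (product_id, score) pair for each review record."""
    for record in records:
        yield record["product_id"], record["score"]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: average the scores collected for each product."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical review records.
reviews = [
    {"product_id": "p1", "score": 5},
    {"product_id": "p2", "score": 2},
    {"product_id": "p1", "score": 3},
]

average_scores = reduce_phase(shuffle(map_phase(reviews)))
print(average_scores)
```

On a cluster, the map and reduce phases run in parallel across partitions of the data, which is where the reduction in computation time comes from.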


This chapter deals with a product recommendation scheme that depends on the product’s expiry date and uses CB and CF techniques, the latter further split into product-based and user-based approaches. The final product, advertised on social media through different recommendation engines, is targeted at the suitable customer, and this targeting is supported by the modes of statistics, testing analysis, and a regression model, followed by the ratings and reviews of the customer. This recommendation system provides a substantial increase in sales and minimizes unsold expired products using data analytics, with an estimated improvement of 60%-70%.


1. S.S.R. Abidi and J. Ong, 2000, Automated data clustering based on a synergy between self-organizing neural networks and k-means clustering techniques, Proceedings of IEEE TENCON, Kuala Lumpur, pp. 568-573.
2. G. Adomavicius and A. Tuzhilin, 2001, Using data mining methods to build customer profiles, IEEE Computer, 34, pp. 74-82.
3. J.L. Herlocker, J.A. Konstan, and J. Riedl, 2000, Explaining collaborative filtering recommendations, Proceedings of the ACM 2000 Conference on Computer Supported Cooperative Work, Philadelphia, pp. 241-250.
4. B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, 2002, Discovery and evaluation of aggregate usage profiles for web personalization, Data Mining and Knowledge Discovery, 6, pp. 61-82.
5. P.S. Yu, 2002, Data mining and personalization technologies, Proceedings of the Sixth International Conference on Database Systems for Advanced Applications, Hsinchu, Taiwan, pp. 6-13.
6. T. Mingdong, Z. Tingting, L. Jianxun, and C. Jinjun, 2018, Cloud service QoS prediction via exploiting collaborative filtering and location-based data smoothing, Concurrency and Computation-Practice & Experience, 27(18), pp. 5826-5839.
7. P. Nikolaos and P.-B. Andrei, 2017, Adaptive sentiment-aware one-class collaborative filtering, Expert Systems with Applications, 43, pp. 23-41.
8. T. George and S. Merugu, 2005, A scalable collaborative filtering framework based on co-clustering, International Conference on Data Mining (ICDM), Houston, TX, USA, pp. 625-628.
9. N.M. Khanian and M. MohdNaz’ri, 2019, A systematic literature review on the state of research and practice of collaborative filtering technique and implicit feedback, Artificial Intelligence Review, 45(2), pp. 167-201.
10. T. Steven, 2016, The economics of reputation and feedback systems in E-commerce marketplaces, IEEE Internet Computing, 20(1), pp. 12-19.
11. Y. Dong, S. Liu, and J.C. Chai, 2016, Research of hybrid collaborative filtering algorithm based on news recommendation, 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, Datong, China, pp. 898-902.
12. C. Langcai, L. Zhihui, and L. Yuanfang, 2017, Research of text clustering based on improved VSM by TF under the framework of Mahout, Proceedings of the 29th Chinese Control and Decision Conference CCDC 2017, Chongqing, China, pp. 6597-6600.
13. G. Shani, A. Gunawardana, F. Ricci, L. Rokach, B. Shapira, and P. Kantor, 2011, Evaluating Recommendation Systems in Recommender Systems Handbook, Boston, MA: Springer.
14. Y. Fan, Y. Shen, and J. Mai, 2008, Study of the model of e-commerce personalized recommendation system based on data mining, International Symposium on Electronic Commerce and Security, Guangzhou, China, pp. 647-651, 3-5 August.
15. A.A.C.G. Karuna and C. Gull, 2014, A clustering technique to rise up the marketing tactics by looking out the key users taking Facebook as a case study, IEEE International Advance Computing Conference, Gurgaon, India, pp. 579-585.