Martin et al

Martin et al. (2018) discuss big data methods that can be used to analyze or measure the credit risk of home loans. Because the dataset is large, Monte Carlo experiments with known algorithms and techniques are used, and credit risk is calculated as an incremental contribution using a linear mixed model. The study showed that large datasets require big data techniques to be interpreted and to yield unbiased results or estimators. It also found that the loss from loan default could be estimated through an optimal evaluation method, which can also be used to assess the risk of extending credit to a customer.

Batmaz et al. (2016) studied the factors that determine deposit pricing, applying big data techniques and data mining methods to customer data provided by commercial banks. Models were built using generalized linear models, multivariate adaptive regression splines, support vector regression, artificial neural networks, random forests and decision trees. The study found that account-specific information and customer characteristics must be taken into account to draw meaningful inferences about how deposit rates are determined. Customers with a long-term relationship with a bank enjoyed higher deposit rates as a reward for their loyalty, and the location of the bank was also a significant factor in determining deposit rates.

Etaiwi et al. (2017) studied bank customer behavior using big data and related tools. They compared Support Vector Machine (SVM) and Naïve Bayes (NB), two of the classification techniques in the Machine Learning Library (MLlib), on data from customers of Santander, a bank based in Spain. The dataset contained behavioral and personal information about the bank's customers; the test set had 1 million records and the training set more than 13 million. The data were cleaned before the classifiers were applied. Naïve Bayes outperformed SVM on recall, precision and F-measure, and the authors also concluded that, for prediction problems, multi-class classifiers had an edge over binary classifiers.

Niloy et al. (2018) studied how the decision tree and Naïve Bayes algorithms could be applied to a dataset of bank customer information to distinguish credible from non-credible credit card holders. Using Monte Carlo computational techniques, the two algorithms were applied to the dataset for data mining and to build a predictive model. The study concluded that the Naïve Bayes classifier has an edge over the decision tree algorithm and that the choice of classification technique significantly influences the results obtained.

Rahman et al. (2018) used a dataset of bank customers, including businesspeople, corporate clients and members of other organizations, to identify customer behavior patterns. The study aimed to determine which of k-nearest neighbors (kNN), artificial neural network and decision tree classifiers best revealed these patterns. The eight customer attributes that carried the most information were selected for the test, and the sensitivity, specificity and accuracy of each model were measured. Artificial neural networks performed better than decision trees and kNN.

Bhichesthapong et al. (2016) conducted a study with non-specialists in business intelligence (BI), using a dataset of bank customers, to find out whether people without BI expertise could draw meaningful conclusions from large datasets. The study examined the usefulness of self-service BI, and the non-specialists were able to reach meaningful conclusions with it. Their pain points were recorded and a data cube was developed using SQL software; the authors concluded that even non-specialists could produce time-critical reports using the data cube.

Manoj Reddy (2018) published an article on how machine learning applied to big data could help managers make decisions, support the smooth operation of a bank and benefit risk management. The study covers two learning approaches: (1) supervised learning, which comprises classification and regression, and (2) unsupervised learning, which comprises clustering and association. It found that adopting machine learning in banks could help with anti-money laundering monitoring, risk-based credit approval, loan default prediction, risk forecasting models and consumer loan risk segmentation. The article concludes that machine learning algorithms and techniques are still evolving and will be adopted across all fields in the near future.

Radhika Kale (2018), in her paper titled “Student Performance Prediction for Education Loan System”, discusses how big data analysis in the education and banking industries could help banks predict whether a student who has taken an education loan will complete the course and repay the loan without defaulting. Student performance is predicted with a base predictor and an ensemble predictor built on the dataset, taking both the student's past and present performance into account. The paper also states that the ensemble-based technique can make predictions based on the progress the student is making. The study concludes that prediction using big data will benefit both the bank and the student.
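Etaiwi et al. (2017) compare classifiers on precision, recall and F-measure, while Rahman et al. (2018) report sensitivity, specificity and accuracy. As a reminder of what these metrics measure, the minimal Python sketch below computes all of them from the four cells of a binary confusion matrix. The class and function names are illustrative, and the counts in the example are hypothetical; they are not taken from either study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier; 'positive' is the class of interest."""
    tp: int  # positives correctly predicted as positive
    fp: int  # negatives wrongly predicted as positive
    fn: int  # positives wrongly predicted as negative
    tn: int  # negatives correctly predicted as negative


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero
    # (e.g. the positive class is never predicted).
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)  # recall is the same quantity as sensitivity
    specificity = _safe_div(c.tn, c.tn + c.fp)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # F-measure (F1): harmonic mean of precision and recall
    f_measure = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from any study.
    counts = ConfusionCounts(tp=80, fp=20, fn=10, tn=90)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.3f}")
```

Precision and recall ignore true negatives, which is why studies on imbalanced customer data often report them alongside, or instead of, plain accuracy; specificity covers the negative class that recall leaves out.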
Migueis et al. (2017) studied how big data can be used to maximize customer response to a bank's marketing activities. The target customers for the campaign were predefined, and the data mining model was based on the random forest method. The study compared oversampling and undersampling to find the most suitable specification. The dataset also included customers' demographic variables, which helped in drawing meaningful inferences. The study concludes that the random forest method effectively predicts customer patterns, so that target customers can be identified and the marketing campaign focused accordingly. Undersampling combined with random forest showed very high prediction performance.

Mitik et al. (2017) researched how data mining techniques could be used effectively to market banking products and services. The study was conducted in Japan. Customers were classified according to whether or not they were interested in a bank product, using a hybrid classification model; clustering was then performed based on this classification, and a suitable marketing technique was implemented. Using real-life data, the results show that data mining techniques can identify the right channel and product for each customer with improved accuracy. Incorporating data mining also cut costs substantially, raising the profit/cost ratio.

Farooqi et al. (2017) explain the significance of data mining in banks and how it can generate value for the industry. The study states that meaningful information generated from big data could be used for customer relationship management, cost reduction, and focused, effective marketing campaigns, among other purposes. Data mining would also contribute to association analysis, risk management, retail management, asset management, portfolio management and investment banking. The authors also argue that data mining could change the way the industry functions.
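Migueis et al. (2017) found that undersampling combined with random forest gave very high prediction performance. The exact undersampling procedure is not described here, so the sketch below assumes plain random undersampling: records of the larger classes are discarded at random until every class is as large as the smallest one. The balanced sample would then be passed to a random forest learner, which is not shown. The function name and the toy campaign data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    records: Sequence[Tuple[T, Hashable]], seed: int = 0
) -> List[Tuple[T, Hashable]]:
    """Randomly drop records from larger classes until every class has as many
    records as the smallest class. Each record is a (features, label) pair."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)
    if not by_label:
        return []
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # seeded so the sample is reproducible
    balanced = []
    for group in by_label.values():
        # Sample without replacement; for the smallest class this keeps every record.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced


if __name__ == "__main__":
    # Hypothetical campaign data: label 1 = responded, 0 = did not respond.
    data = [({"customer_id": i}, 1 if i % 10 == 0 else 0) for i in range(100)]
    balanced = random_undersample(data, seed=42)
    print(len(balanced))  # 20: all 10 responders plus 10 sampled non-responders
```

Undersampling discards data, which is usually acceptable when the majority class is large; oversampling, the alternative the study compared it with, instead duplicates or synthesizes minority-class records so that nothing is thrown away.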