This project is part of the Kaggle competition sponsored by Santander.
The training data contains 4992 columns and 4468 rows. The dataset is unusual in that it has more features than rows. Each row represents a customer and their transactions. There are numerous columns, but the only information available about them is that they are transactions made by the customers.
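As a rough illustration of what "more features than rows" means in practice, the sketch below counts rows and feature columns and flags columns that hold a single value. It is not taken from the project code: the `describe_wide_table` helper, the `ID` and `target` column names, and the tiny inline sample are all assumptions made for the example, and whether the real data contains constant columns is something to check rather than assume.

```python
import csv
import io


def describe_wide_table(rows, non_feature_columns=("ID", "target")):
    """Return (n_rows, n_features, constant_columns) for a list of dict rows.

    `non_feature_columns` is an assumption: columns treated as identifier
    or label rather than as transaction features.
    """
    if not rows:
        return 0, 0, []
    features = [name for name in rows[0] if name not in non_feature_columns]
    # A column is constant if every row holds the same value for it.
    constant = [
        name for name in features
        if len({row[name] for row in rows}) == 1
    ]
    return len(rows), len(features), constant


# Made-up sample standing in for the real training file (hypothetical values).
SAMPLE_CSV = """ID,target,f1,f2,f3,f4
a,100.0,0,5,0,1
b,250.5,0,0,0,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
n_rows, n_features, constant = describe_wide_table(rows)
is_wide = n_features > n_rows  # True when there are more features than rows

print(f"{n_rows} rows, {n_features} features, wide={is_wide}")
print("constant columns:", constant)
```

To run the same check on the actual training file, the sample could be replaced with `csv.DictReader(open(path))`, where `path` points at the downloaded data.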
Santander wants to improve the personalized services it provides to customers based on these transactions. The goal is to predict the value of a transaction that a customer could make. Predicting these values will help the company provide a simple but personal service to each of its customers.
Code containing the EDA and predictive analysis can be found here