Charel Theisen

MSc Data Science Student

Financial Product Recommendation with Spark & Random Forest

In the Big Data module, Benedikt W. and I built a financial product recommendation tool with PySpark. To recommend products, we developed a random forest model that reached over 90% accuracy.

Lessons learned and future work:

– Data imputation is important to avoid losing too many observations
– Recommendations based on age or gender could raise ethical issues
– Grid search is computationally expensive and did not yield a significant performance increase
– Using only personal data, without looking at the product portfolio, made our model perform worse
– The results showed that prediction on imbalanced data, i.e. a sparse product distribution, is problematic
– Accuracy for unpopular products is automatically high when the model simply never recommends them
– Future work could tackle this sparsity using SMOTE (synthetic minority oversampling)
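
To illustrate the point about unpopular products, here is a minimal sketch in plain Python. The numbers are made up for illustration (a hypothetical product owned by 20 of 1,000 customers) and are not taken from our dataset; the helper functions `accuracy` and `recall` are written here only for this example. It shows how a model that never recommends a rare product still scores very high accuracy while finding none of the actual buyers.

```python
def accuracy(y_true, y_pred):
    """Share of customers for whom the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual product owners that the model recommends the product to."""
    positives = sum(y_true)
    if positives == 0:
        return 0.0
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / positives


# Hypothetical, made-up data: 20 of 1,000 customers own the product.
y_true = [1] * 20 + [0] * 980

# A degenerate "model" that never recommends the product.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")  # accuracy=0.98, recall=0.00
```

This is why a metric such as recall (or precision, or F1) per product is probably a better yardstick than accuracy alone when the product distribution is sparse.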

We also have a poster of our work; contact me if you are interested.
