Research shows that the depth and breadth of data have a greater impact on machine learning model performance than the cleverness of the algorithm. Data is the computing equivalent of human experience.
This suggests that, where possible, you can improve predictive accuracy by expanding the dataset used to build the predictive characteristics (features) of a machine learning model.
Consider this: there is a reason why physicians see thousands of patients during their training. This volume of experience, or learning, is what allows them to diagnose accurately within their area of specialization. In fraud detection, a model likewise benefits from the experience it gains by ingesting thousands of examples of both legitimate and fraudulent claims transactions. Superior fraud detection comes from analyzing an abundance of claims data to understand behaviour and assess risk at an individual level.
At XTND, we have performed extensive research on different modelling techniques. Across a variety of use cases, this research clearly shows that the volume and variety of training data matter more to prediction than the type of algorithm used. These findings, along with similar independent research throughout the AI community, indicate that fraud models developed and trained on data from multiple insurance providers will be more accurate than models that rely on a relatively thin dataset.