Contributory Data Lakes: A Key Consideration for Insurers in Their AI Initiatives

P&C INSIGHTS BLOG   |   January 23, 2024

The Gradient AI Team

Contributory data lake, contributory database, AI

It has been said that data is the new oil. If this is true, the insurance industry is very wealthy indeed! At its core, insurance is a data-driven business. Underwriters and adjusters depend on data from various sources to assess risk and price policies, and to resolve claims.

Artificial intelligence is increasingly integrated into insurance operations, with data as the fuel that drives AI-based solutions in underwriting and claims. Models are only as good as the data they are trained on. Higher-quality data translates directly into more accurate models that provide better predictions and results.

For example, if you train AI models only on your in-house data, they can make predictions based only on the outcomes your company has seen. On the other hand, when you combine your in-house data with high-quality data from other carriers, AI models generalize better because they’ve seen a wider range of policies and claims outcomes. That lets you operate more effectively across a wider range of opportunities and circumstances.

Why Contributory Data Lakes are Vital to Insurance in the AI Era: a Real-World Example


Insurers have vast underwriting and claims data assets. However, this information is limited to the industries and geographies in which they operate, and more specifically, to their own clients only.

For example, suppose you’re a Workers’ Compensation (WC) carrier that writes business only for construction companies in Arizona. You have extensive data on construction practices, weather, traffic, accidents, and numerous other factors related to the construction industry in Phoenix.

Now imagine you want to expand into Minnesota and provide WC coverage for construction companies there. Because you have no experience covering construction companies in Minnesota, you will find it difficult to accurately assess risk in this new geography. The challenge is compounded if you want to expand into other industries, such as providing WC coverage for janitorial companies. You lack the data to understand the risks associated with this business, so it will be similarly difficult to assess and price that risk accurately.

This is where the benefits of a contributory industry data lake or database come into play. Contributory data lakes pool data from numerous insurers, creating a collaborative and dynamic repository of information. These platforms rely on the collective contributions of multiple users, providing a far more robust data set than any individual insurer would otherwise have. Industry data lakes of this magnitude not only help insurance companies assess risk more accurately in the markets in which they already operate, but also enable them to move into new markets, add new products, and benchmark themselves against their peers. They also help refine AI models to improve their accuracy.

The Benefits of Gradient AI’s Data Lake


Over the years, Gradient AI has invested significant resources into our industry data lake. Currently, it houses tens of millions of structured and unstructured underwriting and claims records. All of the data is de-identified to ensure anonymity. Beyond the contributory sources of the data lake, we’ve further enhanced it by adding almost 100 third-party data sources, such as:

  • Medical, prescription, and laboratory data
  • Weather, crime, distance to EMS, and numerous other demographic data points
  • Psychographic data

    …and much more

We added these features, which go beyond basic data points, because they enable AI models to be more accurate predictors. Of course, data security is paramount: Gradient AI is SOC 2 compliant and HITRUST certified.


AI Models are Only as Good as the Data They are Trained On


As carriers look to drive efficiencies in underwriting and claims, AI-powered solutions have become more prevalent.

Yet not all data is created equal. When evaluating solution providers, it is critical to remember that a model is only as good as the data it was trained on. A vendor can have sophisticated models, but data is the “fuel” that powers them, and if the vendor lacks an extensive training dataset, its AI models may not be as accurate as models trained on more robust data sets.


To learn more about Gradient AI’s contributory industry data lake, schedule a 30-minute conversation with us today.



