Fact-verified AI for Financial Intelligence with FactBlock and DeepVerify

Factagora

Jun 14, 2024

A Source-of-Truth Approach Using SEC Filings and Financial Statements to Provide Reliable Insights into Insurance Underwriting Performance

Overview

Factagora is transforming how financial professionals work with complex data.

By combining FactBlock and Source of Truth (SoT) technologies, we’ve turned generative AI into a faster, more accurate, and fully verifiable analysis tool.

In a real-world case study in the U.S. insurance industry, we structured five years of SEC filings and macroeconomic indicators into FactBlocks, aligned them with NAIC’s core underwriting metrics, and established a domain-specific analytical framework.

Supported by DeepVerify, the system produced outputs that were source-backed, explainable, and trustworthy.

The result: a Z-score of +1.31 across accuracy, relevance, and effectiveness—significantly outperforming Naive RAG with GPT-4o.

Factagora enables high-trust industries to advance with greater speed and confidence.

The Problem

Public financial documents, such as SEC filings, are difficult for investors to work with. A single 10-K report can exceed 200 pages, making it challenging to locate and interpret essential information. These filings are also highly complex—filled with domain-specific language and key metrics dispersed across multiple sections.

Even advanced general-purpose large language models (LLMs), such as GPT-4o, often struggle to interpret them accurately. They frequently generate plausible but incorrect or fabricated answers—so-called “hallucinations”—which undermine trust and compromise the quality of financial decision-making.

The Solution

Factagora addresses these challenges through a dedicated fact-checking layer built on three components: FactBlock, Source of Truth (SoT), and DeepVerify. By integrating this layer into any LLM-based AI system, organizations can generate outputs that are not only accurate, but also traceable, explainable, and grounded in verifiable data.

At the foundation, FactBlock extracts and structures key data points from SEC filings and macroeconomic sources—such as 10-K, 10-Q, 8-K, and FRED—turning unstructured text into modular, machine-verifiable knowledge units. These FactBlocks are compiled into a Source of Truth (SoT): a reusable, explainable knowledge base that provides consistent grounding for future queries and analysis.
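As an illustration, a FactBlock can be pictured as a small, self-contained record that pairs an extracted claim with its provenance. The schema below is a hypothetical sketch for this article, not Factagora's actual data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactBlock:
    """A modular, machine-verifiable knowledge unit (illustrative schema)."""
    claim: str           # the extracted statement of fact
    value: float         # the key figure the claim asserts
    unit: str            # e.g. "%" or "USD millions"
    source_doc: str      # e.g. "10-K 2023"
    source_section: str  # where in the filing the fact appears
    period: str          # reporting period the fact covers

# Example: a combined-ratio fact extracted from a 10-K (values invented)
fb = FactBlock(
    claim="combined ratio for FY2023",
    value=104.5,
    unit="%",
    source_doc="10-K 2023",
    source_section="Item 7: MD&A",
    period="FY2023",
)

# A Source of Truth is then an indexed collection of such blocks,
# keyed so later queries can be grounded in a specific source.
sot = {(fb.source_doc, fb.claim): fb}
```

The key design point is that every figure carries its own citation, so any downstream answer built from a FactBlock is traceable back to a filing section.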

Built on this SoT, DeepVerify acts as a factual guardrail. It validates AI-generated responses against the SoT to ensure that each output is based on real data—not assumptions. DeepVerify supports transparent, explainable answers with traceable citations, clear reasoning paths, and confidence scores, and allows teams to define custom domain-specific verification logic—enabling financial analysts, credit assessors, and institutional decision-makers to maintain rigor, trust, and control within automated workflows.
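In spirit, the guardrail step is a lookup-and-compare against the SoT. Everything in this sketch, including the function name, the tolerance rule, and the confidence formula, is an illustrative assumption rather than DeepVerify's actual logic:

```python
def verify_claim(claimed_value, fact, tolerance=0.01):
    """Check an AI-generated figure against a grounding fact (illustrative).

    Returns a verdict, a traceable citation, and a simple confidence score.
    """
    if fact is None:
        # No grounding fact exists: the claim cannot be supported
        return {"verdict": "unsupported", "citation": None, "confidence": 0.0}
    # Relative error between the model's figure and the source figure
    error = abs(claimed_value - fact["value"]) / max(abs(fact["value"]), 1e-9)
    return {
        "verdict": "supported" if error <= tolerance else "contradicted",
        "citation": f'{fact["source_doc"]}, {fact["section"]}',
        "confidence": round(max(0.0, 1.0 - error), 2),
    }

# Invented grounding data for demonstration
sot = {
    "combined_ratio_fy2023": {
        "value": 104.5, "source_doc": "10-K 2023", "section": "Item 7",
    }
}

ok = verify_claim(104.5, sot.get("combined_ratio_fy2023"))
bad = verify_claim(98.0, sot.get("combined_ratio_fy2023"))
```

A real verifier would also handle textual claims and custom domain rules, but the shape is the same: every answer either carries a citation into the SoT or is flagged.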

Case Study

Factagora’s approach was applied to a real-world case study in the U.S. insurance industry, focusing on the underwriting performance of major insurance companies.

Eight firms—including Allstate, MetLife, and Progressive—were selected. Five years of data were extracted from SEC filings (10-K, 10-Q, 8-K) and structured into FactBlocks. Macroeconomic indicators from FRED—such as unemployment rate, real GDP, CPI, and the 10-year Treasury yield—were also incorporated to quantitatively link company performance with broader economic conditions. Together, these elements formed a domain-specific Source of Truth (SoT).

Underwriting performance was evaluated using five core metrics defined by the National Association of Insurance Commissioners (NAIC):

  1. Combined Ratio

  2. Loss Ratio

  3. Loss Reserves

  4. Expense Ratio

  5. Underwriting Standards

Factagora aligned the FactBlock schema and analytical framework with these metrics to ensure consistent, standards-driven evaluation.
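For readers less familiar with these metrics, the first two ratios compose directly: the combined ratio is the loss ratio plus the expense ratio, and a value below 100% implies an underwriting profit. The formulas below are standard; the input numbers are invented for illustration:

```python
def loss_ratio(incurred_losses, earned_premiums):
    """Incurred losses (incl. loss-adjustment expenses) over earned premiums."""
    return incurred_losses / earned_premiums

def expense_ratio(underwriting_expenses, written_premiums):
    """Underwriting expenses over written premiums (statutory basis)."""
    return underwriting_expenses / written_premiums

def combined_ratio(lr, er):
    """Loss ratio + expense ratio; below 1.0 implies an underwriting profit."""
    return lr + er

lr = loss_ratio(70.0, 100.0)      # 0.70
er = expense_ratio(28.0, 100.0)   # 0.28
cr = combined_ratio(lr, er)       # 0.98 -> underwriting profit
```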

A large language model (LLM) was used to automatically generate analysis questions based on the SoT. For each metric, both general and specific questions were created. For example, regarding the Combined Ratio:

“How has this ratio evolved over the past year, and what does it imply about underwriting profitability?”

“How has reinsurance accounting affected the loss and combined ratios?”

These questions enabled both qualitative and quantitative AI-driven analysis grounded in the SoT.
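The question-generation step can be sketched as one templated prompt per NAIC metric. The prompt wording and function name below are invented for illustration:

```python
METRICS = ["Combined Ratio", "Loss Ratio", "Loss Reserves",
           "Expense Ratio", "Underwriting Standards"]

PROMPT = (
    "You are analyzing {company}'s underwriting performance.\n"
    "Using only facts in the provided Source of Truth, write one general "
    "and one specific analysis question about the metric: {metric}."
)

def build_prompts(company):
    """One question-generation prompt per NAIC metric (illustrative)."""
    return [PROMPT.format(company=company, metric=m) for m in METRICS]

prompts = build_prompts("Allstate")  # five prompts, one per metric
```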

Finally, Factagora applied DeepVerify to evaluate whether the generated responses were truly grounded in source data and satisfied requirements for explainability and domain-specific reliability.

The Result

To evaluate response quality, Factagora’s DeepVerify was compared with a Naive RAG pipeline (using GPT-4o) on the same question:

“How has the combined ratio evolved over the past year, and what does this indicate about the company’s underwriting profitability?”

DeepVerify was evaluated in two iterations (ver1 and ver2), with ver2 designed to further enhance accuracy and overall performance.

A Comparative LLM-Judge model evaluated each response on factual accuracy, explainability, and relevance:

  • Naive RAG: Z-score –1.3 (below average)

  • DeepVerify (ver2): Z-score +0.7 (well above average)

In a broader evaluation, DeepVerify (ver1, ver2) and Naive RAG with GPT-4o were assessed across the same three key metrics. The results showed the following Z-scores:

Table 1. Z-score comparison across accuracy, effectiveness, and relevance

| Metric        | DeepVerify (ver2) | DeepVerify (ver1) | Naive RAG (with GPT-4o) |
|---------------|-------------------|-------------------|-------------------------|
| Accuracy      | 0.57              | 0.72              | -1.29                   |
| Effectiveness | 0.79              | 0.59              | -1.37                   |
| Relevance     | 0.78              | 0.59              | -1.37                   |
| Total         | 0.74              | 0.63              | -1.37                   |
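A Z-score expresses each system's judge rating relative to the mean and standard deviation across all systems being compared, which is why the baseline lands well below zero when the verified systems score above it. A minimal sketch of that normalization, using invented raw ratings rather than the study's data:

```python
import statistics

def z_scores(ratings):
    """Standardize raw ratings: (x - mean) / population std dev."""
    mu = statistics.fmean(ratings.values())
    sigma = statistics.pstdev(ratings.values())
    return {name: (x - mu) / sigma for name, x in ratings.items()}

# Hypothetical raw LLM-judge ratings on one metric (not the study's data)
raw = {"deepverify_v2": 9.0, "deepverify_v1": 8.5, "naive_rag": 4.0}
z = z_scores(raw)
# The two verified systems end up above zero, the baseline below zero
```

By construction the Z-scores of the compared systems sum to zero, so a strongly negative baseline and positive verified systems describe the same gap from two sides.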

DeepVerify consistently outperformed baseline LLMs across all metrics—achieving positive Z-scores in every category, while baseline models scored significantly below zero. This demonstrates that Factagora’s fact-checking layer enables more accurate, explainable, and domain-aligned analysis than general-purpose models can achieve on their own.

This case study provides strong empirical evidence that Factagora delivers a trustworthy, scalable, and high-precision AI analysis framework, purpose-built for complex, high-stakes domains like financial reporting and risk evaluation.