Case Study · Academic Research

Accelerating rare disease research by reasoning over 12 years of unanalyzed records

A research consortium applied LLM reasoning to 12 years of accumulated patient records that had never been fully analyzed — surfacing three novel disease associations and reducing hypothesis generation time from months to days.

20% → 100%

Historical patient data now under active analysis

3

Novel disease associations identified in first 90 days

12 years

Longitudinal data now fully reasoned over

Months → days

Hypothesis generation timeline

340,000

Participant records now fully covered

4 publications

Research papers submitted based on findings, Year 1

The Organisation

Academic Medical Research Consortium · Academic Research

The Challenge

The consortium had accumulated 12 years of longitudinal patient data from 340,000 participants across 6 research institutions. The data sat in heterogeneous formats across multiple systems, and the cost and complexity of analyzing the full dataset meant that most research drew on pre-specified subsets of the total record. An estimated 80% of the accumulated data had never been included in a primary analysis. Rare disease research was particularly constrained: because the affected patient populations are small, identifying statistically significant patterns required full-cohort analysis, which was computationally and financially out of reach.
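To illustrate why a rare-condition signal can stay hidden in a subset, here is a minimal sketch comparing the same incidence rates in a 20% subset and in the full cohort. Every count below is hypothetical and chosen only for illustration. Only the 340,000 cohort size and the 20% analyzed share come from the case study. The two-proportion z-test is an assumed stand-in, not the consortium's actual method, and with event counts this small an exact test would normally be preferred.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference in incidence between two groups.

    Uses the pooled normal approximation; returns (z, p_value).
    """
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical full cohort (340,000 participants in total):
#   100,000 exposed to the medication class, 30 cases of the rare condition
#   240,000 unexposed, 150 cases of the rare condition
z_full, p_full = two_proportion_z_test(30, 100_000, 150, 240_000)

# Hypothetical 20% subset with exactly the same incidence rates:
#   20,000 exposed with 6 cases, 48,000 unexposed with 30 cases
z_subset, p_subset = two_proportion_z_test(6, 20_000, 30, 48_000)

print(f"full cohort: z = {z_full:.2f}, p = {p_full:.5f}")
print(f"20% subset:  z = {z_subset:.2f}, p = {p_subset:.5f}")
```

Under these assumed numbers the effect size is identical in both runs, yet only the full cohort clears a conventional p < 0.05 threshold. The test statistic grows with the square root of the sample size, so a fivefold increase in records multiplies z by roughly 2.24.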

The Approach

The consortium applied LLM reasoning to the complete 12-year longitudinal dataset. The model was configured to identify temporal patterns, cross-condition associations, and rare event clusters across the full record, surfacing candidate hypotheses for researcher review.

"We had been doing research with one hand tied behind our back for 12 years. The data was there. The answers were there. We just couldn't afford to look at all of it."

Director of Research Informatics

Key Finding

The three novel disease associations identified in the first 90 days included a previously unreported correlation between a common medication class and reduced incidence of a rare autoimmune condition — a signal present in the data for 9 years but requiring the full longitudinal dataset to achieve statistical significance. The finding has since been submitted for peer review and is in Phase II validation.

Results at a Glance
Historical patient data now under active analysis: 20% → 100%
Novel disease associations identified in first 90 days: 3
Longitudinal data now fully reasoned over: 12 years
Hypothesis generation timeline: Months → days
Participant records now fully covered: 340,000
Research papers submitted based on findings, Year 1: 4 publications