Extracted Claims
VALIDATED DATA
Structured scientific assertions extracted from paper abstracts using multi-LLM analysis with rigorous quality filtering. Each claim is a single factual statement that preserves the original authors’ hedging language (e.g., “may regulate” stays “may regulate” and is never upgraded to a definitive statement). Claims are typed into 12 categories (gene expression, protein interaction, drug efficacy, splicing event, biomarker, etc.), scored for confidence (0–100%), and linked to both their source paper and relevant molecular targets via 200+ alias patterns. The extraction pipeline uses a two-layer quality gate: disease-relevance filtering removes non-SMA contamination, and word-boundary matching prevents false target links. Click any row to see the full provenance chain: paper title, PubMed ID, abstract excerpt, extraction model, and metadata.
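To illustrate why word-boundary matching prevents false target links, here is a minimal Python sketch. The alias table, `compile_patterns`, and `link_targets` are hypothetical names chosen for this example; the pipeline's actual 200+ alias patterns and matching code are not shown on this page.

```python
import re

# Hypothetical alias table for illustration only; the real pipeline's
# 200+ alias patterns are not listed in the source.
TARGET_ALIASES = {
    "SMN2": ["SMN2", "survival motor neuron 2"],
    "NCALD": ["NCALD", "neurocalcin delta"],
}


def compile_patterns(aliases):
    """Build one case-insensitive, word-bounded regex per target."""
    return {
        target: re.compile(
            r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
            re.IGNORECASE,
        )
        for target, names in aliases.items()
    }


PATTERNS = compile_patterns(TARGET_ALIASES)


def link_targets(claim_text):
    """Return the sorted targets whose aliases appear as whole words."""
    return sorted(t for t, p in PATTERNS.items() if p.search(claim_text))
```

With `\b` on both sides, an alias embedded in a longer token (for example `pSMN2x`) is not linked, whereas a plain substring search would report a match.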
How does Claim Extraction work?
Each PubMed abstract is processed independently by two LLMs — Claude Haiku and Gemini Flash. Claims are extracted as structured objects with a subject, predicate, object, and claim type.
Claim types
- mechanism — molecular mechanism (e.g., pathway activation, protein interaction)
- therapeutic — drug or intervention effect on disease phenotype
- biomarker — association between molecule and disease state or severity
- expression — differential expression finding from omics data
- genetic — variant, mutation, or gene-disease linkage
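The structured claim object and the claim types listed above could be modeled roughly as follows. This is a sketch under assumptions: the class names `Claim` and `ClaimType` and the field name `claim_type` are hypothetical, and only the five types listed here are included.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimType(str, Enum):
    """The five claim types listed above (names assumed for this sketch)."""
    MECHANISM = "mechanism"
    THERAPEUTIC = "therapeutic"
    BIOMARKER = "biomarker"
    EXPRESSION = "expression"
    GENETIC = "genetic"


@dataclass
class Claim:
    subject: str
    predicate: str  # kept verbatim so hedging such as "may regulate" survives
    object: str
    claim_type: ClaimType

    def __post_init__(self):
        # Accept raw strings from model output; unknown types raise ValueError.
        self.claim_type = ClaimType(self.claim_type)
```

Validating the type at construction time means a malformed model output fails early instead of reaching the claims table.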
Confidence scoring
Confidence is a model agreement score. If both LLMs extract the same claim (matching subject, predicate, and object), confidence is high (>0.8). Claims extracted by only one model receive a lower prior (0.4–0.6). Patent-sourced claims are capped at 0.3, and claims scoring below 0.25 are filtered out.
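The scoring rules can be sketched in Python as below. The exact values 0.9 and 0.5 are assumptions picked from inside the stated ranges, and exact matching after lowercasing and trimming is an assumed stand-in for however the pipeline actually decides that two claims match. `score_claims` and `claim_key` are hypothetical names.

```python
from collections import defaultdict

AGREED_CONFIDENCE = 0.9        # assumption: the source only says "> 0.8"
SINGLE_MODEL_CONFIDENCE = 0.5  # assumption: midpoint of the 0.4-0.6 range
PATENT_CAP = 0.3               # from the source
MIN_CONFIDENCE = 0.25          # from the source


def claim_key(claim):
    """Normalized (subject, predicate, object) triple used to compare claims."""
    return tuple(
        claim[field].strip().lower()
        for field in ("subject", "predicate", "object")
    )


def score_claims(extractions, patent_sourced=False):
    """Score claims given a mapping of model name -> list of claim dicts."""
    models_by_key = defaultdict(set)
    first_seen = {}
    for model, claims in extractions.items():
        for claim in claims:
            key = claim_key(claim)
            models_by_key[key].add(model)
            first_seen.setdefault(key, claim)  # keep the original wording

    scored = []
    for key, models in models_by_key.items():
        agreed = len(models) >= 2
        confidence = AGREED_CONFIDENCE if agreed else SINGLE_MODEL_CONFIDENCE
        if patent_sourced:
            confidence = min(confidence, PATENT_CAP)
        # With the constants above nothing falls below the threshold;
        # the check is kept so other priors can be plugged in.
        if confidence >= MIN_CONFIDENCE:
            scored.append({**first_seen[key], "confidence": confidence})
    return scored
```

Note that the patent cap (0.3) sits above the filter threshold (0.25), so under these rules patent-sourced claims are kept but always rank low.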