How it works

Methodology

A precise account of how GaiaLab derives biological insights — every scoring formula, every filter, every design decision.

Analysis Pipeline

Every GaiaLab analysis runs through five sequential stages. Within stage 2, all source queries run fully in parallel.

1

Gene normalisation

Input gene symbols are normalised to HGNC approved symbols. Aliases (e.g. HER2 → ERBB2) are resolved before any database query. Invalid symbols are flagged and excluded from scoring but included in the report.
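
A minimal sketch of this normalisation step, assuming an alias lookup table keyed by uppercase symbol. The table entries and function names here are illustrative, not GaiaLab's actual code.

```typescript
// Illustrative alias table; real resolution would use the full HGNC dataset.
const HGNC_ALIASES: Record<string, string> = {
  HER2: "ERBB2",
  P53: "TP53",
};

interface NormalisedGene {
  input: string;
  symbol: string | null; // null = invalid: excluded from scoring, kept in report
}

function normaliseGene(raw: string, approved: Set<string>): NormalisedGene {
  const upper = raw.trim().toUpperCase();
  if (approved.has(upper)) return { input: raw, symbol: upper };
  const alias = HGNC_ALIASES[upper];
  if (alias && approved.has(alias)) return { input: raw, symbol: alias };
  return { input: raw, symbol: null }; // flagged as invalid
}
```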

2

Parallel data fetch — 35+ sources

All database queries run simultaneously via Promise.allSettled(). No source blocks another; a timeout or API error in one source does not prevent results from the remaining sources. Each client returns partial results on failure rather than throwing.
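
The fan-out pattern can be sketched as follows. Every client call starts at once, and a rejected promise yields an empty partial result instead of aborting the batch; the client names and shapes are placeholders.

```typescript
type SourceResult = { source: string; records: unknown[] };

async function fetchAll(
  clients: Record<string, () => Promise<unknown[]>>
): Promise<SourceResult[]> {
  const names = Object.keys(clients);
  // All queries start immediately; allSettled never rejects as a whole.
  const settled = await Promise.allSettled(names.map((n) => clients[n]()));
  return settled.map((res, i) => ({
    source: names[i],
    records: res.status === "fulfilled" ? res.value : [], // partial on failure
  }));
}
```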

3

Channel aggregation

Raw API responses are aggregated into 16 evidence channels by domain-specific aggregators. Each aggregator applies source-specific normalisation, deduplication, and confidence flags before passing data downstream.
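
One aggregator's shape can be sketched as: normalise, dedupe by a canonical key, and keep a confidence flag per record. The channel and field names here are illustrative assumptions.

```typescript
interface EvidenceRecord {
  key: string;                 // canonical identifier after normalisation
  source: string;
  confidence: "high" | "low";  // source-specific confidence flag
}

function aggregateChannel(raw: EvidenceRecord[]): EvidenceRecord[] {
  const byKey = new Map<string, EvidenceRecord>();
  for (const rec of raw) {
    const existing = byKey.get(rec.key);
    // When two sources report the same key, keep the higher-confidence record.
    if (!existing || (existing.confidence === "low" && rec.confidence === "high")) {
      byKey.set(rec.key, rec);
    }
  }
  return [...byKey.values()];
}
```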

4

Scoring and classification

Drug candidates are scored 0–100 across six weighted factors. Pathways are ranked by FDR-corrected enrichment p-value. Hypotheses are filtered by evidence quality and cross-deduplicated against input gene tokens.

5

6-agent AI synthesis

Six AI agents — Hypothesis, Critic, Evidence, Risk, Innovation, Synthesis — debate the structured data. Each agent receives grounded prompts seeded with scored outputs from stage 4, not raw database dumps. The Synthesis agent produces the final executive brief.

Data Sources

35+ sources queried per analysis. Coverage figures are per-gene, averaged across a 5-gene panel.

Source | Domain | Data type | Auth required
PubMed / NCBI | Literature | Citation metadata, MeSH terms | Optional (higher rate limit)
PMC Full-Text | Literature | JATS XML, quantitative extraction (IC50, HR, OR, n=) | No
ChEMBL | Drug bioactivity | IC50, EC50, Ki, pChEMBL, mechanism of action | No
OpenTargets | Disease association | Gene-disease association scores by data type | No
BioGRID | Interaction | Protein-protein interactions, genetic interactions | Optional
STRING | Interaction | Functional association network scores | No
UniProt | Protein | Function, variants, subcellular location, PTMs | No
AlphaFold (EBI) | Structure | pLDDT per-residue confidence → druggability score | No
ClinicalTrials.gov | Clinical | Active trials, phase, intervention, NCT IDs | No
OpenFDA | Safety / Regulatory | Adverse event counts, drug approval status | No
KEGG | Pathway | Pathway membership, module associations | No
Reactome | Pathway | Hierarchical pathway enrichment | No
Gene Ontology | Functional annotation | BP, MF, CC terms | No
DGIdb | Drug-gene | Drug-gene interaction types and sources | No
DisGeNET | Disease-gene | Gene-disease associations with evidence score | API key
DrugBank | Drug | Drug targets, pharmacokinetics, interactions | API key
OMIM | Disease genetics | Mendelian disease associations | No (public API)
ClinVar | Variant | Pathogenic/benign variant classifications | No
Semantic Scholar | Literature | Citation graph, influential papers, open-access PDFs | Optional
GTEx | Expression | Tissue-specific expression, eQTL associations | No
CPTAC | Proteomics | Proteogenomic abundance and phospho-state summaries exposed through GaiaLab modality adapters | No
CELLxGENE | Single-cell | Cell-state and compartment enrichment summaries exposed through GaiaLab modality adapters | No
HMDB | Metabolomics | Metabolite-linked flux-axis context exposed through GaiaLab modality adapters | No
IntAct | Interaction | Curated molecular interactions with MI scores | No
cBioPortal | Cancer genomics | Alteration frequency, mutation-aware survival stratification in mapped TCGA cohorts with single-gene loss-associated subgrouping when discrete CNA is available | No
Pathway Commons | Pathway | Merged pathway graph from 22 pathway databases | No
Sources marked "API key" fall back gracefully when credentials are absent. Proteomic, single-cell, and metabolite channels are exposed through GaiaLab modality adapters so the fusion layer remains available even when live programmatic access varies by source.

FDR-Corrected Pathway Enrichment

GaiaLab uses a hypergeometric test for gene set enrichment, then applies Benjamini-Hochberg (BH) multiple testing correction across all tested pathways.

Hypergeometric test

P(X ≥ k) = Σ C(K,i) · C(N−K, n−i) / C(N,n)   for i = k … min(n,K)

Where:
  N = genome background size (21,000 protein-coding genes)
  K = genes in pathway (from database annotation)
  n = genes in input panel
  k = overlap between input panel and pathway
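
A direct translation of this tail sum, using log-factorials to stay numerically stable at genome scale (N ≈ 21,000). This is a sketch of the standard test, not GaiaLab's internal implementation.

```typescript
function logFact(n: number): number {
  let s = 0;
  for (let i = 2; i <= n; i++) s += Math.log(i);
  return s;
}

function logChoose(n: number, k: number): number {
  if (k < 0 || k > n) return -Infinity; // C(n,k) = 0 outside the support
  return logFact(n) - logFact(k) - logFact(n - k);
}

// P(X >= k): overlap k, panel size n, pathway size K, background N
function hypergeomTail(k: number, n: number, K: number, N: number): number {
  let p = 0;
  for (let i = k; i <= Math.min(n, K); i++) {
    p += Math.exp(logChoose(K, i) + logChoose(N - K, n - i) - logChoose(N, n));
  }
  return Math.min(p, 1); // guard against float drift above 1
}
```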

BH correction

Raw p-values across all pathways are ranked ascending. Each pathway receives an adjusted q-value:

q_i = p_i · (m / i)

Where:
  m = total number of pathways tested
  i = rank of this pathway (1 = smallest p-value)

Pathways are labelled by significance tier:

  • high — q ≤ 0.01
  • moderate — q ≤ 0.05
  • nominal — q ≤ 0.10
  • ns — q > 0.10 (not shown by default)
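
The BH adjustment and tier labels above can be sketched together. The step-down monotonicity pass (a q-value never exceeds the q-value of a larger p) is standard BH and assumed here; the formula in the text states only the raw p·(m/i) term.

```typescript
type Tier = "high" | "moderate" | "nominal" | "ns";

function bhAdjust(pvals: number[]): number[] {
  const m = pvals.length;
  const order = pvals.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const q = new Array<number>(m);
  let running = 1;
  for (let rank = m; rank >= 1; rank--) {
    const { p, i } = order[rank - 1];
    running = Math.min(running, (p * m) / rank); // enforce monotone q-values
    q[i] = running;
  }
  return q;
}

function tier(q: number): Tier {
  if (q <= 0.01) return "high";
  if (q <= 0.05) return "moderate";
  if (q <= 0.1) return "nominal";
  return "ns";
}
```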

Only pathways at q ≤ 0.05 are included in the executive brief and drug scoring. Pathways at q ≤ 0.10 are shown in the full pathway panel with a "nominal" label. The display threshold was tightened from q < 0.20 to q ≤ 0.10, limiting the expected false-discovery rate among displayed pathways to 1-in-10 rather than 1-in-5.

Citation Verification & Hallucination Detection

GaiaLab runs a three-stage evidence integrity pipeline on every analysis to ensure cited literature is real, relevant, and accurately represented.

Stage 0 — PMID existence check

All PMIDs produced by the AI synthesis layer are batch-queried against the NCBI PubMed E-utilities esummary API in groups of 50. Any PMID not found in PubMed's index is flagged as hallucinated and stripped from the result before display. The hallucination rate (hallucinated / total AI-cited PMIDs) is reported on the Trust dashboard.
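
The batching and flagging logic can be sketched with the esummary call abstracted behind a lookup function, so the control flow is testable offline. In practice `lookupBatch` would wrap the NCBI esummary endpoint; its name and shape here are assumptions.

```typescript
async function findHallucinated(
  pmids: string[],
  lookupBatch: (batch: string[]) => Promise<Set<string>> // PMIDs found in PubMed
): Promise<{ hallucinated: string[]; rate: number }> {
  const hallucinated: string[] = [];
  for (let i = 0; i < pmids.length; i += 50) {
    const batch = pmids.slice(i, i + 50); // groups of 50, as documented
    const found = await lookupBatch(batch);
    for (const id of batch) if (!found.has(id)) hallucinated.push(id);
  }
  return {
    hallucinated, // stripped from the result before display
    rate: pmids.length === 0 ? 0 : hallucinated.length / pmids.length,
  };
}
```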

Stage 1 — NLI entailment check

Claims from the analysis are verified against their cited abstract text using DeBERTa-v3-large (cross-encoder/nli-deberta-v3-large), a state-of-the-art Natural Language Inference model. The entailment score threshold is 0.5 — claims that score below this are flagged as weakly supported. Context window: 2,500 characters per passage, 400 characters per claim.

Stage 2 — ALCE-style cite metrics

Inspired by the ALCE attribution benchmark, GaiaLab computes cite-precision, cite-recall, and cite-F1 for each analysis:

cite-precision = claims with ≥ 1 supporting citation / total claims
cite-recall = citations actually used / total citations provided
cite-F1 = 2 × (precision × recall) / (precision + recall)

These metrics are shown on the Trust page. A cite-F1 ≥ 0.6 is considered well-grounded.
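
A direct reading of the three formulas, assuming per-claim citation lists and a flat list of the citations the synthesis layer offered. The data shapes are assumptions.

```typescript
interface Claim {
  citations: string[]; // PMIDs attached to this claim
}

function citeMetrics(claims: Claim[], provided: string[]) {
  const supported = claims.filter((c) => c.citations.length > 0).length;
  const used = new Set(claims.flatMap((c) => c.citations));
  const precision = claims.length ? supported / claims.length : 0;
  const recall = provided.length
    ? provided.filter((p) => used.has(p)).length / provided.length
    : 0;
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}
```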

Multi-agent citation floor

Any insight produced by the 6-agent debate that has zero verified PMIDs is annotated with citationFloor: false and its evidence quality is capped at "moderate". A "⚠ No PMIDs" badge is shown on the insight card in the analysis output.

Relation-Aware Drug Scoring

Each drug candidate is scored 0–100 across six weighted factors, then classified into a tier and assigned a floor/cap based on regulatory status.

Scoring formula

finalScore =
    targetScore   × 0.30   // gene-drug target overlap in input panel
  + clinicalScore × 0.25   // phase, trial status, regulatory approval
  + moaScore      × 0.20   // mechanism of action alignment
  + contextScore  × 0.15   // disease context relevance
  + pathwayScore  × 0.07   // pathway overlap significance
  + safetyScore   × 0.03   // adverse event profile (inverted AE count)

Adjustments:

  • Off-panel drugs (targetsInPanel = false): score × 0.45, capped at 40
  • FDA-approved drugs: skip the × 0.45 penalty and the cap; floor = 50 (guaranteed Tier II+)
  • AlphaFold structural bonus: +0 to +10 (pLDDT ≥ 80 → +10, ≥ 70 → +6, ≥ 60 → +3)

Tier classification

Tier I

Score ≥ 70. Strong evidence. On-panel target, clinical data, context match. Shown prominently in all views.

Tier II

Score 50–69. Moderate evidence. Includes all FDA-approved drugs that pass context filter. Up to 3 shown by default.

Tier III

Score < 50. Exploratory. Collapsed behind toggle. Requires explicit expansion by the user.
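
The weighted sum, adjustments, and tier cut-offs can be sketched together. Factor scores are assumed to arrive on a 0–100 scale, and the AlphaFold bonus is applied after the off-panel penalty, which is one plausible ordering the text leaves open.

```typescript
interface DrugFactors {
  target: number; clinical: number; moa: number;
  context: number; pathway: number; safety: number;
  targetsInPanel: boolean;
  fdaApproved: boolean;
  plddt?: number; // AlphaFold per-target confidence, if available
}

function structuralBonus(plddt?: number): number {
  if (plddt === undefined) return 0;
  if (plddt >= 80) return 10;
  if (plddt >= 70) return 6;
  if (plddt >= 60) return 3;
  return 0;
}

function scoreDrug(f: DrugFactors): { score: number; tier: "I" | "II" | "III" } {
  let score =
    f.target * 0.3 + f.clinical * 0.25 + f.moa * 0.2 +
    f.context * 0.15 + f.pathway * 0.07 + f.safety * 0.03;
  if (!f.targetsInPanel && !f.fdaApproved) {
    score = Math.min(score * 0.45, 40); // off-panel penalty and cap
  }
  score += structuralBonus(f.plddt);
  if (f.fdaApproved) score = Math.max(score, 50); // approval floor: Tier II+
  score = Math.min(Math.round(score), 100);
  const tier = score >= 70 ? "I" : score >= 50 ? "II" : "III";
  return { score, tier };
}
```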

Filters applied before scoring

  • Context relevance ≥ 40 required for off-panel drugs (≥ 30 for on-panel)
  • Clinical evidence score ≥ 15 required for off-panel drugs
  • Synthetic lethality only computed in oncology disease contexts
  • Duplicate canonical drugs resolved by highest repurposingScore
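
The first two filters reduce to a simple predicate; the field names here are assumptions.

```typescript
interface Candidate {
  contextRelevance: number;
  clinicalEvidence: number;
  targetsInPanel: boolean;
}

// Context relevance >= 40 and clinical evidence >= 15 for off-panel drugs;
// on-panel drugs only need context relevance >= 30.
function passesPreFilter(c: Candidate): boolean {
  if (c.targetsInPanel) return c.contextRelevance >= 30;
  return c.contextRelevance >= 40 && c.clinicalEvidence >= 15;
}
```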

6-Agent AI Debate

GaiaLab uses six specialised AI agents that each receive structured, scored data — not raw database text. Each agent has a defined role and adversarial mandate.

H

Hypothesis Agent

Generates mechanistic hypotheses from gene-pathway-drug co-occurrence patterns. Revises hypotheses in response to Critic flaws (iterative debate round).

C

Critic Agent

Identifies confounders, alternative explanations, and evidence gaps. Flags hypotheses that lack direct mechanistic support. Seeded with live OpenTargets and ChEMBL bioactivity data.

E

Evidence Agent

Assesses citation quality, recency, and quantitative support from PMC full-text extraction (IC50, HR, OR, n= values). Assigns grounding scores per claim.

R

Risk Agent

Evaluates safety signals from FDA FAERS adverse event counts and contraindication overlaps. Penalises drug candidates with high AE burden in the disease population.

I

Innovation Agent

Identifies novel angles — repurposing opportunities, combination hypotheses, and underexplored targets. Seeded with active ClinicalTrials.gov recruiting trials.

S

Synthesis Agent

Integrates debate outputs into the executive brief. Applies the advisory-therapeutic normaliser to ensure claim-level confidence aligns with citation coverage. Produces the final PMID evidence ledger.

Provider failover order

AI calls attempt providers in order: DeepSeek → OpenAI → Google Gemini → Anthropic Claude. Each call is gated by a token-bucket rate monitor. If a provider's bucket is empty, it is skipped without error and the next provider is tried.
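
The failover loop can be sketched as follows, with the token-bucket check reduced to a predicate; real bucket refill logic and provider interfaces are omitted and the names are assumptions.

```typescript
interface Provider {
  name: string;
  hasTokens: () => boolean;
  call: (prompt: string) => Promise<string>;
}

async function callWithFailover(
  providers: Provider[], // in documented order: DeepSeek, OpenAI, Gemini, Claude
  prompt: string
): Promise<{ provider: string; text: string }> {
  for (const p of providers) {
    if (!p.hasTokens()) continue; // empty bucket: skip without error
    try {
      return { provider: p.name, text: await p.call(prompt) };
    } catch {
      // provider error: fall through to the next provider in order
    }
  }
  throw new Error("all providers exhausted");
}
```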

Confidence Tiers

Claim-level confidence is capped by citation coverage. AI-generated language cannot assert high confidence when the citation record does not support it.

Confidence | Requirement | Display
High | On-panel target AND clinical evidence score ≥ 15 AND ≥ 6 PubMed citations | Green border, "strong evidence" label
Medium | 2–5 citations OR off-panel with clinical data | Blue border, "moderate evidence" label
Low | < 2 citations OR hypothesis only | Grey border, "exploratory" label

Every cited claim includes a PMID. Claims without PMIDs are labelled "derived" or "hypothetical" and rendered with reduced visual prominence. This is enforced by the PMID evidence ledger, not by AI instruction — AI cannot override it.
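
The capping rule can be sketched as a pure function over citation counts. Reading "2–5 citations" as "≥ 2 citations that do not qualify for High" is an interpretive assumption here, as is the `offPanelWithClinical` flag standing in for "off-panel with clinical data".

```typescript
type Confidence = "high" | "medium" | "low";

function confidenceTier(opts: {
  onPanel: boolean;
  clinicalScore: number;
  citations: number;           // verified PMID count for this claim
  offPanelWithClinical?: boolean;
}): Confidence {
  if (opts.onPanel && opts.clinicalScore >= 15 && opts.citations >= 6)
    return "high";
  if (opts.citations >= 2 || opts.offPanelWithClinical) return "medium";
  return "low"; // < 2 citations or hypothesis only
}
```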

Immutable Analysis IDs

Every analysis run generates a permanent ID of the form gl-{timestamp}-{8-char-hash}. This ID is:

  • Included in API responses and the analysis UI
  • Linkable as a permanent URL: https://gaialabai.com/analysis/{id}
  • Safe to cite in paper supplementary materials
  • Stored as an immutable JSON snapshot in data/snapshots/
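
One way to mint IDs of the documented shape. Only the gl-{timestamp}-{8-char-hash} form comes from the text; the hash input and algorithm here (SHA-256 of the canonical request, first 8 hex characters) are assumptions.

```typescript
import { createHash } from "node:crypto";

function analysisId(genes: string[], disease: string, now = Date.now()): string {
  // Sort the gene list so the hash is order-independent for the same panel.
  const canonical = JSON.stringify({ genes: [...genes].sort(), disease });
  const hash = createHash("sha256").update(canonical).digest("hex").slice(0, 8);
  return `gl-${now}-${hash}`;
}
```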

Snapshot files record the exact gene list, disease context, all database responses, all scored outputs, and the AI synthesis. A snapshot can be replayed to verify that the same inputs produce equivalent outputs under the same database state.

Analysis IDs do not guarantee database state reproducibility — external databases update over time. For full reproducibility, include the analysis ID AND the snapshot file in supplementary materials.

Known Limitations

Database coverage gaps

Without paid API keys (DisGeNET, DrugBank), coverage falls to roughly 30 of the 35+ sources. These gaps are disclosed in the analysis output and do not produce false confidence — missing sources are simply absent, not filled with hallucinated data.

AI synthesis is probabilistic

The six AI agents reason from structured data but can still produce plausible-sounding errors. All AI output is gated by the PMID evidence ledger — claims without citation support are demoted. Users should treat the executive brief as a hypothesis generator, not a clinical decision tool.

Small panels (< 3 genes)

Pathway enrichment and drug scoring are less reliable with fewer than 3 genes. The hypergeometric test loses power and synthetic lethality detection is disabled. Results for single-gene queries are labelled accordingly.

Non-human species

GaiaLab is optimised for human gene symbols. Mouse orthologs (e.g. Trp53) are partially supported via alias resolution but may miss sources that do not cross-reference species.

Not a clinical decision support tool

GaiaLab is a research intelligence platform. Outputs are not validated for clinical use and should not inform patient treatment decisions without independent expert review and regulatory-grade validation.