Evidence engines, RocSite Discovery

Two engines, one evidence backbone.

Both engines retrieve from the same indexed corpus through the same hybrid retrieval chain: lexical and dense-vector retrieval, followed by a cross-encoder rerank. The index covers ~29.6M medical documents, the SPECTER2-embedded medical subset of a 48M+ document corpus. The engines differ in the synthesis layer and the output shape, each built for a distinct audience.
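
As a rough illustration of what a lexical + dense-vector + cross-encoder chain looks like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy documents, the token-overlap lexical scorer (standing in for something like BM25), the bag-of-words "embedding" (standing in for a SPECTER2 vector), reciprocal rank fusion as the way the two rankings are combined, and the Jaccard pair score (standing in for a cross-encoder). It shows the shape of the chain, not the engines' actual implementation.

```python
import math
from collections import Counter

# Toy corpus; document IDs and texts are invented for illustration only.
DOCS = {
    "d1": "niclosamide activity against staphylococcus aureus biofilm",
    "d2": "metformin and survival in pancreatic adenocarcinoma",
    "d3": "apixaban dosing in advanced chronic kidney disease",
}

def tokenize(text):
    return text.lower().split()

def lexical_rank(query, docs):
    """Rank by number of document tokens that appear in the query (BM25 stand-in)."""
    q = set(tokenize(query))
    scores = {d: sum(1 for t in tokenize(text) if t in q) for d, text in docs.items()}
    return sorted(docs, key=lambda d: (-scores[d], d))

def embed(text):
    """Hypothetical embedding: bag-of-words counts (stand-in for a dense vector)."""
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def dense_rank(query, docs):
    """Rank by cosine similarity between query and document vectors."""
    qv = embed(query)
    scores = {d: cosine(qv, embed(text)) for d, text in docs.items()}
    return sorted(docs, key=lambda d: (-scores[d], d))

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (an assumed fusion method): sum 1/(k + rank)."""
    fused = Counter()
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            fused[d] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))

def rerank(query, candidates, docs, top_n=2):
    """Cross-encoder stand-in: score each (query, document) pair jointly."""
    q = set(tokenize(query))
    def pair_score(d):
        t = set(tokenize(docs[d]))
        return len(q & t) / len(q | t)
    return sorted(candidates, key=lambda d: (-pair_score(d), d))[:top_n]

def hybrid_search(query, docs=DOCS, top_n=2):
    candidates = rrf_fuse([lexical_rank(query, docs), dense_rank(query, docs)])
    return rerank(query, candidates, docs, top_n)

results = hybrid_search("metformin pancreatic adenocarcinoma survival")
```

The design point is the ordering: two cheap retrievers produce candidate lists, the lists are merged, and only the merged shortlist is passed to the more expensive pairwise scorer.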

Caliper

Audit-grade evidence synthesis

Structured findings with primary-venue grounding, confidence statements, and verifiable citations — on hard research questions, in under a minute.

Output: Finding · Confidence · Strongest point · Key uncertainty
Primary-venue grounding (NEJM, Lancet, JAMA, JACC, Circulation, EHJ, Nature Medicine)
Latency: 25 – 55 seconds per question
Built for: regulatory-evidence teams, medical-affairs leads, research directors

Discovery

Research-grade literature synthesis

Prose synthesis with mechanism-level context, inline citations, and a visible flag when methodology validation is still pending, for literature reviews and research workflows.

Output: prose synthesis with numbered inline citations
Methodology badge shown when validation gates (literature, CI, confounding, FDR, replication) have not yet been asserted
Latency: 8 – 25 minutes per question (deeper synthesis)
Built for: medical-affairs research teams, evidence-review committees, literature reviewers
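
The methodology badge described in the list above can be pictured as a simple check over the five validation gates. The sketch below is an assumption about how such a check might look: the gate names come from this page, but the function name, the dictionary input, and the badge wording are hypothetical.

```python
# Gate names as listed on this page; the order is used for display only.
GATES = ("literature", "CI", "confounding", "FDR", "replication")

def methodology_badge(asserted):
    """Return a badge string naming the pending gates, or None if all are asserted.

    `asserted` maps gate name -> bool; a missing gate counts as not asserted.
    The badge text is a hypothetical label, not the product's actual wording.
    """
    pending = [gate for gate in GATES if not asserted.get(gate, False)]
    if not pending:
        return None
    return "Methodology pending: " + ", ".join(pending)

all_clear = methodology_badge({gate: True for gate in GATES})
partial = methodology_badge({"literature": True})
```

Treating a missing gate as pending keeps the default conservative: the badge disappears only when every gate has been explicitly asserted.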

The evaluation

Five fresh research questions. Zero hallucinations.

Internal validation, May 2026. Each engine was run independently on five pharma-grade questions drawn from distinct medical domains. Every cited PMID was verified against the source corpus.

5

Fresh research questions, distinct domains (sepsis HAT protocol; GLP-1 cardiovascular outcomes in non-diabetic obesity; DOAC in advanced CKD; niclosamide repurposing; metformin in pancreatic adenocarcinoma)

70+

Unique PMIDs verified as real across the evaluation

100%

Of cited PMIDs verified as real in the source corpus

0

Hallucinated citations, fabricated trial names, or invented evidence
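
The citation check behind these figures amounts to set membership: extract the PMIDs an output cites, then confirm each one exists in the source corpus. A minimal sketch follows, assuming citations appear in the text as "PMID <digits>" and that the corpus can be represented as a set of PMID strings. The function names and the corpus set are hypothetical; the three PMIDs in the sample string are the ones listed for the niclosamide example on this page.

```python
import re

# Assumes citations are written as "PMID" followed by up to eight digits.
PMID_PATTERN = re.compile(r"PMID\s+(\d{1,8})")

def extract_pmids(text):
    """Return the unique PMIDs cited in `text`, in sorted order."""
    return sorted(set(PMID_PATTERN.findall(text)), key=int)

def verify_pmids(cited, corpus_pmids):
    """Split cited PMIDs into those found in the corpus and those not found."""
    cited = set(cited)
    verified = cited & set(corpus_pmids)
    unverified = cited - verified
    rate = len(verified) / len(cited) if cited else 0.0
    return {
        "verified": sorted(verified, key=int),
        "unverified": sorted(unverified, key=int),
        "rate": rate,
    }

sample = "PMID 36173387 J Antimicrob Chemother 2022 · PMID 28471351 Biomed Mater 2017 · PMID 29432868 Int J Antimicrob Agents 2018"
cited = extract_pmids(sample)

# Hypothetical stand-in for the indexed corpus: here it simply contains the cited IDs.
corpus = set(cited)
report = verify_pmids(cited, corpus)
```

A check like this confirms only that a cited record exists; whether the paper supports the sentence it is attached to still requires reading the paper.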

The questions were chosen to stress-test honest behavior, not to flatter the system: drug-trial contradictions (HAT protocol), recent paradigm shifts (SELECT trial), RCT-real-world gaps (DOAC in eGFR < 30), obscure repurposing signals (niclosamide for MDR-TB), and evidence discrepancies (metformin retrospective vs prospective in pancreatic cancer).

What the engines do not do: they do not generate novel scientific findings. They surface, organize, and explain what is already in the published literature, with citations a reader can verify. They refuse to fill gaps with model knowledge when the literature is silent.

From the evaluation

What the discipline looks like in practice.

Two outputs from the same evaluation: one where the engine surfaced and explained a known confounder, and one — the more important example — where it refused to claim evidence that does not exist.

Featured · The honest refusal

The engine surfaced the adjacent evidence, then refused to overstate.

Is there published clinical or preclinical evidence for repurposing niclosamide as a therapy for multidrug-resistant Mycobacterium tuberculosis, or for biofilm-associated Staphylococcus aureus infections?

Discovery synthesis (verbatim, with inline citations to the actual retrieved papers):

“Specialist analysis identifies a critical translational gap, confirming that published literature contains no clinical or preclinical data specifically evaluating niclosamide against multidrug-resistant Mycobacterium tuberculosis or systematically targeting biofilm-associated Staphylococcus aureus. Preclinical investigations instead demonstrate efficacy solely through in vitro assays and murine topical models against resistant Gram-positive pathogens, alongside structural biofilm disruption against S. aureus [1]. Pharmacological assessments indicate that niclosamide’s profound first-pass glucuronidation and lipophilicity-driven distribution limits create a fundamental pharmacokinetic mismatch that restricts its therapeutic utility to non-systemic applications [1]. Consequently, validated antimicrobial strategies remain confined to localized interventions, such as intestinal decolonization and direct device surface coatings [4; 2].”

What this proves. The retrieved corpus contained real, on-topic niclosamide papers: Gram-positive antimicrobial activity, surface-coating applications, intestinal decolonization. The engine cited them accurately. But for the specific question the user asked (MDR-TB), published evidence is genuinely absent, and the engine said so in plain language rather than extrapolating from adjacent results.

Cited evidence (all verified): PMID 36173387 J Antimicrob Chemother 2022 · PMID 28471351 Biomed Mater 2017 · PMID 29432868 Int J Antimicrob Agents 2018.

Companion example · Surfacing a known confounder

The engine correctly explained the confounding pattern in the literature.

In pancreatic ductal adenocarcinoma, does adjunctive metformin therapy improve overall survival, and how do retrospective observational findings compare to prospective interventional trial results?

Discovery synthesis (verbatim):

“The retrospective survival advantage attributed to adjunctive metformin in pancreatic ductal adenocarcinoma is fundamentally distorted by methodological confounders, particularly concomitant statin use and variable glycemic control [1], [3]. While unadjusted observational cohorts consistently link metformin exposure to prolonged overall survival [5], this association vanishes in prospective interventional trials where the drug’s unfavorable pharmacokinetic profile obscures antitumor efficacy [2]. Although pathobiological models propose AMPK/mTOR-mediated antiproliferative effects as a theoretical rationale [3], the systematic divergence between uncontrolled registry data and rigorously monitored clinical trials confirms that metformin does not deliver validated survival benefits for PDAC [4].”

What this proves. The statin co-use confounder is a known but underdiscussed pattern in the literature. The engine surfaced it from the cited papers (it did not invent it) and explained how it accounts for the retrospective–prospective discrepancy. The discipline is in the framing: the engine reads what is in the literature and reports it; it does not claim new mechanistic discovery.

Cited evidence (all verified): PMID 26067687 Lancet Oncology 2015 (Reni RCT) · PMID 26474429 Pancreas 2016 (statin + metformin) · PMID 27069086 J Clin Oncol 2016 (Cautionary Lesson).

Who these engines are for — and who they are not for.

Honest scope. Honest exclusions.

Built for

Pharma research leads evaluating evidence on a target, mechanism, or trial design
Medical-affairs teams preparing publications or scientific responses
Regulatory-evidence teams compiling supporting literature for submissions
Literature reviewers doing research-grade work who want to cut analyst hours on hard questions

Not for

Clinical care decisions or treatment recommendations
Diagnostic use of any kind
Patient-facing medical advice
Replacing expert judgment — the engines support analysts, they do not substitute for them
A substitute for reading the primary literature — outputs cite published papers; verify the cited papers themselves before relying on conclusions

Want to try this on your own questions?

We deploy the engines against your evidence corpus. You evaluate the output on questions you already know the right answer to. You tell us what works and what does not. The same honesty discipline applies to the pilot itself — we report what fired and what did not, and we do not soften the result.

Start a pilot conversation →

Honest literature synthesis on hard research questions.