---
title: Citation integrity in the age of AI | Scholar Sidekick
description: 1 in 277 biomedical papers in early 2026 contains at least one fabricated reference. Here is what that pattern looks like, why automated DOI checks miss it, and how to verify a citation properly - from a clinical researcher who reads citations every week.
doc_version: "2026-06-05"
last_updated: "2026-06-21"
---

# Citation integrity in the age of AI

*From a clinical researcher who reads citations every week - here is what citation fabrication looks like, why automated identifier checks miss the dominant pattern, and how to verify a citation properly.*

## 1 in 277

**1 in 277 biomedical papers in early 2026 contains at least one fabricated reference - a more than 12× increase over 2023.** That finding comes from Topaz et al. ([Lancet 2026, doi:10.1016/S0140-6736(26)00603-3](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext), open access), who audited 2.5 million PubMed Central articles using a pipeline called CITADEL. The trajectory of the increase - explosive, post-2023 - strongly implicates the proliferation of large language models in scientific writing.

What makes this finding load-bearing is not just the rate. It is that **the dominant fabrication pattern slips through every basic citation check**. The identifier resolves. The DOI is real. The PMID points to a real paper. The citation looks legitimate. But the *title* in the citation does not correspond to the paper that the identifier actually points to.

If you have only ever checked citations by clicking the DOI to make sure it resolves, you have been missing the most common fabrication pattern in the literature.

## What fabrication actually looks like

Topaz et al.'s Supplementary Appendix 2 publishes three illustrative cases. They are worth reading in full because each one shows a different mechanism by which a fake citation evades simple checks.

### Example A - split-identifier confusion

A paper on construction-industry safety in Qatar cites a study supporting its ICU-admission finding:

> *"Impact of enhanced safety protocols on ICU admissions in the construction industry: A longitudinal analysis"* - J Doe, R Smith, **J Occup Environ Med (2023), PMID 36730737, DOI 10.1097/JOM.0000000000002567**.

The PMID and DOI are both real, but they point to **different real papers**:

- PMID 36730737 resolves to *"Predictors of Suicide and Differences in Attachment Styles and Resilience Among Treatment-Seeking First-Responder Subtypes"* (Ponder et al., J Occup Environ Med 2023).
- DOI 10.1097/JOM.0000000000002567 resolves to *"Occupational Balance and Depressive Symptoms During the COVID-19 Pandemic"* (Ramos et al., J Occup Environ Med 2022).

The cited title does not exist anywhere in the indexed literature. The identifiers exist but contradict each other. Both are in the right-sounding journal - which is what makes the confabulation plausible.
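The contradiction in Example A can be detected mechanically once both identifiers have been resolved. A minimal Python sketch, assuming the two records have already been fetched from their registries; `ResolvedRecord` and `identifiers_agree` are illustrative names, not part of any published API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedRecord:
    """Metadata a registry returned for one identifier (illustrative shape)."""
    title: str
    first_author: str
    year: int


def identifiers_agree(by_pmid: ResolvedRecord, by_doi: ResolvedRecord) -> bool:
    """True when the PMID and the DOI resolve to the same paper."""
    return (
        by_pmid.title.casefold() == by_doi.title.casefold()
        and by_pmid.year == by_doi.year
    )


# The two records described in Example A.
pmid_record = ResolvedRecord(
    title="Predictors of Suicide and Differences in Attachment Styles and "
          "Resilience Among Treatment-Seeking First-Responder Subtypes",
    first_author="Ponder",
    year=2023,
)
doi_record = ResolvedRecord(
    title="Occupational Balance and Depressive Symptoms During the COVID-19 Pandemic",
    first_author="Ramos",
    year=2022,
)

split = not identifiers_agree(pmid_record, doi_record)
print("split identifiers" if split else "identifiers agree")
```

A split on its own does not say which identifier is wrong, only that the citation cannot be taken at face value.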

### Example B - consistent identifiers, fabricated title

A diagnostic-imaging review cites a protocol paper:

> *"A Protocol for the Use of DMM/PTX-Induced Mouse Models of Osteoarthritis and Rheumatoid Arthritis"* - E. Krustev, D. Rioux, J.J. McDougall, **Current Protocols (2021), PMID 34767311, DOI 10.1002/cpz1.288**.

The PMID and DOI agree with each other and resolve to the same real paper - but the resolved paper is *"Three-Dimensional Fruit Tissue Habitats for Culturing Caenorhabditis elegans"* (Guisnet et al., Current Protocols 2021). The cited title plausibly fuses two genuine methodologies - DMM (destabilisation of the medial meniscus, an osteoarthritis model) and PTX (pertussis toxin, a rheumatoid arthritis model) - into a protocol paper that has never been published.

### Example C - biomedical neuroscience fabrication

A pain-research review cites a microglial paper:

> *"Microglial Modulation via Cannabinoid Receptor 2 Alleviates Fibromyalgia-Related Central Sensitization and Pain Hypersensitivity"* - F. Chen, Y. Liu, H. Wang, X. Zhang, J. Li, K. Yang, **Neuroscience (2023), PMID 36813155, DOI 10.1016/j.neuroscience.2023.02.008**.

PMID and DOI both resolve to the same real paper - and again, it is something completely different: *"ChatGPT in Research: Balancing Ethics, Transparency and Advancement"* (Graf & Bernardi, Neuroscience 2023). The fabricated title combines three real neuroscience concepts (microglial modulation, CB2, fibromyalgia pain) into a plausible study that does not exist.

## Why a simple DOI check is not enough

All three cases above pass the only check most researchers ever apply: *click the DOI; does it resolve?* The DOI resolves. The paper is real. The journal is real. The reference looks legitimate at a glance.

What gets missed is the **cross-check between the cited title and the resolved title**. If you do not compare what the citation says the paper is called against what the paper at that identifier is *actually* called, you cannot detect the dominant fabrication pattern.

This is not a problem you can solve by reading more carefully. The titles are designed by an LLM to *sound* like they fit the surrounding sentence. They reference concepts the reader expects in that context. Eyeballing them as plausible is exactly the failure mode the pattern exploits.

The fix is mechanical: every citation needs its claimed metadata compared against the resolved metadata at its identifier. That is what a verifier does.
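The comparison itself is simple to sketch. The Python below normalizes both titles and scores them with the standard library's `difflib.SequenceMatcher`; the normalization rules and the 0.9 threshold are assumptions chosen for illustration, not Scholar Sidekick's actual matching logic:

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Case-fold, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.casefold()).strip()


def title_similarity(claimed: str, resolved: str) -> float:
    """Character-level similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(claimed), normalize(resolved)).ratio()


def titles_match(claimed: str, resolved: str, threshold: float = 0.9) -> bool:
    # The 0.9 threshold is an arbitrary illustration, not a tuned value.
    return title_similarity(claimed, resolved) >= threshold


# Example C: the title the citation claims vs. the title the DOI resolves to.
claimed = ("Microglial Modulation via Cannabinoid Receptor 2 Alleviates "
           "Fibromyalgia-Related Central Sensitization and Pain Hypersensitivity")
resolved = "ChatGPT in Research: Balancing Ethics, Transparency and Advancement"

print(titles_match(claimed, resolved))
```

Differences in case, punctuation, and accents are normalized away before scoring, so only a substantive title difference lowers the score.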

## How to check a citation today

Three levels of effort, from manual to automated, all of which catch the Topaz pattern when applied properly:

1. **By hand, one citation at a time.** Paste the DOI into doi.org. Compare the resolved page's title to the title in the citation. If they differ in *any* meaningful way, treat the citation as suspect until you have read the paper yourself. This works but does not scale beyond a small reference list.
2. **Programmatic single-citation check.** POST the claimed citation to a verifier API ([scholar-sidekick.com/api/verify](https://scholar-sidekick.com/api/verify)) or call the equivalent MCP tool from an AI assistant. The response carries a verdict (`matched`, `mismatch`, `ambiguous`, `not_found`) plus the resolved record, so you can see exactly what the identifier points to.
3. **Manuscript-submission integration.** This is Topaz et al.'s explicit recommendation #1: integrate verifier checks into the submission workflow at journals. Run every reference through a verifier before peer review. The cost per citation is a matter of seconds; the cost of a fabricated reference reaching publication is significant.
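The by-hand check in step 1 can be partly scripted against Crossref's public REST API, which serves registered metadata for a DOI at `https://api.crossref.org/works/{doi}`. A minimal standard-library Python sketch; the helper names are illustrative, it only covers DOIs registered with Crossref, and it is not how Scholar Sidekick's verifier is implemented:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST URL for a DOI, percent-encoding unsafe characters."""
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip())


def title_from_crossref(payload: dict) -> Optional[str]:
    """Pull the first title out of a Crossref /works response body."""
    titles = payload.get("message", {}).get("title") or []
    return titles[0] if titles else None


def fetch_resolved_title(doi: str, timeout: float = 10.0) -> Optional[str]:
    """Network call: return the registered title for a DOI, or None if it has none."""
    with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as response:
        return title_from_crossref(json.load(response))
```

Calling `fetch_resolved_title("10.1002/cpz1.288")` performs a live request; the returned title is what should be compared against the title the citation claims. A DOI that Crossref does not know raises `urllib.error.HTTPError`, which a caller would treat as "identifier does not resolve here" rather than as a verdict on the citation.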

## Independent use

Aaron Tay, Head of Data Services at SMU Libraries, used Scholar Sidekick's citation verifier in his own research - running 100 citations from the AI search tools Consensus and Undermind through the API to test whether they fabricate references. Read [Aaron Tay's write-up on hallucinated references](https://library.smu.edu.sg/topics-insights/arxiv-tightens-policy-hallucinated-references-what-researchers-should-know-about).

## How Scholar Sidekick fits

Topaz et al.'s CITADEL pipeline and Scholar Sidekick's verifier are **complementary, not competitive**. They cover different points in the publication lifecycle and different parts of the citation surface area.

| | CITADEL (Topaz et al.) | Scholar Sidekick verifier |
| --- | --- | --- |
| Timing | Offline, post-publication audit | Online, on-demand at write/review time |
| Source surface | PMC-XML | Live registries (Crossref, PubMed, OpenAlex, arXiv, ADS, others) |
| Identifier coverage | DOI + PMID | DOI + PMID + PMCID + ISBN + arXiv + ISSN + ADS bibcode + WHO IRIS URL |
| Validation evidence | 91% precision, audited at population scale (2.5M papers) | 100% recall on the dominant fabrication patterns and a 0.8% high-confidence false-accusation rate (Wilson 95% CI 0.4-1.4%) on a 1,395-entry blind holdout, measured once (+ a repeatability re-run); published with receipts (see below) |
| Distribution | Research pipeline | Public REST API + MCP tool + web tool (single + batch up to 10) |

CITADEL ran a retrospective audit across 2.5 million biomedical papers. Scholar Sidekick is built to be called at the moment a citation is added - by a peer reviewer, an editor, an author cross-checking their own bibliography, or an LLM grounding its references. The methodology Topaz et al. validated at population scale is what our verifier applies at point-of-use scale, with broader identifier coverage so it works for citations CITADEL was not designed to touch (books via ISBN, ML and physics preprints via arXiv, astrophysics via ADS bibcode, institutional grey literature via WHO IRIS URL).

## Measured precision and recall

Every quantitative claim about the verifier on this page is tied to a specific validation run. The fixture is hand-curated, immutable, and published below as evidence. The results JSON files are timestamped receipts - you can inspect them, re-run the harness, and check our numbers against your own.

### Latest blind holdout (n = 1,395) - the headline measurement

The current primary evidence is a **blind holdout of 1,395 freshly-drawn citations** - 1,185 correctly-cited and 210 fabricated or wrong - across all eight identifier types, roughly 4× the size of the earlier 350-entry holdout below. It was drawn on 2026-06-05 from a recorded seed, constructed *title-independently* (titles are never compared while building it - title similarity is the axis being measured, so selecting by it would be circular), human spot-checked (**105 of 105** sampled entries confirmed fair), sealed, and measured exactly **once**. A machine check confirms **zero overlap** (0 of 855 prior entries) with any set the verifier was previously tuned or measured against. As a repeatability check we then re-ran the sealed set (**99.9% verdict stability**, zero real changes - the pipeline is deterministic) and ran the opt-in LLM screen over it.

**Did it catch the fabrications?** On the dominant fabrication patterns - real-DOI + fabricated title, wrong author, wrong DOI, PMID/DOI split, and fully invented - **150 / 150 = 100%** (Wilson 95% CI lower bound ~97.6%), each subtype 30 / 30. We also added a deliberately harder *near-miss* class this round; see the limitation below.

**Did it wrongly flag a correctly-cited paper?** Across the 1,175 clean reference-list citations: **confident (high-confidence) false accusations numbered 2 (0.17%)**; all high-confidence flags numbered 9 (**0.8%**, Wilson 95% CI 0.4-1.4%); including the low-confidence "needs-a-second-look" bucket, the any-flag rate was 2.7%. The opt-in **LLM screen cuts the any-flag rate to 0.94%** by rescuing 22 low-confidence cases - **with zero cost to genuine-fabrication recall** (it left every eligible fabrication flagged). Calibration is sound: **ECE 0.026**, with high-confidence verdicts 97% correct. The container-only WHO-IRIS arm flagged 0 / 10.

**The two confident false-accusations, in full.** Neither is a title-logic error. One is author-form handling - a DataCite `literal` author "de Azcarraga, Jose A." compared against the cited "Azcarraga", where the title matched exactly. One is an Open Library record that **mis-titles a study guide as its parent textbook** - a registry data-quality artifact, where the verifier correctly flagged that the cited title did not match the (mis-titled) resolved record. Both remain in the count; we do not adjust sealed numbers.

#### A measured blind spot: near-miss semantic flips

This round we added a harder attack class: a real paper's real title with a single load-bearing word flipped to the opposite meaning ("children" -> "adults", "increases" -> "decreases"). The verifier caught only **4 / 30** - the other 26 passed as confident matches, because a one-word change barely moves a character-level title similarity score. This is a genuine limitation. It is distinct from the documented AI-fabrication pattern (wholesale-invented titles, all of which we caught) and closer to a subtle citation *error*; it is also hard for any similarity-based check. The opt-in LLM screen does not address it - these cases never enter its low-confidence gate. Our roadmap fix is a targeted antonym/negation detector (flag when two titles are near-identical except for a single meaning-flipping token). We report it here rather than omit it: the larger holdout surfaced a weakness the smaller one could not.
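To make the failure mode concrete, here is a hedged Python sketch of what such a detector could look like. It is not the planned implementation: the two-entry `FLIP_PAIRS` lexicon and the pair of titles are invented for illustration, and a usable detector would need a much larger lexicon plus negation handling.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Tiny illustrative lexicon; these pairs are the two examples named in the text.
FLIP_PAIRS = {
    frozenset({"children", "adults"}),
    frozenset({"increases", "decreases"}),
}


def tokens(title: str) -> List[str]:
    """Lower-cased alphanumeric tokens of a title."""
    return re.findall(r"[a-z0-9]+", title.casefold())


def meaning_flip(claimed: str, resolved: str) -> Optional[Tuple[str, str]]:
    """Return the (claimed, resolved) token pair when the titles differ by
    exactly one token and that pair is a known meaning flip; otherwise None."""
    a, b = tokens(claimed), tokens(resolved)
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(diffs) == 1 and frozenset(diffs[0]) in FLIP_PAIRS:
        return diffs[0]
    return None


# Invented titles, for illustration only.
claimed = "Exercise decreases sleep quality in children"
resolved = "Exercise increases sleep quality in children"

similarity = SequenceMatcher(None, claimed.casefold(), resolved.casefold()).ratio()
print(similarity > 0.9, meaning_flip(claimed, resolved))
```

The character-level similarity stays above 0.9 even though the claim is inverted, which is why a similarity threshold alone reports a match; the token-level comparison is what surfaces the flipped word.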

### Previous blind holdout (n = 350)

The earlier blind holdout - the lineage that got us here, kept here in full for transparency - was 350 freshly-drawn citations (308 correctly-cited and 42 fabricated or wrong) across all eight identifier types. It was drawn on 2026-05-26 from a recorded random seed, constructed *title-independently*, human spot-checked (52 of 52 sampled entries confirmed fair), sealed, and then measured exactly **once**. A machine check confirmed it shared zero entries with any set the verifier was previously tuned or measured against.

**We found our own bugs - here is the loop.** An earlier blind run, on a *first* holdout, measured a 5.3% false-accusation rate (95% CI 3.2-8.7%). The cause was not the title logic: it was a bug in *our own dataset construction* (mis-formatted reference-list author strings) plus two real verifier gaps - initials-first author names ("P Giral") and a missing "Collaborators" group-author marker. We fixed all three ([changelog](/changelog), 2026-05-26), sourced a *completely fresh, non-overlapping* holdout, and ran it once: the rate fell to 1.8% (CI 0.8-4.0%). We publish both runs in full (Receipts below). Quietly discarding the first would have been the dishonest move; the discovery -> fix -> re-measure loop is the point.

**That loop continued into the n = 1,395 run above:** after this 350-entry holdout we fixed three further known-hard normalization cases (Greek-letter, numeric-HTML-entity, and Latin-extended title handling - [changelog](/changelog), 2026-06-05), then enlarged the holdout roughly 4× and re-drew it completely fresh. The larger sample tightened the intervals (the high-confidence false-accusation upper bound is now 1.4%, under our 3% bar) and surfaced a limitation the smaller set could not - the near-miss blind spot described above.

### Measured numbers (blind holdout, n = 350, pre-LLM)

Measured against the live production API on 2026-05-26, deterministic pre-LLM verdicts (the opt-in LLM screen was off). Every figure recomputes from the published receipt.

**Did it catch the fabrications? Recall 37 / 37 = 100%** (Wilson 95% CI 90.6-100%). Every fabricated and wrong-identifier subtype was flagged: real-DOI + fabricated title (16/16), wrong first author (6/6), wrong DOI (4/4), PMID/DOI split (4/4), fully invented (7/7).

**Did it wrongly flag a correctly-cited paper?** Pre-LLM, across the 285 clean reference-list citations that returned a verdict, **5 were flagged - 1.8%** (Wilson 95% CI 0.8-4.0%). Every one was *low-confidence*, and every one fell in a class already documented as hard: two translated titles (English cited against a French and a German original), one title carrying embedded markup, one subtitle stored separately by the registry, and one likely online-first vs print year gap. There were **no high-confidence false accusations** in this core arm. Confidence is calibrated - high-confidence verdicts were 98% correct, medium 100%, low 25% - so the errors concentrate exactly where the verdict already says "low confidence," the bucket the opt-in LLM screen re-examines. The 1.8% pre-LLM figure is therefore conservative: it is the rate before the LLM screen has re-examined any of those low-confidence flags.

*Two honest caveats.* (1) 13 entries - the arXiv arm and one DOI - hit a transient production gateway error (HTTP 502) mid-run and returned no verdict; they are excluded from the counts above and will be re-measured (a slow upstream, not a verifier result). (2) The 4.0% interval upper bound reflects the sample size; the point estimate is 1.8%. By identifier type (directional, ~12 each): DOI 97.5%, PMID 100%, PMCID 100%, ADS 100%, ISBN 100%, WHO IRIS 9/10.

### Validation set (v1, immutable)

Twenty hand-curated entries across five categories:

- **3 Lancet illustrative cases.** Examples A, B, and C from Topaz et al.'s Supplementary Appendix 2, verbatim. These are not our cases - they are the canonical fabricated-citation examples Topaz et al. chose to publish.
- **5 known-good citations.** Real DOI or arXiv ID paired with the canonical title resolved via the scholar-sidekick MCP server. The verifier must return `matched`.
- **4 wrong-DOI cases (CITADEL "citation error" subtype).** Real Paper X's title paired with real Paper Y's identifier. Both papers are independently verified; the swap is intentional. The verifier should detect the title mismatch on the resolved record, search for the claimed title elsewhere, and return `ambiguous` (work exists, wrong identifier).
- **4 paraphrase cases.** Real DOI + a paraphrased title designed to land in the LLM-screen-eligible bucket (mismatch with low confidence). The LLM screen should classify these as `informal_abbreviation` and upgrade the verdict to `matched`. *These four entries are the only ones we tuned against the live verifier - they were probed to ensure they exercise the LLM-screen path. The LLM's verdict on them is what we report.*
- **4 invented cases.** No real paper. Either an invented DOI, an invented title with no identifier, or an impossibly large PMID. The verifier should return `not_found`.
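The expected verdicts above imply a small decision table. The Python sketch below is inferred from these category descriptions only; the function and parameter names are hypothetical, the handling of an unresolvable identifier whose title is found elsewhere is an assumption, and the LLM-screen upgrade for paraphrases is deliberately left out:

```python
from typing import Optional


def decide_verdict(
    resolved_title: Optional[str],
    titles_match: bool,
    claimed_title_found_elsewhere: bool,
) -> str:
    """Map three observations onto the four verdict labels."""
    if resolved_title is not None and titles_match:
        return "matched"      # identifier and claimed title agree
    if claimed_title_found_elsewhere:
        # Assumption: applies whether or not the identifier itself resolved.
        return "ambiguous"    # the work exists, but not at this identifier
    if resolved_title is not None:
        return "mismatch"     # identifier is real, claimed title is not
    return "not_found"        # nothing resolves and the title is unknown


# One call per fixture category (titles abbreviated, for illustration).
print(decide_verdict("Canonical title", True, False))   # known-good
print(decide_verdict("Paper Y title", False, True))     # wrong-DOI swap
print(decide_verdict("Unrelated title", False, False))  # fabricated title
print(decide_verdict(None, False, False))               # invented
```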

### Measured numbers (v1 fixture, n = 20)

All figures below are recomputed from the published receipts (downloadable in full below). Each receipt's per-entry `results` array carries the verdict, confidence, and resolution source for every case, so the recall, false-accusation rate, and intervals here can be recomputed from the JSON alone - no access to our code required. Confidence intervals are Wilson 95% - wide on purpose, because the fixture is small and every entry is hand-verified.
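As a rough guide to that recomputation, here is a Python sketch. The row shape is an assumption: it presumes each result has been joined with the fixture's ground-truth label under a hypothetical `truth` key, and that any verdict other than `matched` counts as a flag. The rows are synthetic, shaped like the v1 pre-LLM outcome reported in this section, with placeholder confidence values on the flagged fabrications.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
FLAGGED = {"mismatch", "ambiguous", "not_found"}  # anything other than "matched"


def recall_counts(rows: List[Row]) -> Tuple[int, int]:
    """(caught, total) over rows whose ground truth is 'fabricated'."""
    fabricated = [r for r in rows if r["truth"] == "fabricated"]
    caught = sum(r["verdict"] in FLAGGED for r in fabricated)
    return caught, len(fabricated)


def false_accusation_counts(rows: List[Row], high_only: bool = False) -> Tuple[int, int]:
    """(wrongly flagged, total) over rows whose ground truth is 'clean'."""
    clean = [r for r in rows if r["truth"] == "clean"]
    flagged = sum(
        r["verdict"] in FLAGGED and (not high_only or r["confidence"] == "high")
        for r in clean
    )
    return flagged, len(clean)


def row(truth: str, verdict: str, confidence: str) -> Row:
    return {"truth": truth, "verdict": verdict, "confidence": confidence}


# Synthetic rows: 11 fabricated (all flagged), 5 clean matches,
# 4 clean paraphrases flagged at low confidence.
rows = (
    [row("fabricated", "mismatch", "high")] * 3
    + [row("fabricated", "ambiguous", "high")] * 4
    + [row("fabricated", "not_found", "high")] * 4
    + [row("clean", "matched", "high")] * 5
    + [row("clean", "mismatch", "low")] * 4
)

print(recall_counts(rows))
print(false_accusation_counts(rows))
print(false_accusation_counts(rows, high_only=True))
```

Against a real receipt, the same two functions would be applied to the parsed `results` array after mapping its actual field names onto the ones assumed here.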

**Verdict conformance.** Every one of the 20 entries returned its expected verdict, in both pre-LLM and with-LLM-screen modes. (The receipts' own `metrics` block records this as precision = recall = F1 = 1.000 on the harness's expected-verdict basis. The figures below recompute ground-truth recall and false-accusation rate, with confidence intervals, so the sample size is visible rather than hidden.)

**Did it catch the fabrications?** 11 of 11 fabricated and wrong-identifier cases were flagged - recall 100%, 95% CI 74-100%. A perfect 11/11 still only proves recall is *at least* ~74% at this sample size.
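The ~74% figure follows directly from the Wilson score interval. A short Python sketch of the standard formula, written here for illustration rather than taken from the validation harness:

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_interval(11, 11)
print(f"11/11 -> {low:.1%} to {high:.1%}")
```

When every trial succeeds, the lower bound reduces to n / (n + z²), which for n = 11 is 11 / 14.84, or about 74%. The same function reproduces the other Wilson intervals quoted for the v1 fixture.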

**Did it wrongly flag a real, correctly-cited paper?** This false-accusation rate is the number that matters most - flagging a genuine citation is the dangerous error. Across the 9 clean citations:

- High-confidence false flags: 0 of 9 in both modes (95% CI 0-30%). The verifier never *confidently* accused a real citation.
- Including low-confidence flags: 4 of 9 pre-LLM (the paraphrase cases, which the simple verifier routes to the LLM screen by design), falling to 0 of 9 once the LLM screen is enabled.

**The gap a plain DOI check leaves.** Seven cases pair a real, resolving identifier with a title that does not match the paper it points to - the dominant Topaz pattern:

| Check | Real-DOI fabrications caught |
| --- | --- |
| Scholar Sidekick verifier | 7 / 7 (95% CI 65-100%) |
| Plain "does the identifier resolve?" | 0 / 7 |

A resolve-only check catches none of these by construction - the identifiers *do* resolve, just to the wrong paper. That gap is the entire reason a verifier exists.

**Cost and latency.** ~0.001 USD per applied LLM screen (4 of 20 entries triggered it); total run under half a cent. Per-request latency p50 ~ 2 s, p95 ~ 10 s (pre-LLM, point-in-time, measured over the network and including one rate-limit retry).

**How to read this.** A reproducible methodology check on hand-picked adversarial cases, not a 99-percent-style accuracy claim. The intervals are wide because n = 20 - the 1,395-entry blind holdout above is the larger clean arm that tightens the false-accusation-rate bound.

### Receipts

*Latest run (n = 1,395, 2026-06-05):*
- [validation-set-v4-blind.json](/citation-integrity/validation-set-v4-blind.json) - the sealed 1,395-entry blind holdout (seed, partitions, per-subtype attack types, disjointness check, spot-check record)
- [validation-results-v4-blind-pre-llm.json](/citation-integrity/validation-results-v4-blind-pre-llm.json) - the single measure-once receipt (the headline numbers recompute from this)
- [validation-results-v4-blind-with-llm.json](/citation-integrity/validation-results-v4-blind-with-llm.json) - the same sealed set re-scored with the opt-in LLM screen on
- [validation-results-v4-blind-pre-llm-run2.json](/citation-integrity/validation-results-v4-blind-pre-llm-run2.json) - the repeatability re-run (99.9% verdict stability vs the headline run)

*Earlier runs (kept in full for transparency - the discovery -> fix -> re-measure lineage):*
- [validation-set-v3-blind.json](/citation-integrity/validation-set-v3-blind.json) - the sealed 350-entry blind holdout (seed, partitions, disjointness check, spot-check record)
- [validation-results-v3-blind-pre-llm.json](/citation-integrity/validation-results-v3-blind-pre-llm.json) - the 350-entry blind holdout's single measure-once receipt
- [validation-set-v2-blind.json](/citation-integrity/validation-set-v2-blind.json) - the first (discovery) blind holdout, drawn before the fixes; built with the construction bug described above
- [validation-results-v2-blind-pre-llm.json](/citation-integrity/validation-results-v2-blind-pre-llm.json) - the discovery run's receipt (the 5.3% rate that surfaced the bugs, since fixed)
- [validation-set-v1.json](/citation-integrity/validation-set-v1.json) - the immutable v1 fixture
- [validation-results-pre-llm.json](/citation-integrity/validation-results-pre-llm.json) - v1 pre-LLM results, timestamped
- [validation-results-with-llm.json](/citation-integrity/validation-results-with-llm.json) - with-LLM-screen results, timestamped
- [known-failures.md](/citation-integrity/known-failures.md) - the verifier's known weak cases and current limitations

The fixtures are marked immutable. Each new measurement gets its own versioned, immutable fixture; old numbers always cite the specific fixture version they came from.

## Frequently asked questions

### Is this catching AI-generated citations specifically, or any fabrication?

Both. The fabrication pattern is the same regardless of origin: a citation pairs a real, resolvable identifier (DOI or PMID) with a title that does not correspond to the paper at that identifier. Topaz et al. note the steep increase since 2023 strongly implicates LLM authorship, but the verifier checks the structural disconnect - claimed title versus resolved title - not who wrote the citation.

### Does the verifier work for non-biomedical citations?

Yes. CITADEL (the pipeline Topaz et al. used) covers DOI and PMID - the biomedical identifier surface. The Scholar Sidekick verifier covers DOI, PMID, PMCID, ISBN, arXiv ID, ISSN, NASA ADS bibcode, and WHO IRIS URL - eight identifier types, which extends the same cross-reference methodology into books, computer-science and physics preprints, astrophysics, and institutional grey literature.

### Can I run this against an entire manuscript bibliography?

In batches, yes. The web tool at [/tools/citation-verifier](/tools/citation-verifier) accepts pasted references or a .bib / .ris / .csl-json upload and verifies up to 10 at a time. The same backend is exposed as a REST endpoint at `/api/verify` and as a `verifyCitation` MCP tool callable from Claude Desktop / Cursor; script either one to run a full manuscript bibliography one reference at a time. Rate limits scale with plan tier.
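A scripted run over a bibliography can be as simple as a loop that groups references by verdict. In the Python sketch below, `verify` is a caller-supplied function standing in for whichever client you use (REST or MCP); its signature is an assumption, and `fake_verify` is a stub so the example runs without network access:

```python
from typing import Callable, Dict, Iterable, List

VERDICTS = ("matched", "mismatch", "ambiguous", "not_found")


def triage(references: Iterable[str], verify: Callable[[str], str]) -> Dict[str, List[str]]:
    """Run `verify` on each reference, one at a time, and group by verdict."""
    grouped: Dict[str, List[str]] = {verdict: [] for verdict in VERDICTS}
    for reference in references:
        grouped.setdefault(verify(reference), []).append(reference)
    return grouped


def fake_verify(reference: str) -> str:
    """Stub for illustration only: flags any reference containing 'invented'."""
    return "not_found" if "invented" in reference else "matched"


report = triage(["ref one", "an invented ref", "ref three"], fake_verify)
needs_review = [ref for verdict in VERDICTS[1:] for ref in report[verdict]]
print(needs_review)
```

Everything outside the `matched` bucket is what a human reviewer would then look at by hand.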

### What about retraction status?

Retraction is a different signal. A real, correctly-cited paper can still be retracted. Scholar Sidekick exposes retraction-checking at [/tools/retraction-checker](/tools/retraction-checker) (Retraction Watch via Crossref). It is not wired into the verifier endpoint yet - that is a separate planned phase. If you need both signals on a bibliography today, call them separately.

### What does the verifier cost?

The `/api/verify` endpoint is free at the anonymous tier with a published rate limit. The LLM screen - used only when the simple verifier returns `mismatch` with low confidence - is gated to authenticated first-party callers and paid RapidAPI tiers, since each model call carries a real per-call cost that Scholar Sidekick pays to the model provider. We protect against runaway spend with a server-side daily cap; once that cap is hit, subsequent verifier requests fall back gracefully to the non-LLM verdict.

### How did you measure precision and recall?

We hand-curated a 20-entry fixture set sourced from the Topaz et al. supplementary appendix and from independent registry lookups via Crossref, PubMed, and arXiv. We ran every entry through the live verifier and counted how many actual fabrications were flagged (recall) and how many legitimate citations were wrongly flagged (the false-accusation rate reported above). Numbers and the full fixture are published in the Receipts section above; the JSON is immutable for v1. Twenty entries is a methodology check on hand-picked adversarial cases, not a statistically large benchmark. We have since measured a 1,395-entry blind holdout - drawn after the code was frozen and measured exactly once (plus a repeatability re-run) - which returned 100% recall on the dominant fabrication patterns and a 0.8% high-confidence false-accusation rate (Wilson 95% CI 0.4-1.4%). An earlier 350-entry holdout, and the first run that surfaced our own bugs, are also published there with full receipts - the complete discovery -> fix -> re-measure lineage.

### Why is this called complementary to CITADEL?

CITADEL is offline, post-publication, PMC-XML-only, and ran retrospectively across 2.5 million papers. Scholar Sidekick is online, on-demand, available at write or review time, and covers six identifier types CITADEL does not. The two surfaces serve different points in the publication lifecycle: CITADEL audits the literature retrospectively; Scholar Sidekick checks the citation as it is being written or peer-reviewed.

## References

**Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M.** *Fabricated citations: an audit across 2·5 million biomedical papers.* The Lancet. 2026;407(10541):1779-1781. doi:[10.1016/S0140-6736(26)00603-3](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext). Open access. The primary source for this page; the three illustrative cases come from its Supplementary Appendix 2.

### Related Scholar Sidekick surfaces

- [Citation verifier API](/docs#api-verify) - `/api/verify` reference, request/response schema, rate limits, LLM-screen gating
- [scholar-sidekick-mcp](https://www.npmjs.com/package/scholar-sidekick-mcp) - the MCP server that exposes the verifier to Claude Desktop / Cursor / Cline (verifyCitation tool ships in v0.7.0)
- [Retraction Checker](/tools/retraction-checker) - complementary signal for already-cited works
- [Open Access Checker](/tools/open-access-checker) - find a legal free copy via Unpaywall
- [AI evaluator bias in citation-tool recommendations](/citation-integrity/ai-evaluator-bias) - field observation: four AI search engines, asked to recommend citation infrastructure on the same day, all defaulted to incumbents and admitted the omission when challenged; what that says about a third citation-integrity layer
- [A verified citation can still be wrong](/citation-integrity/verified-citations-can-still-be-wrong) - why “verified” hides two different questions (does the work exist vs does the identifier resolve to it), how identifier resolution and title matching disagree, and how to check a citation properly

