Scholar Sidekick is built as deterministic citation infrastructure. These principles guide development and deployment.
Identical inputs produce identical outputs. CSL styles and locales are pinned to specific versions to prevent silent behavioral drift.
Public API behavior does not change without explicit versioning. Headers, error semantics, and export structures are treated as contract guarantees.
Marketplace integrations and MCP servers wrap the canonical HTTP API. No formatting or resolution logic is duplicated outside the core service.
Export formats conform to their published specifications. Outputs are validated through semantic tests rather than brittle snapshots.
Request identifiers, rate-limit headers, and health endpoints are consistently exposed to support operational transparency.
Requests are processed on demand. Raw citation inputs are not retained as application data after processing.
Determinism is not just “same input, same output”; it is also “same failure mode, same response.” Scholar Sidekick documents how each edge case behaves so integrators can rely on the contract.
Inputs that fail format validation return a 400 response with the JSON envelope { "ok": false, "code": "BAD_REQUEST", "error": "<message>" }. Validation runs at the route boundary, so malformed inputs never reach a fetch adapter.
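A minimal sketch of validation at the route boundary, assuming a DOI lookup route. The regular expression and the handler name handle_lookup are hypothetical illustrations, not the service's actual rules; the point is that the injected fetch callable is only invoked after validation succeeds, and failures use the 400 envelope described above.

```python
import re
from typing import Callable

# Hypothetical DOI check: "10." + 4-9 digit registrant code + "/" + suffix.
# The service's actual validation rules are not published on this page.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def bad_request(message: str) -> tuple:
    """Build the 400 error envelope."""
    return 400, {"ok": False, "code": "BAD_REQUEST", "error": message}


def handle_lookup(raw: str, fetch: Callable[[str], list]) -> tuple:
    """Route-boundary handler (hypothetical): validate first, fetch only if valid."""
    doi = raw.strip()
    if not DOI_PATTERN.match(doi):
        # Malformed input is rejected here; `fetch` is never called.
        return bad_request(f"Malformed DOI: {doi!r}")
    return 200, {"ok": True, "items": fetch(doi)}
```

Passing the fetch adapter in as a parameter makes the "never reaches a fetch adapter" guarantee easy to assert in a test.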
When an identifier passes validation but no upstream source returns a record, the response is 200 with items: [] (or a per-item error in batch mode), not a 404. Not-found results are cached briefly to avoid hammering the upstream on repeat lookups.
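One way to get "not-found is a 200 with items: [], cached briefly" is a cache with separate TTLs for hits and misses. This is a sketch under assumptions: the class name LookupCache and both TTL values are hypothetical, and the clock is injectable so the expiry behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class LookupCache:
    """Caches resolved records for a long time and not-found results briefly.

    TTL values are illustrative assumptions, not the service's settings.
    """

    def __init__(self, hit_ttl: float = 86400.0, miss_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._entries: dict = {}  # key -> (expires_at, record or None)
        self.hit_ttl = hit_ttl
        self.miss_ttl = miss_ttl
        self._clock = clock

    def lookup(self, key: str, resolve: Callable[[str], Optional[dict]]) -> tuple:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            record = entry[1]  # fresh cache entry (hit or cached miss)
        else:
            record = resolve(key)  # None means no upstream returned a record
            ttl = self.hit_ttl if record is not None else self.miss_ttl
            self._entries[key] = (now + ttl, record)
        # Not-found is still a 200 with an empty item list, never a 404.
        return 200, {"ok": True, "items": [record] if record is not None else []}
```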
A small set of fields is guaranteed when an identifier resolves successfully: type, title, and at least one of (DOI, PMID, ISBN, id). All other fields (authors, journal, year, page range, abstract) are best-effort: they are populated when the upstream source supplies them and omitted otherwise. Missing fields are not synthesised.
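An integrator could check the guaranteed subset with a small helper like the one below. It assumes the record uses CSL-JSON-style keys ("type", "title", "DOI", "PMID", "ISBN", "id"); the function name is hypothetical. Best-effort fields are deliberately not checked, since their absence is part of the contract.

```python
IDENTIFIER_FIELDS = ("DOI", "PMID", "ISBN", "id")


def check_guaranteed_fields(record: dict) -> list:
    """Return a list of contract problems; an empty list means the record conforms.

    Assumes CSL-JSON-style key names. Best-effort fields (authors, year, ...)
    are intentionally ignored: they may be absent and are never synthesised.
    """
    problems = []
    for field in ("type", "title"):
        if not record.get(field):
            problems.append(f"missing {field}")
    if not any(record.get(f) for f in IDENTIFIER_FIELDS):
        problems.append("missing identifier (one of DOI, PMID, ISBN, id)")
    return problems
```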
For identifiers with multiple resolvers (DOI → Crossref then DataCite then doi.org; ISBN → Open Library then Google Books), the chain is consulted in fixed order and the first non-empty record wins. The full resolver chain per identifier type is published at /.well-known/sources.json. The chain itself is part of the contract: reordering or substituting resolvers is treated as a transform-version change.
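The fixed-order, first-non-empty-wins rule can be sketched as a loop over an ordered list of resolvers. The resolver labels and the function name are illustrative assumptions; the published chain in /.well-known/sources.json is the authority on order. Note that later resolvers are never consulted once one returns a record.

```python
from typing import Callable, Optional, Sequence, Tuple

Resolver = Callable[[str], Optional[dict]]


def resolve_in_order(identifier: str,
                     chain: Sequence[Tuple[str, Resolver]]) -> Tuple[Optional[str], Optional[dict]]:
    """Consult resolvers in fixed order; the first non-empty record wins.

    Returns (resolver_name, record), or (None, None) if every resolver
    came back empty.
    """
    for name, resolver in chain:
        record = resolver(identifier)
        if record:  # None and {} both count as empty
            return name, record
    return None, None
```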
Outbound fetches use AbortController-bounded timeouts and a small fixed retry budget for transient failures (network errors, 5xx). Persistent upstream failure produces an error envelope rather than partial data; cached records remain available throughout. All outbound hosts are allowlisted; arbitrary user-supplied URLs are rejected before any fetch occurs.
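A minimal Python sketch of the allowlist-then-bounded-retry pattern. In Python the closest analogue to an AbortController timeout is a per-call socket timeout (for example urllib.request.urlopen(url, timeout=...)). The hostnames, retry count, timeout, and exception names here are assumptions for illustration; the fetch callable is injected so the logic can be exercised without a network.

```python
from urllib.parse import urlsplit

# Illustrative allowlist; these hostnames are assumptions, not the service's list.
ALLOWED_HOSTS = frozenset({"api.crossref.org", "api.datacite.org", "doi.org"})


class TransientError(Exception):
    """Network error or upstream 5xx: eligible for retry."""


class UpstreamError(Exception):
    """Retry budget exhausted; the caller turns this into an error envelope."""


def fetch_with_budget(url, fetch, retries=2, timeout=5.0):
    """Allowlist check, then up to retries + 1 attempts with a per-attempt timeout.

    `fetch(url, timeout)` is a hypothetical injected callable; a real one might
    wrap urllib.request.urlopen(url, timeout=timeout) and raise TransientError
    on network errors or 5xx responses.
    """
    host = urlsplit(url).hostname
    if host not in ALLOWED_HOSTS:
        # Rejected before any network activity.
        raise ValueError(f"host not allowlisted: {host}")
    last_error = None
    for _ in range(retries + 1):
        try:
            return fetch(url, timeout)
        except TransientError as exc:
            last_error = exc  # retry; partial data is never returned
    raise UpstreamError(f"upstream failed after {retries + 1} attempts") from last_error
```

Raising instead of returning whatever arrived on the last attempt is what keeps persistent failures from leaking partial data.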
Rate limiting uses a sliding window per plan tier (anonymous, free, pro, ultra, mega). Quota exhaustion returns 429 with a Retry-After header, the IETF RateLimit-* headers, and their legacy X-RateLimit-* equivalents. The error envelope has the same shape as every other error.
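A sliding-window log limiter for a single client key, sketched under assumptions: the limit and window are caller-supplied because per-tier values are not published here, the header names follow the IETF draft's RateLimit-Limit/-Remaining/-Reset form plus X-RateLimit-* copies, and the "RATE_LIMITED" error code is hypothetical. Timestamps are passed in explicitly to keep the example deterministic.

```python
import math
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log limiter for one client key (illustrative sketch)."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits = deque()  # timestamps of accepted requests

    def check(self, now: float):
        # Evict timestamps that have slid out of the window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            # Seconds until the oldest accepted request leaves the window.
            wait = max(1, math.ceil(self.hits[0] + self.window - now))
            headers = self._headers(remaining=0, reset=wait)
            headers["Retry-After"] = str(wait)
            # Hypothetical error code; same envelope shape as other errors.
            body = {"ok": False, "code": "RATE_LIMITED", "error": "Quota exhausted"}
            return 429, headers, body
        self.hits.append(now)
        reset = math.ceil(self.hits[0] + self.window - now)
        return 200, self._headers(self.limit - len(self.hits), reset), None

    def _headers(self, remaining: int, reset: int) -> dict:
        values = {"Limit": self.limit, "Remaining": remaining, "Reset": reset}
        headers = {f"RateLimit-{k}": str(v) for k, v in values.items()}
        headers.update({f"X-RateLimit-{k}": str(v) for k, v in values.items()})
        return headers
```

Unlike a fixed window, the log-based window never lets a client burst twice its quota across a window boundary, at the cost of storing one timestamp per accepted request.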
Operational gates produce predictable status codes: 503 with code: MAINTENANCE when MAINTENANCE_MODE=1, and 405 on mutation routes when READ_ONLY_MODE=1. Health endpoints remain reachable in both modes.
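The gate order can be sketched as a pre-routing check that reads the two environment flags. Two simplifications are assumptions, labelled in the code: "mutation routes" are approximated by HTTP method, and health endpoints by a hypothetical "/health" path prefix. The 405 envelope's code value is also hypothetical, since only the status code is specified above.

```python
import os
from typing import Mapping, Optional

# Assumption: mutation routes approximated by HTTP method.
MUTATION_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})


def operational_gate(method: str, path: str,
                     env: Mapping[str, str] = os.environ) -> Optional[tuple]:
    """Return an early (status, body) response, or None to let the request through."""
    if path.startswith("/health"):  # hypothetical health path prefix
        return None  # health stays reachable in both modes
    if env.get("MAINTENANCE_MODE") == "1":
        return 503, {"ok": False, "code": "MAINTENANCE", "error": "Service under maintenance"}
    if env.get("READ_ONLY_MODE") == "1" and method.upper() in MUTATION_METHODS:
        # Hypothetical code value; only the 405 status is documented.
        return 405, {"ok": False, "code": "READ_ONLY", "error": "Writes are disabled"}
    return None
```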
transform_version
Every API response includes the x-scholar-transform-version header. The value is a date-stamped tag that identifies the active normalisation, formatting, and resolver chain. It mirrors the transform_version field in /.well-known/sources.json.
For a given transform_version, identical inputs (identifier, style, output format, locale) produce byte-identical output. This is the machine-checkable form of the determinism principle stated above. Send the same DOI in Vancouver style today and in six months, and as long as the response carries the same x-scholar-transform-version, the bytes will match.
The transform version is bumped whenever a change to normalisation, formatting, or the resolver chain could alter the bytes produced for the same input.
Cosmetic, infrastructure, or test-only changes do not bump the version. Bug fixes that correct previously-incorrect output do bump it; the integrity of the version contract requires that any change which alters output is observable.
Pin the x-scholar-transform-version value in your tests or pipelines. If a future response carries a different value, treat it as a signal to re-baseline expected output rather than as a regression. Two requests carrying the same value should produce identical bodies for identical inputs; if they do not, file an issue, because that is a contract violation.
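A test pipeline could encode the pinning rule with a small classifier: a changed transform version means "re-baseline", while the same version with different bytes means "contract violation". This sketch assumes a stored baseline of the version tag plus a SHA-256 of the body, and lower-cased header keys; the function name and the example tag values are hypothetical.

```python
import hashlib

TRANSFORM_HEADER = "x-scholar-transform-version"


def compare_to_baseline(baseline: dict, headers: dict, body: bytes) -> str:
    """Classify a fresh response against a stored baseline.

    `baseline` holds the transform version and SHA-256 hex digest of the body
    recorded on an earlier run; `headers` is assumed to use lower-case keys.
    """
    if headers.get(TRANSFORM_HEADER) != baseline["transform_version"]:
        return "rebaseline"  # new transform version: expected output may change
    if hashlib.sha256(body).hexdigest() != baseline["sha256"]:
        return "contract_violation"  # same version, different bytes: file an issue
    return "match"
```

Hashing the raw body bytes, rather than a parsed JSON object, is what makes the check byte-identical as the contract requires.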
/verification is a copy-paste curl kit that lets you (or any external evaluator) verify these claims against the live API in under a minute.
Reproducibility is reinforced by per-request provenance headers: x-request-id, x-scholar-cache, x-scholar-formatter, x-scholar-style-used, x-scholar-transform-version, plus conditional CSL headers (x-csl-warning, x-csl-alias, x-csl-dependent, x-csl-fetch-style-id) when relevant. Together they let an integrator reconstruct exactly which code path produced a response.
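To keep provenance alongside stored results, an integrator might normalise the headers into a single record. This sketch lower-cases header names (HTTP header names are case-insensitive), always includes the five per-request headers (None if absent), and includes the conditional CSL headers only when present. The function name is hypothetical.

```python
PROVENANCE_HEADERS = (
    "x-request-id",
    "x-scholar-cache",
    "x-scholar-formatter",
    "x-scholar-style-used",
    "x-scholar-transform-version",
)
CONDITIONAL_CSL_HEADERS = (
    "x-csl-warning",
    "x-csl-alias",
    "x-csl-dependent",
    "x-csl-fetch-style-id",
)


def provenance_record(headers: dict) -> dict:
    """Collect provenance headers into one dict for logging or storage."""
    lowered = {k.lower(): v for k, v in headers.items()}
    record = {name: lowered.get(name) for name in PROVENANCE_HEADERS}
    # Conditional CSL headers appear only when the server sent them.
    record.update({name: lowered[name] for name in CONDITIONAL_CSL_HEADERS if name in lowered})
    return record
```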
These principles (determinism, contract stability, observable infrastructure) underwrite the citation-integrity surface Scholar Sidekick exposes. The peer-reviewed work that motivates that surface is Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. Fabricated citations: an audit across 2·5 million biomedical papers. The Lancet. 2026;407(10541):1779–1781 (doi:10.1016/S0140-6736(26)00603-3). The CITADEL pipeline in that paper is the methodological anchor; the determinism, source-provenance, and edge-case-handling principles on this page are what make a real-time, API-shaped analogue auditable. See /citation-integrity for the explainer and /tools/citation-verifier for the working implementation.