---
title: Engineering Principles - Scholar Sidekick
description: The engineering principles that guide Scholar Sidekick's development - determinism, contract stability, spec fidelity, edge-case behaviour, and reproducibility via the transform_version header.
doc_version: "1.1"
last_updated: "2026-05-14"
---

# Engineering Principles - Scholar Sidekick

> The principles that guide Scholar Sidekick's development and operation, plus the contract
> guarantees that make the service safe to integrate.
> Last updated: 2026-05-14
> HTML version: https://scholar-sidekick.com/engineering-principles

Scholar Sidekick is built as deterministic citation infrastructure. These principles guide
development and deployment.

---

## Determinism First

Identical inputs produce identical outputs. CSL styles and locales are pinned to specific
versions to prevent silent behavioural drift.

## Contract Stability

Public API behaviour does not change without explicit versioning. Headers, error semantics, and
export structures are treated as contract guarantees.

## Thin Adapters

Marketplace integrations and MCP servers wrap the canonical HTTP API. No formatting or
resolution logic is duplicated outside the core service.

## Spec Fidelity

Export formats conform to their published specifications. Outputs are validated through semantic
tests rather than brittle snapshots.

## Observable Infrastructure

Request identifiers, rate-limit headers, and health endpoints are consistently exposed to
support operational transparency.

## Minimal Data Retention

Requests are processed on demand. Raw citation inputs are not retained as application data after
processing.

---

## Edge Case Behaviour

Determinism is not just "same input, same output" - it is also "same failure mode, same
response." Scholar Sidekick documents how each edge case behaves so integrators can rely on the
contract.

### Invalid identifiers

Inputs that fail format validation return a `400` response with the JSON envelope
`{ "ok": false, "code": "BAD_REQUEST", "error": "<message>" }`. Validation runs at the route
boundary; malformed inputs never reach a fetch adapter.
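
A minimal client-side sketch of reading that envelope, assuming the body is JSON carrying the `ok`, `code`, and `error` keys shown above. The `ApiError` type and `parse_error_envelope` helper are illustrative names, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    """Illustrative container for the documented error envelope."""
    status: int
    code: str
    message: str


def parse_error_envelope(status: int, body: str) -> Optional[ApiError]:
    """Return an ApiError when the body is an error envelope, else None."""
    payload = json.loads(body)
    if isinstance(payload, dict) and payload.get("ok") is False:
        return ApiError(
            status,
            str(payload.get("code", "")),
            str(payload.get("error", "")),
        )
    return None
```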

### Valid identifier, no record found

When an identifier passes validation but no upstream source returns a record, the response is
`200` with `items: []` (or a per-item error in batch mode), not a `404`. Not-found results
are cached briefly to avoid hammering the upstream on repeat lookups.
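
Because a miss arrives as `200` rather than `404`, a client has to inspect the body instead of the status code. A small sketch for single-lookup responses, assuming the parsed body is a dict with an `items` list; batch-mode per-item errors are not covered, and `lookup_outcome` is a hypothetical helper name.

```python
def lookup_outcome(status: int, payload: dict) -> str:
    """Classify a single-lookup response as 'found', 'not_found', or 'error'."""
    if status != 200:
        return "error"
    # An empty `items` list is the documented not-found signal.
    return "found" if payload.get("items") else "not_found"
```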

### Partial metadata

A small set of fields is guaranteed when an identifier resolves successfully: `type`, `title`,
and at least one of (`DOI`, `PMID`, `ISBN`, `id`). All other fields (authors, journal, year,
page range, abstract) are best-effort - they are populated when the upstream source supplies
them and omitted otherwise. Missing fields are not synthesised.
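
On the consuming side, this guarantee can be checked before relying on a record. The sketch below assumes each item is a flat dict keyed by the field names listed above; `meets_minimum` and `best_effort` are illustrative helpers.

```python
from typing import Any, Mapping

REQUIRED_FIELDS = ("type", "title")
IDENTIFIER_FIELDS = ("DOI", "PMID", "ISBN", "id")


def meets_minimum(record: Mapping[str, Any]) -> bool:
    """True when the record carries the guaranteed fields."""
    has_required = all(record.get(name) for name in REQUIRED_FIELDS)
    has_identifier = any(record.get(name) for name in IDENTIFIER_FIELDS)
    return has_required and has_identifier


def best_effort(record: Mapping[str, Any], name: str, default: Any = None) -> Any:
    """Read an optional field without assuming the upstream supplied it."""
    value = record.get(name)
    return default if value is None else value
```

Treating every other field as optional at the call site keeps client code from breaking when an upstream source omits, say, an abstract.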

### Conflicting sources

For identifiers with multiple resolvers (DOI → Crossref then DataCite then doi.org; ISBN → Open
Library then Google Books), the chain is consulted in fixed order and the first non-empty record
wins. The full resolver chain per identifier type is published at
[/.well-known/sources.json](https://scholar-sidekick.com/.well-known/sources.json). The chain
itself is part of the contract: reordering or substituting resolvers is treated as a
transform-version change.
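
The "fixed order, first non-empty record wins" rule can be sketched as follows. The resolvers here are stubs standing in for real upstream adapters, and `resolve_first` is a hypothetical name; this illustrates the rule rather than the service's actual code.

```python
from typing import Callable, Optional, Sequence, Tuple

Record = dict
Resolver = Callable[[str], Optional[Record]]


def resolve_first(
    identifier: str, chain: Sequence[Tuple[str, Resolver]]
) -> Tuple[Optional[str], Optional[Record]]:
    """Consult resolvers in order; the first non-empty record wins."""
    for name, resolver in chain:
        record = resolver(identifier)
        if record:  # None and {} both count as empty
            return name, record
    return None, None


# Stub chain mirroring the documented DOI order.
doi_chain = [
    ("crossref", lambda doi: None),
    ("datacite", lambda doi: {"DOI": doi, "title": "Example dataset"}),
    ("doi.org", lambda doi: {"DOI": doi, "title": "Never reached"}),
]
```

Because later resolvers are never consulted once an earlier one answers, reordering the chain can change output for the same input, which is why the order is versioned.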

### Upstream failures

Outbound fetches use AbortController-bounded timeouts and a small fixed retry budget for
transient failures (network errors, 5xx). Persistent upstream failure produces an error envelope
rather than partial data; cached records remain available throughout. All outbound hosts are
allowlisted; arbitrary user-supplied URLs are rejected before any fetch occurs.
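
A rough Python analogue of that policy: check the host against an allowlist first, then retry transient failures within a fixed budget. The host list, the retry count, the timeout, and the `TransientError` type are all assumptions made for illustration; the service's real values are not stated on this page.

```python
from typing import Callable, FrozenSet
from urllib.parse import urlsplit

# Illustrative allowlist; the real host list is not given here.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"api.crossref.org", "api.datacite.org"})


class TransientError(Exception):
    """Stands in for a network error or a 5xx response."""


def is_allowed(url: str, allowed: FrozenSet[str] = ALLOWED_HOSTS) -> bool:
    """Accept only https URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed


def fetch_with_budget(
    url: str,
    fetch: Callable[[str, float], bytes],
    retries: int = 2,
    timeout_s: float = 5.0,
) -> bytes:
    """Call fetch(url, timeout_s), retrying transient failures a fixed number of times."""
    if not is_allowed(url):
        raise ValueError(f"host not allowlisted: {url}")
    attempts = max(1, retries + 1)
    for attempt in range(attempts):
        try:
            return fetch(url, timeout_s)
        except TransientError:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface an error, not partial data
    raise AssertionError("unreachable")
```

The `fetch` callable is injected so the timeout mechanism (the analogue of an AbortController) stays outside the retry logic.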

### Rate limits

Rate limiting is sliding-window per plan tier (anonymous, free, pro, ultra, mega). Quota
exhaustion returns `429` with a `Retry-After` header and rate-limit headers in both the IETF
`RateLimit-*` form and the legacy `X-RateLimit-*` form. The response body uses the same error
envelope as other errors.
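
A client can turn `Retry-After` into a wait time before retrying. HTTP allows the header to be either a number of seconds or an HTTP date; which form this API sends is not stated here, so the sketch below handles both. `retry_after_seconds` is an illustrative helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    # Otherwise treat the value as an HTTP date.
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```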

### Maintenance and read-only modes

Operational gates produce predictable status codes: `503` with `code: MAINTENANCE` when
`MAINTENANCE_MODE=1`, and `405` on mutation routes when `READ_ONLY_MODE=1`. Health
endpoints remain reachable in both modes.

---

## Reproducibility & `transform_version`

Every API response includes the `x-scholar-transform-version` header. The value is a
date-stamped tag that identifies the active normalisation rules, formatters, and resolver chain.
It mirrors the `transform_version` field in
[/.well-known/sources.json](https://scholar-sidekick.com/.well-known/sources.json).

### What the header guarantees

For a given `transform_version`, identical inputs (identifier, style, output format, locale)
produce byte-identical output. This is the machine-checkable form of the determinism principle
stated above. Send the same DOI in Vancouver style today and in six months, and as long as the
response carries the same `x-scholar-transform-version`, the bytes will match.

### When the version is bumped

The version is bumped when any of the following changes in a way that could alter the output
bytes for the same input:

- Normalisation rules (identifier canonicalisation, locale resolution)
- Builtin formatter output (Vancouver, AMA, APA, IEEE, CSE)
- CSL engine or pinned style/locale catalogue version
- Export writers (RIS, BibTeX, CSL-JSON, EndNote XML, RefWorks, NBIB, RDF, CSV)
- Resolver chain order or fallback semantics for any identifier type

Cosmetic, infrastructure, or test-only changes that leave output bytes unchanged do not bump the
version. Bug fixes that correct previously-incorrect output do bump it; the integrity of the
version contract requires that any change which alters output is observable.

### How to assert reproducibility

Pin the `x-scholar-transform-version` in your tests or pipelines. If a future response carries
a different value, treat it as a signal to re-baseline expected output rather than a regression.
Two requests carrying the same value should produce identical bodies for identical inputs; if
they do not, file an issue - that is a contract violation.
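
One way to encode that rule in a test suite is to store the version alongside a hash of the expected body and compare both. The sketch assumes you already have the header value and raw response bytes in hand; `Baseline`, `make_baseline`, and `compare` are illustrative names.

```python
import hashlib
from typing import NamedTuple


class Baseline(NamedTuple):
    transform_version: str
    sha256: str


def make_baseline(transform_version: str, body: bytes) -> Baseline:
    """Record the version and a digest of the body it produced."""
    return Baseline(transform_version, hashlib.sha256(body).hexdigest())


def compare(baseline: Baseline, transform_version: str, body: bytes) -> str:
    """Return 'match', 'rebaseline', or 'violation'."""
    if transform_version != baseline.transform_version:
        # New version: expected output may legitimately differ.
        return "rebaseline"
    digest = hashlib.sha256(body).hexdigest()
    # Same version, different bytes: the contract has been broken.
    return "match" if digest == baseline.sha256 else "violation"
```

Hashing the raw bytes rather than parsed JSON matters here, since the guarantee is byte-identity, not semantic equivalence.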

[/verification](https://scholar-sidekick.com/verification)
([markdown](https://scholar-sidekick.com/verification.md)) is a copy-paste curl kit that lets you
(or any external evaluator) verify these claims against the live API in under a minute.

### Provenance headers

Reproducibility is reinforced by per-request provenance headers: `x-request-id`,
`x-scholar-cache`, `x-scholar-formatter`, `x-scholar-style-used`,
`x-scholar-transform-version`, plus conditional CSL headers (`x-csl-warning`,
`x-csl-alias`, `x-csl-dependent`, `x-csl-fetch-style-id`) when relevant. Together they let
an integrator reconstruct exactly which code path produced a response.
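
For logging or bug reports, these headers can be pulled out of a response in one step. A small sketch, assuming headers arrive as a plain name-to-value mapping; header names are matched case-insensitively and `provenance` is a hypothetical helper.

```python
from typing import Dict, Mapping

PER_REQUEST = (
    "x-request-id",
    "x-scholar-cache",
    "x-scholar-formatter",
    "x-scholar-style-used",
    "x-scholar-transform-version",
)
CONDITIONAL_CSL = (
    "x-csl-warning",
    "x-csl-alias",
    "x-csl-dependent",
    "x-csl-fetch-style-id",
)


def provenance(headers: Mapping[str, str]) -> Dict[str, str]:
    """Extract the provenance headers that are present on a response."""
    lowered = {name.lower(): value for name, value in headers.items()}
    wanted = PER_REQUEST + CONDITIONAL_CSL
    return {name: lowered[name] for name in wanted if name in lowered}
```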

---

## Methodological reference

These principles - determinism, contract stability, observable infrastructure - underwrite the
citation-integrity surface Scholar Sidekick exposes. The peer-reviewed work that motivates that
surface is Topaz M, Roguin N, Gupta P, Zhang Z, Peltonen L-M. *Fabricated citations: an audit across
2·5 million biomedical papers*. The Lancet. 2026;407(10541):1779–1781
([doi:10.1016/S0140-6736(26)00603-3](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(26)00603-3/fulltext)).
The CITADEL pipeline in that paper is the methodological anchor; the determinism, source-provenance,
and edge-case-handling principles on this page are what make a real-time, API-shaped analogue
auditable. See [/citation-integrity](https://scholar-sidekick.com/citation-integrity) for the
explainer and [/tools/citation-verifier](https://scholar-sidekick.com/tools/citation-verifier) for
the working implementation.

---

## Related

- [API Docs](https://scholar-sidekick.com/docs)
- [MCP Server](https://scholar-sidekick.com/mcp)
- [Privacy Policy](https://scholar-sidekick.com/legal/privacy)
- [Data source manifest](https://scholar-sidekick.com/.well-known/sources.json)
- [OpenAPI spec](https://scholar-sidekick.com/.well-known/openapi.yaml)

---

## Sitemap

See the full [sitemap](https://scholar-sidekick.com/sitemap.md) for all pages.
