Add <crossmark> element to Crossref XML output #1121
- Add RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE mapping dict
- Add CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE mapping dict
- Implement xml_crossref_crossmark_pipe() function
- Add crossmark pipe to pipeline_crossref()
- Add 12 tests covering all crossmark update types and edge cases
Co-authored-by: robertatakenaka <[email protected]>
Pull request overview
This PR adds Crossref Crossmark support to the XML pipeline by implementing xml_crossref_crossmark_pipe() in crossref.py. It generates <crossmark> elements with update metadata derived from JATS <related-article> elements and <history> dates, mapping JATS article relationship types to Crossref update types (e.g., correction-forward → correction, retracted-article → retraction).
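To make the mapping step concrete, here is a minimal sketch of the lookup, assuming a plain dict keyed by `related-article-type`. Only the four pairs named in this PR are included (the real dict is described as covering 11 JATS types), and the helper names `crossmark_update_type` and `mappable_updates` are hypothetical, not part of packtools.

```python
# Illustrative subset only: the four pairs below are the ones named in this PR.
RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE = {
    "correction-forward": "correction",
    "corrected-article": "correction",
    "retracted-article": "retraction",
    "updated-article": "new_version",
}


def crossmark_update_type(related_article_type):
    """Return the Crossref update type, or None when the JATS type is unmapped."""
    return RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE.get(related_article_type)


def mappable_updates(related_article_types):
    """Keep only the types that map to a Crossref update type, preserving order."""
    updates = []
    for jats_type in related_article_types:
        update_type = crossmark_update_type(jats_type)
        if update_type is not None:
            # Unmapped types (e.g. "letter") are skipped silently.
            updates.append((jats_type, update_type))
    return updates
```

Returning `None` for unmapped types keeps the "silently skipped" behavior in one place, so the caller only has to check whether the resulting list is empty.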
Changes:
- Adds two mapping dictionaries (`RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE` and `CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE`) to map JATS related-article types to Crossref update types and their corresponding history date types.
- Implements the `xml_crossref_crossmark_pipe()` function, which builds `<crossmark>` elements with version, policy, and update entries (including DOI and date) and appends them to each `<journal_article>` in the Crossref output.
- Adds comprehensive tests covering various related-article types, multiple updates, fallback date logic, custom policy, unmappable types, and multi-article application.
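A rough sketch of how such an element could be assembled, using the standard library's `xml.etree.ElementTree` rather than whatever packtools uses internally. The element layout mirrors the docstring example in this PR and has not been validated against the Crossref schema here; `build_crossmark`, `resolve_update_date`, and the shape of the `updates` dicts are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def resolve_update_date(history_dates, date_type, pub_date):
    """Prefer the <history> date of the given type; otherwise use the pub date.

    `history_dates` is assumed to be a dict such as
    {"corrected": {"year": "2025", "month": "07"}}.
    """
    return history_dates.get(date_type) or pub_date


def build_crossmark(policy, updates):
    """Build a <crossmark> element, or return None when there is nothing to emit.

    `updates` is assumed to be a list of dicts with the keys
    "type", "doi", and "date" (itself a dict with "year" and "month").
    """
    if not policy or not updates:
        return None
    crossmark = ET.Element("crossmark")
    # Version "1" mirrors the docstring example in the PR.
    ET.SubElement(crossmark, "crossmark_version").text = "1"
    ET.SubElement(crossmark, "crossmark_policy").text = policy
    updates_node = ET.SubElement(crossmark, "updates")
    for item in updates:
        update = ET.SubElement(updates_node, "update", {"type": item["type"]})
        ET.SubElement(update, "doi").text = item["doi"]
        date = ET.SubElement(update, "date", {"media_type": "online"})
        ET.SubElement(date, "month").text = item["date"]["month"]
        ET.SubElement(date, "year").text = item["date"]["year"]
    return crossmark
```

Keeping date resolution separate from element construction makes the fallback rule testable without building any XML.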
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packtools/sps/formats/crossref.py | Adds mapping dicts and xml_crossref_crossmark_pipe() function; integrates it into pipeline_crossref() |
| tests/sps/formats/test_crossref.py | Adds CrossmarkPipeTest class with 11 test cases covering the new crossmark functionality |
    xml_crossref_doi_pipe(xml_crossref, xml_tree)
    xml_crossref_resource_pipe(xml_crossref, xml_tree)
    xml_crossref_collection_pipe(xml_crossref, xml_tree)
    xml_crossref_crossmark_pipe(xml_crossref, xml_tree, data)
The crossmark element is appended to journal_article after doi_data (since the pipe is called at line 559, after xml_crossref_collection_pipe which populates doi_data). However, according to the Crossref 4.4.0 schema (xs:sequence in ct:journal_article), crossmark must appear before fr:program, ai:program, rel:program, and doi_data.
The current element order in the output will be: ...publisher_item, ai:program, rel:program, doi_data, crossmark, citation_list, but the schema requires: ...publisher_item, crossmark, ..., ai:program, rel:program, ..., doi_data, citation_list.
This call should be moved earlier in the pipeline, before xml_crossref_permissions_pipe (line 553), to produce schema-valid XML. For example, placing it right after xml_crossref_elocation_pipe would put crossmark in the correct position between publisher_item and ai:program.
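As an alternative to moving the call within `pipeline_crossref()`, a position-aware insert is sketched below. It is not what the comment above proposes and not packtools code: `insert_crossmark`, `local_name`, and the `MUST_FOLLOW_CROSSMARK` set are hypothetical, the set is taken only from the sibling order quoted in this comment, and the example uses `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Local names of siblings that, per the review comment, must come after
# <crossmark>. fr:program, ai:program and rel:program share the local
# name "program" once the namespace is stripped.
MUST_FOLLOW_CROSSMARK = {"program", "doi_data", "citation_list"}


def local_name(element):
    """Return the tag without its '{namespace}' prefix ('' for comments/PIs)."""
    tag = element.tag
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def insert_crossmark(journal_article, crossmark):
    """Insert crossmark before the first sibling that must follow it.

    Falls back to appending when no such sibling exists yet.
    """
    for index, child in enumerate(journal_article):
        if local_name(child) in MUST_FOLLOW_CROSSMARK:
            journal_article.insert(index, crossmark)
            return
    journal_article.append(crossmark)
```

The trade-off is that the pipe no longer depends on where it is called in the pipeline, at the cost of encoding part of the schema order in code.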
            <journal_article language="pt" publication_type="research-article" reference_distribution_opts="any">
                <doi_data>...</doi_data>
                <crossmark>
                    <crossmark_version>1</crossmark_version>
                    <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
                    <updates>
                        <update type="correction">
                            <doi>10.1590/erratum-doi</doi>
                            <date media_type="online">
                                <month>07</month>
                                <year>2025</year>
                            </date>
                        </update>
                    </updates>
                </crossmark>
            </journal_article>
        </journal>
    </body>
</doi_batch>
The docstring example shows <crossmark> as a sibling after <doi_data>, but according to the Crossref 4.4.0 schema (xs:sequence in journal_article), crossmark must appear before doi_data. When the pipeline ordering is fixed (see comment on pipeline_crossref), this docstring example should also be updated to show <crossmark> before <doi_data>.
    """
    if data is None:
        data = {}
    crossmark_policy = data.get(
@copilot when crossmark_policy is absent, execute `return None` and exit the function
Done in 5cd6f4d. crossmark_policy is now required — if absent from the data dict, the function returns immediately without generating any <crossmark> element. A new test test_crossmark_pipe_no_policy_returns_early covers this behavior.
    if data is None:
        data = {}
    crossmark_policy = data.get(
        "crossmark_policy", "https://www.scielo.br/crossmark-policy"
Done in b7224a9. The data parameter no longer has a default value — the signature is now xml_crossref_crossmark_pipe(xml_crossref, xml_tree, data). No default values are adopted anywhere in the function.
- Remove default value for crossmark_policy
- Add early return when crossmark_policy is not provided
- Update tests to pass crossmark_policy explicitly
- Add test_crossmark_pipe_no_policy_returns_early to cover new behavior
Co-authored-by: robertatakenaka <[email protected]>
- Change signature from data=None to data (required positional arg)
- Replace `if data is None: data = {}` guard with single `data.get(...) if data else None`
- Update test to pass None explicitly instead of omitting the argument
Co-authored-by: robertatakenaka <[email protected]>
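The guard described in these commits can be sketched as follows. This is a simplified stand-in, not the packtools function: `crossmark_pipe_sketch` is a hypothetical name, and plain lists stand in for the `<journal_article>` elements.

```python
def crossmark_pipe_sketch(journal_articles, data):
    """Attach the policy to every article, or do nothing when it is missing.

    `journal_articles` is a list of lists used as stand-ins for XML elements.
    `data` has no default, so callers must pass it (possibly as None).
    """
    # Single expression covers data=None, data={}, and a missing key.
    crossmark_policy = data.get("crossmark_policy") if data else None
    if not crossmark_policy:
        # Early exit: absent or empty policy means no <crossmark> at all.
        return None
    for article in journal_articles:
        article.append({"crossmark_policy": crossmark_policy})
    return None
```

Because an empty string is falsy, `{"crossmark_policy": ""}` takes the same early exit as a missing key, which matches the "absent or empty" wording in the PR description.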
Adds Crossref Crossmark support to the XML pipeline, generating `<crossmark>` elements from `<related-article>` and `<history>` data in SciELO/JATS XML.
What does this PR do?
Implements `xml_crossref_crossmark_pipe()` in `crossref.py` to generate Crossref Crossmark metadata for articles with related updates (corrections, retractions, addenda, etc.).
Mapping layer: two new dicts:
- `RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE`: maps 11 JATS `related-article-type` values to the 7 Crossref update types (`correction`, `retraction`, `partial_retraction`, `expression_of_concern`, `addendum`, `withdrawal`, `new_version`)
- `CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE`: resolves update dates from `<history>` date nodes (e.g. `corrected` → `correction`); falls back to article pub date
Generated output for an article with `correction-forward` and a matching `corrected` history date:
`crossmark_policy` is a required, no-default parameter supplied via `data["crossmark_policy"]`. The `data` argument itself has no default value, so callers must always pass it explicitly. If `crossmark_policy` is absent or empty, the function returns immediately without generating any `<crossmark>` element. The element is also skipped entirely when no mappable `<related-article>` elements are present.
Where could the review start?
`packtools/sps/formats/crossref.py`: start at the two new mapping dicts near the top, then `xml_crossref_crossmark_pipe()` (near the bottom), then the updated `pipeline_crossref()` call order.
How could this be tested manually?
Run against any SciELO XML that has `<related-article>` with a type from the mapping (e.g. `correction-forward`, `retracted-article`):
The fixture `tests/sps/fixtures/xml_test_fixtures/S1984-92302025000100304.xml` contains `correction-forward` and can be used directly.
Any background context you want to provide?
The mapping follows the JATS → Crossref type correspondence from the issue. Types with no Crossref equivalent (`commentary-article`, `letter`, etc.) are silently skipped: no crossmark is emitted. Ambiguous types (`corrected-article`, `updated-article`) are mapped to their most conservative Crossref equivalent (`correction`, `new_version`).
`crossmark_policy` is intentionally required with no default value: each publisher must explicitly provide their registered Crossmark policy DOI or URL. The `data` parameter of `xml_crossref_crossmark_pipe()` also carries no default; passing `None` or a dict without `crossmark_policy` causes the pipe to exit early without writing any `<crossmark>` to the output.
Screenshots
N/A — XML output only.
What are the relevant tickets?
Related to the development of the Crossmark feature for SciELO's xmlcrossref format.
Fix #1118
References
`related-article-type` values (NLM/NISO)
Original prompt
This section details the original issue you should resolve
<issue_title>Create the feature to add `<crossmark/>` to the Crossref XML</issue_title>
<issue_description>
# Crossmark – 12 update types (Crossref)
Create the crossmark element for the xmlcrossref format, considering the history fields (hist) and the relationship fields (related-article) of the SciELO / JATS XML.
Reference: https://www.crossref.org/documentation/crossmark/
1. `addendum`: Relevant additional information published after the original article.
2. `clarification`: Clarifies an ambiguity or confusing passage without changing the conclusions.
3. `correction`: Generic term for the correction of an error not classified below.
4. `corrigendum`: Formal correction of errors introduced by the authors.
5. `erratum`: Correction of errors introduced by the publisher (typography, etc.).
6. `expression_of_concern`: The editor expresses concern about the integrity of the data/methodology.
7. `new_edition`: New edition of the work (book/monograph); replaces the previous one.
8. `new_version`: New version of the record (preprint → VoR, updated dataset, etc.).
9. `partial_retraction`: Retraction of part of the article (section, figure, experiment).