Add <crossmark> element to Crossref XML output#1121

Merged
robertatakenaka merged 4 commits into master from
copilot/add-crossmark-to-xml
Mar 6, 2026

Conversation

Contributor

Copilot AI commented Mar 6, 2026

Adds Crossref Crossmark support to the XML pipeline, generating <crossmark> elements from <related-article> and <history> data in SciELO/JATS XML.

What does this PR do?

Implements xml_crossref_crossmark_pipe() in crossref.py to generate Crossref Crossmark metadata for articles with related updates (corrections, retractions, addenda, etc.).

Mapping layer — two new dicts:

  • RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE: maps 11 JATS related-article-type values to the 7 Crossref update types (correction, retraction, partial_retraction, expression_of_concern, addendum, withdrawal, new_version)
  • CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE: resolves update dates from <history> date nodes (e.g. corrected → correction); falls back to the article pub date
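
As a rough illustration of the mapping layer described above, the sketch below shows a hypothetical subset of the two dicts and a date lookup with the pub-date fallback. Only the pairs named in this PR description are included; the helper `resolve_update_date` and the shape of `history_dates` are assumptions made for the example, not the code in crossref.py.

```python
# Hypothetical subset: the real dict in crossref.py covers 11 JATS types.
RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE = {
    "correction-forward": "correction",
    "corrected-article": "correction",
    "retracted-article": "retraction",
    "updated-article": "new_version",
}

# Hypothetical subset: only the pair mentioned in the description.
CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE = {
    "correction": "corrected",
}


def resolve_update_date(update_type, history_dates, pub_date):
    """Pick the <history> date for an update type, else the pub date.

    `history_dates` is assumed to be a dict keyed by date-type,
    e.g. {"corrected": {"year": "2025", "month": "07"}}.
    """
    date_type = CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE.get(update_type)
    # An unknown date type or a missing history entry falls back to pub_date.
    return history_dates.get(date_type) or pub_date
```

Keeping the fallback in one small function makes it easy to test the "no matching history date" case separately from the XML building.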

Generated output for an article with correction-forward and a matching corrected history date:

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="correction">
      <doi>10.1590/1984-92302025v32n0000-ERRATA-EN</doi>
      <date media_type="online">
        <month>07</month>
        <year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

crossmark_policy is a required, no-default parameter supplied via data["crossmark_policy"]. The data argument itself has no default value — callers must always pass it explicitly. If crossmark_policy is absent or empty, the function returns immediately without generating any <crossmark> element. The element is also skipped entirely when no mappable <related-article> elements are present.
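
The early-return behavior can be sketched as follows. This is a minimal stand-alone approximation that uses the standard library's `xml.etree.ElementTree` rather than lxml; the function name `build_crossmark` and the `updates` list of dicts are hypothetical and are not the signature of xml_crossref_crossmark_pipe().

```python
import xml.etree.ElementTree as ET


def build_crossmark(data, updates):
    """Return a <crossmark> element, or None when nothing should be emitted.

    `updates` is a hypothetical list of dicts with the keys
    "type", "doi", "month" and "year".
    """
    # Guard form quoted in this PR's commit message; None and {} both
    # resolve to "no policy".
    crossmark_policy = data.get("crossmark_policy") if data else None
    if not crossmark_policy or not updates:
        return None

    crossmark = ET.Element("crossmark")
    ET.SubElement(crossmark, "crossmark_version").text = "1"
    ET.SubElement(crossmark, "crossmark_policy").text = crossmark_policy
    updates_el = ET.SubElement(crossmark, "updates")
    for item in updates:
        update = ET.SubElement(updates_el, "update", type=item["type"])
        ET.SubElement(update, "doi").text = item["doi"]
        date = ET.SubElement(update, "date", media_type="online")
        ET.SubElement(date, "month").text = item["month"]
        ET.SubElement(date, "year").text = item["year"]
    return crossmark
```

Returning None instead of an empty element lets the caller skip the append step without inspecting the result further.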

Where could the review start?

packtools/sps/formats/crossref.py — start at the two new mapping dicts near the top, then xml_crossref_crossmark_pipe() (near the bottom), then the updated pipeline_crossref() call order.

How could this be tested manually?

Run against any SciELO XML that has <related-article> with a type from the mapping (e.g. correction-forward, retracted-article):

from lxml import etree as ET
from packtools.sps.formats.crossref import pipeline_crossref

xml_tree = ET.parse("path/to/article_with_related_articles.xml")
data = {
    "depositor_name": "SciELO",
    "depositor_email_address": "[email protected]",
    "registrant": "SciELO",
    "crossmark_policy": "https://www.scielo.br/crossmark-policy",
}
output = pipeline_crossref(xml_tree, data)
# Inspect <crossmark> in output

The fixture tests/sps/fixtures/xml_test_fixtures/S1984-92302025000100304.xml contains correction-forward and can be used directly.

Any context you would like to provide?

The mapping follows the JATS → Crossref type correspondence from the issue. Types with no Crossref equivalent (commentary-article, letter, etc.) are silently skipped — no crossmark is emitted. Ambiguous types (corrected-article, updated-article) are mapped to their most conservative Crossref equivalent (correction, new_version).
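
The skipping of unmapped types might look roughly like the sketch below. It assumes the update DOI is read from the `xlink:href` attribute of `<related-article>`, which is the usual JATS convention for `ext-link-type="doi"`; `TYPE_MAP`, `collect_updates` and the sample XML are hypothetical and only illustrate the filtering.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical subset of the PR's related-article-type mapping.
TYPE_MAP = {
    "correction-forward": "correction",
    "retracted-article": "retraction",
}


def collect_updates(article_xml):
    """Return (update_type, doi) pairs for mappable <related-article> nodes."""
    root = ET.fromstring(article_xml)
    updates = []
    for node in root.iter("related-article"):
        update_type = TYPE_MAP.get(node.get("related-article-type"))
        doi = node.get(XLINK_HREF)
        # Types without a Crossref equivalent are skipped silently.
        if update_type and doi:
            updates.append((update_type, doi))
    return updates


SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta>
    <related-article related-article-type="correction-forward"
                     ext-link-type="doi" xlink:href="10.1590/example-erratum"/>
    <related-article related-article-type="commentary-article"
                     ext-link-type="doi" xlink:href="10.1590/example-commentary"/>
  </article-meta></front>
</article>"""

print(collect_updates(SAMPLE))
```

With the sample above, only the correction-forward entry survives; an empty result would mean no crossmark is emitted at all.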

crossmark_policy is intentionally required with no default value: each publisher must explicitly provide their registered Crossmark policy DOI or URL. The data parameter of xml_crossref_crossmark_pipe() also carries no default — passing None or a dict without crossmark_policy causes the pipe to exit early without writing any <crossmark> to the output.

Screenshots

N/A — XML output only.

What are the relevant tickets?

Related to the development of the Crossmark feature for SciELO's xmlcrossref format.
Fix #1118

References

Original prompt

This section details on the original issue you should resolve

<issue_title>Create the functionality to add <crossmark/> to the Crossref XML</issue_title>
<issue_description># Crossmark – 12 update types (Crossref)

Create the crossmark element for the xmlcrossref format, taking into account the history fields (hist) and the relationship fields (related-article) of the SciELO / JATS XML.

<doi_data>
  <doi>10.xxxx/xxxx</doi>
  <resource>http://sua-url.com/artigo</resource>
  <collection property="crawler-based">
    <item>
      <resource>http://sua-url.com/artigo.pdf</resource>
    </item>
  </collection>
  <!-- Start of Crossmark -->
  <crossmark>
    <crossmark_policy>10.xxxx/sua-politica-doi</crossmark_policy>
    <crossmark_domains>
      <crossmark_domain>
        <domain>sua-url.com</domain>
      </crossmark_domain>
    </crossmark_domains>
    <crossmark_domain_exclusive>true</crossmark_domain_exclusive>
    <!-- Optional: additional metadata, such as corrections, can be included here -->
  </crossmark>
</doi_data>

Referência: https://www.crossref.org/documentation/crossmark/


1. addendum

Relevant additional information published after the original article.

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="addendum">
      <doi>10.1590/addendum-example-001</doi>
      <date media_type="online">
        <month>03</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

2. clarification

Clarifies an ambiguity or a confusing passage without changing the conclusions.

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="clarification">
      <doi>10.1590/clarification-example-002</doi>
      <date media_type="online">
        <month>04</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

3. correction

Generic term for the correction of an error not classified below.

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="correction">
      <doi>10.1590/correction-example-003</doi>
      <date media_type="online">
        <month>05</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

4. corrigendum

Formal correction of errors introduced by the authors.

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="corrigendum">
      <doi>10.1590/corrigendum-example-004</doi>
      <date media_type="online">
        <month>06</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

5. erratum

Correction of errors introduced by the publisher (typesetting, etc.).

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="erratum">
      <doi>10.1590/erratum-example-005</doi>
      <date media_type="online">
        <month>07</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

6. expression_of_concern

The editor expresses concern about the integrity of the data or methodology.

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="expression_of_concern">
      <doi>10.1590/eoc-example-006</doi>
      <date media_type="online">
        <month>08</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

7. new_edition

New edition of the work (book/monograph); replaces the previous one.

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="new_edition">
      <doi>10.1590/new-edition-example-007</doi>
      <date media_type="online">
        <month>09</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

8. new_version

New version of the record (preprint → VoR, updated dataset, etc.).

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
  <updates>
    <update type="new_version">
      <doi>10.1590/new-version-example-008</doi>
      <date media_type="online">
        <month>10</month><year>2025</year>
      </date>
    </update>
  </updates>
</crossmark>

9. partial_retraction

Retraction of part of the article (section, figure, experiment).

<crossmark>
  <crossmark_version>1</crossmark_version>
  <crossmark_policy>https://www.scielo.br/crossmark-po...

</details>



- Fixes scieloorg/packtools#1118

- Add RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE mapping dict
- Add CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE mapping dict
- Implement xml_crossref_crossmark_pipe() function
- Add crossmark pipe to pipeline_crossref()
- Add 12 tests covering all crossmark update types and edge cases

Co-authored-by: robertatakenaka <[email protected]>
Copilot AI changed the title from "[WIP] Add functionality to include crossmark in Crossref XML" to "Add <crossmark> element to Crossref XML output" on Mar 6, 2026
@robertatakenaka marked this pull request as ready for review on March 6, 2026 13:38
Copilot AI review requested due to automatic review settings March 6, 2026 13:38
Contributor

Copilot AI left a comment

Pull request overview

This PR adds Crossref Crossmark support to the XML pipeline by implementing xml_crossref_crossmark_pipe() in crossref.py. It generates <crossmark> elements with update metadata derived from JATS <related-article> elements and <history> dates, mapping JATS article relationship types to Crossref update types (e.g., correction-forward → correction, retracted-article → retraction).

Changes:

  • Adds two mapping dictionaries (RELATED_ARTICLE_TYPE_TO_CROSSMARK_UPDATE_TYPE and CROSSMARK_UPDATE_TYPE_TO_HISTORY_DATE_TYPE) to map JATS related-article types to Crossref update types and their corresponding history date types.
  • Implements xml_crossref_crossmark_pipe() function that builds <crossmark> elements with version, policy, and update entries (including DOI and date), appending them to each <journal_article> in the Crossref output.
  • Adds comprehensive tests covering various related-article types, multiple updates, fallback date logic, custom policy, unmappable types, and multi-article application.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packtools/sps/formats/crossref.py | Adds mapping dicts and xml_crossref_crossmark_pipe() function; integrates it into pipeline_crossref() |
| tests/sps/formats/test_crossref.py | Adds CrossmarkPipeTest class with 11 test cases covering the new crossmark functionality |

xml_crossref_doi_pipe(xml_crossref, xml_tree)
xml_crossref_resource_pipe(xml_crossref, xml_tree)
xml_crossref_collection_pipe(xml_crossref, xml_tree)
xml_crossref_crossmark_pipe(xml_crossref, xml_tree, data)

Copilot AI Mar 6, 2026

The crossmark element is appended to journal_article after doi_data (since the pipe is called at line 559, after xml_crossref_collection_pipe which populates doi_data). However, according to the Crossref 4.4.0 schema (xs:sequence in ct:journal_article), crossmark must appear before fr:program, ai:program, rel:program, and doi_data.

The current element order in the output will be: ...publisher_item, ai:program, rel:program, doi_data, crossmark, citation_list, but the schema requires: ...publisher_item, crossmark, ..., ai:program, rel:program, ..., doi_data, citation_list.

This call should be moved earlier in the pipeline, before xml_crossref_permissions_pipe (line 553), to produce schema-valid XML. For example, placing it right after xml_crossref_elocation_pipe would put crossmark in the correct position between publisher_item and ai:program.
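
To make the ordering constraint concrete, here is a small sketch of an alternative to reordering the pipes: inserting <crossmark> by position instead of appending it. It uses the standard library's `xml.etree.ElementTree` and a hypothetical helper `insert_crossmark`; the set of "following" element names is taken from the order quoted in this comment and is an assumption, not a full reading of the Crossref schema.

```python
import xml.etree.ElementTree as ET

# Local names of elements that, per the order quoted above, come after
# <crossmark> inside <journal_article> (assumed, not exhaustive).
FOLLOWING = {"program", "doi_data", "citation_list"}


def insert_crossmark(journal_article, crossmark):
    """Insert <crossmark> before the first element that must follow it."""
    for index, child in enumerate(journal_article):
        # Strip a "{namespace}" prefix so ai:program and rel:program match.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name in FOLLOWING:
            journal_article.insert(index, crossmark)
            return
    # Nothing that must follow is present yet, so appending is safe.
    journal_article.append(crossmark)
```

A position-based insert like this does not depend on the pipe call order, at the cost of encoding part of the schema sequence in code.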

Comment on lines +1856 to +1874
<journal_article language="pt" publication_type="research-article" reference_distribution_opts="any">
<doi_data>...</doi_data>
<crossmark>
<crossmark_version>1</crossmark_version>
<crossmark_policy>https://www.scielo.br/crossmark-policy</crossmark_policy>
<updates>
<update type="correction">
<doi>10.1590/erratum-doi</doi>
<date media_type="online">
<month>07</month>
<year>2025</year>
</date>
</update>
</updates>
</crossmark>
</journal_article>
</journal>
</body>
</doi_batch>

Copilot AI Mar 6, 2026

The docstring example shows <crossmark> as a sibling after <doi_data>, but according to the Crossref 4.4.0 schema (xs:sequence in journal_article), crossmark must appear before doi_data. When the pipeline ordering is fixed (see comment on pipeline_crossref), this docstring example should also be updated to show <crossmark> before <doi_data>.

Comment thread packtools/sps/formats/crossref.py Outdated
"""
if data is None:
data = {}
crossmark_policy = data.get(
Member

@copilot if crossmark_policy is absent, execute return None and exit the function

Contributor Author

Done in 5cd6f4d. crossmark_policy is now required — if absent from the data dict, the function returns immediately without generating any <crossmark> element. A new test test_crossmark_pipe_no_policy_returns_early covers this behavior.

Comment thread packtools/sps/formats/crossref.py Outdated
if data is None:
data = {}
crossmark_policy = data.get(
"crossmark_policy", "https://www.scielo.br/crossmark-policy"
Member

@copilot do not adopt any default value

Contributor Author

Done in b7224a9. The data parameter no longer has a default value — the signature is now xml_crossref_crossmark_pipe(xml_crossref, xml_tree, data). No default values are adopted anywhere in the function.

- Remove default value for crossmark_policy
- Add early return when crossmark_policy is not provided
- Update tests to pass crossmark_policy explicitly
- Add test_crossmark_pipe_no_policy_returns_early to cover new behavior

Co-authored-by: robertatakenaka <[email protected]>
- Change signature from data=None to data (required positional arg)
- Replace `if data is None: data = {}` guard with a single `data.get(...) if data else None`
- Update test to pass None explicitly instead of omitting the argument

Co-authored-by: robertatakenaka <[email protected]>
@robertatakenaka merged commit 133d8ab into master on Mar 6, 2026
2 checks passed