fix(models): Fix the language of the translated versions of abstracts #1076
Pull request overview
Adjusts abstract language detection so each abstract can be tagged with the correct language instead of always using the document-level xml:lang.
Changes:
- Update XMLAbstracts.get_abstracts() to prefer the abstract node's xml:lang over XMLAbstracts.lang when building Abstract objects.
          node,
-         self.lang,
+         node.get("{http://www.w3.org/XML/1998/namespace}lang") or self.lang,
The lang selection here still fails for abstracts inside a <sub-article> when the <abstract> element itself does not have xml:lang (xml:lang is inherited in XML). In that common case (see file header example and tests/samples/article-abstract-en-sub-articles-pt-es.xml), this code falls back to self.lang (the main article language) and mislabels the sub-article abstract language.
Consider deriving the effective language from the closest ancestor-or-self that declares xml:lang (e.g., via XPath ancestor-or-self::*[@xml:lang][1]) and only then falling back to the document/root language.
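A minimal sketch of that ancestor-or-self lookup, written against the standard library's xml.etree.ElementTree rather than lxml so it runs without extra dependencies. ElementTree has no ancestor axis, so the sketch walks a child-to-parent map instead of evaluating the XPath above. The helper name effective_lang, the parent_of map, and the sample document are all hypothetical and not part of packtools.

```python
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: the sub-article declares xml:lang, its abstract does not.
SAMPLE = """
<article xml:lang="en">
  <front><article-meta>
    <abstract><p>Main abstract</p></abstract>
  </article-meta></front>
  <sub-article article-type="translation" xml:lang="pt">
    <front-stub>
      <abstract><p>Resumo traduzido</p></abstract>
    </front-stub>
  </sub-article>
</article>
"""


def effective_lang(node, parent_of, default=None):
    """Return the xml:lang of the closest ancestor-or-self that declares one."""
    current = node
    while current is not None:
        lang = current.get(XML_LANG)
        if lang:
            return lang
        # Move one level up; the root has no entry, which ends the loop.
        current = parent_of.get(current)
    return default


root = ET.fromstring(SAMPLE.strip())
# ElementTree keeps no parent pointers, so build a child -> parent lookup once.
parent_of = {child: parent for parent in root.iter() for child in parent}

langs = [effective_lang(node, parent_of) for node in root.iter("abstract")]
print(langs)  # ['en', 'pt']
```

With lxml, node.getparent() would make the parent map unnecessary; the document-level fallback is kept as the default argument so it is only used when no ancestor declares a language.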
  for node in self.xmltree.xpath(xpath):
      abstract = Abstract(
          node,
-         self.lang,
+         node.get("{http://www.w3.org/XML/1998/namespace}lang") or self.lang,
This change introduces/updates language resolution behavior for XMLAbstracts.get_abstracts, but there are currently no tests covering XMLAbstracts (only Abstract.text). Adding a unit test that parses a small XML with both and a … case would prevent regressions in language detection.
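One possible shape for such a test is sketched below. Because the constructor and return type of XMLAbstracts are not shown in this conversation, the sketch tests a hypothetical stand-in, abstract_langs, that applies the same "own xml:lang, else default" rule as the diff; a real test would call XMLAbstracts.get_abstracts() instead. It uses the standard library's ElementTree, and the XML snippets are invented for illustration.

```python
import unittest
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree exposes the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def abstract_langs(xml_text, default_lang):
    """Hypothetical stand-in for the selection in the diff: node lang, else default."""
    root = ET.fromstring(xml_text)
    langs = []
    for node in root.iter():
        if node.tag in ("abstract", "trans-abstract"):
            langs.append(node.get(XML_LANG) or default_lang)
    return langs


class AbstractLangTest(unittest.TestCase):
    def test_node_lang_wins_over_default(self):
        xml = (
            '<article xml:lang="en"><front><article-meta>'
            '<abstract xml:lang="en"><p>Abstract</p></abstract>'
            '<trans-abstract xml:lang="pt"><p>Resumo</p></trans-abstract>'
            "</article-meta></front></article>"
        )
        self.assertEqual(abstract_langs(xml, "en"), ["en", "pt"])

    def test_missing_lang_falls_back_to_default(self):
        xml = (
            '<article xml:lang="en"><front><article-meta>'
            "<abstract><p>Abstract</p></abstract>"
            "</article-meta></front></article>"
        )
        self.assertEqual(abstract_langs(xml, "en"), ["en"])
```

Saved as a module, the cases can be run with python -m unittest. A third case with an abstract nested in an element that declares the language would document the inheritance behavior discussed in the earlier comment.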
Description
This change improves abstract extraction in the XMLAbstracts module. Previously, the language was set statically from self.lang. Now the code first tries to read the language directly from the xml:lang attribute of the specific XML node and uses self.lang only as the default (fallback).
Motivation
In JATS XML documents, it is common to have multiple abstracts (e.g., the original and its translations). Each <abstract> node can carry its own language attribute. This change ensures that the Abstract class is instantiated with the correct language for each node, so that translations are not incorrectly tagged with the article's main language.
Main Changes
packtools/sps/models/v2/abstract.py: … {http://www.w3.org/XML/1998/namespace}lang via XPath/LXML before falling back to the class attribute.
Type of Change
How to test?
… pt and en).
… XMLAbstracts.
… self.xmltree.xpath(xpath) reflects the individual language of each node, not just the global language set in the constructor.