Skip to content

xml: make structs serializable, but enforce contracts#5283

Merged
mflatt merged 2 commits intoracket:masterfrom
LiberalArtist:serialize-xexpr
Jun 24, 2025
Merged

xml: make structs serializable, but enforce contracts#5283
mflatt merged 2 commits intoracket:masterfrom
LiberalArtist:serialize-xexpr

Conversation

@LiberalArtist
Copy link
Copy Markdown
Contributor

@LiberalArtist LiberalArtist commented Jun 21, 2025

This addresses the issue with #5282 that could allow evading contracts.

Since an X-expression can contain a location in a source-start or source-stop, I needed to add serialization support to xml/private/structures in addition to xml/private/core. At that point, it seemed to make sense to make the structs that aren't used in X-expressions serializable, too.

One caveat is that contract violations on deserialization will blame the deserialize-info submodules. That's not theoretically ideal, but seems practically fine.

Still needs docs and tests.

Related to: racket/scribble#498
Related to: #5282

@stevebyan
Copy link
Copy Markdown

Thank you!

Fix `xexpr/c`, `xexpr?`, `correct-xexpr?`, and `validate-xexpr`
to reject instances of the `pcdata` struct, consistent with
the documentation and other parts of the implementation.

Trying to use `pcdata` structs as X-expressions was already
very broken, e.g.:

```
$ racket
Welcome to Racket v8.17 [cs].
> (require xml)
> (xexpr->string (pcdata #f #f "pcdata"))
""
```
@LiberalArtist
Copy link
Copy Markdown
Contributor Author

I've added docs and tests.

While working on the tests, I found that a bug in xexpr/c, xexpr?, corrext-xexpr?, and validate-xexpr? incorrectly accepted instances of the pcdata struct. I've changed them to match the documentation and other parts of the implementation. To be clear, actually trying to use a pcdata instance as an X-expression was already very broken, e.g.:

$ racket
Welcome to Racket v8.17 [cs].
> (require xml)
> (xexpr->string (pcdata #f #f "pcdata"))
""

Since this is so broken, I'm hoping it will work to make this change, but I've put it in a separate commit in case someone is somehow relying on the current behavior and we have to revert. I've put that commit before the one adding serialization, because, when the structs become serializable, the module path providing their deserialize info becomes part of the API. Having tried it both ways, it's much nicer to move from (submod xml/private/structures deserialize-info) to (submod xml/private/core deserailize-info) than the other way around. It can be just:

(require (only-in (submod "core.rkt" deserialize-info) deserialize-info:p-i-v0))
(provide deserialize-info:p-i-v0)

rather than:

(provide deserialize-info:p-i-v0)
(define deserialize-info:p-i-v0
  (dynamic-require (module-path-index-join '(submod "." deserialize-info)
                                           mpi-for-contracts)
                   'deserialize-info:p-i-v0))

I also included some other doc tweaks with the serialization commit. The addition of no-external-dtd was the result of one of those side-quests.

@LiberalArtist LiberalArtist marked this pull request as ready for review June 23, 2025 07:54
(require 'serialization-support)
(define-runtime-module-path-index deserialize-info-mpi '(submod "." deserialize-info))
(module+ deserialize-info
(define-runtime-module-path-index mpi-for-contracts "structures.rkt")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to double-check that this doesn't need to be:

(define-runtime-module-path mpi-for-contracts xml/private/structures)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using '(submod "." deserialize-info) seems fine to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was thinking about the one that is currently (define-runtime-module-path-index mpi-for-contracts "structures.rkt"). I vaguely remembered that some ways of mixing styles of references could cause trouble for raco exe in some cases, but I've never actually encountered those problems myself, so I'm not very clear about it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if someone referred to "core.rkt" through a file path instead of a module path, then there would be a difference for both of those relative module references, and there could be a mismatch. For that case, using relative paths as here might be better, or it might be worse, depending on the context. In any case, referencing "core.rkt" both through a file path and a module path would create other (related) problems with raco exe, and I don't think we need to worry about it.

Also, tweak docs and add `no-external-dtd`.

Increment version to 8.17.0.5.

Related to: racket/scribble#498
Related to: racket#5282
Related to: racket/rhombus#644
@mflatt
Copy link
Copy Markdown
Member

mflatt commented Jun 24, 2025

This looks fine to me, so I'll merge soon if there are no objections.

@mflatt mflatt merged commit 5d65c77 into racket:master Jun 24, 2025
8 checks passed
@stevebyan
Copy link
Copy Markdown

stevebyan commented Jun 25, 2025

Wow! That's one heck of a commit. I've so much to learn. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants