mathtext: add support for unicode mathematics fonts #31064
llohse wants to merge 5 commits into matplotlib:text-overhaul
I opened a new PR because this one is based on the text-overhaul branch and I messed up rebasing the old one. @QuLogic: Would you kindly take a look at this? Is this something you would consider merging? For further discussion, in case you don't reject this feature altogether: how should the math font be configured? This also depends a bit on how prominently it should be exposed. I could not think of a good way without introducing a new parameter in rcParams.
    0x22d3: 0x22d2,
}

unicode_math_lut: dict[str, dict[CharacterCodeType, CharacterCodeType]] = {
Most of these are 1-to-1 mappings of a block; I wonder if there is a more compact representation that could be used? Something like:
# (start, end, new_start)
# up digits
(0x30, 0x39, 0x30),
...
# bf latin lower case
(0x61, 0x7a, 0x1d41a),
maybe plus a small dictionary with some of the exceptions, depending on how they fit into the blocks.
The lookup table could be generated from those if necessary.
This is exactly the way I generated the lookup table, offline:
- map the entire range
- fix missing/moved codepoints based on a smaller lookup table
At some point I did consider writing special mapping functions but I figured that a lookup table might be preferable for performance.
Do you prefer to generate the lookup table (for example on module load) instead of hardcoding the entire table?
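For illustration, generating the table at import time from compact ranges plus a small exception dictionary could look roughly like the sketch below. The names `STYLE_RANGES`, `STYLE_EXCEPTIONS`, and `build_lut`, as well as the style keys, are hypothetical and not taken from this PR, and only a few ranges are shown.

```python
# Hypothetical sketch: names, style keys, and table contents are illustrative only.

# (start, end, new_start): map the block start..end onto new_start onwards.
STYLE_RANGES = {
    "bf": [
        (0x30, 0x39, 0x1D7CE),  # bold digits
        (0x41, 0x5A, 0x1D400),  # bold Latin upper case
        (0x61, 0x7A, 0x1D41A),  # bold Latin lower case
    ],
    "it": [
        (0x41, 0x5A, 0x1D434),  # italic Latin upper case
        (0x61, 0x7A, 0x1D44E),  # italic Latin lower case
    ],
}

# Codepoints that do not follow the block layout.
STYLE_EXCEPTIONS = {
    # U+1D455 is reserved; italic h lives at U+210E (PLANCK CONSTANT).
    "it": {0x68: 0x210E},
}


def build_lut(ranges, exceptions):
    """Expand the ranges into flat per-style dicts, then apply the exceptions."""
    lut = {}
    for style, blocks in ranges.items():
        table = lut.setdefault(style, {})
        for start, end, new_start in blocks:
            for codepoint in range(start, end + 1):
                table[codepoint] = new_start + (codepoint - start)
        # Exceptions overwrite whatever the block mapping produced.
        table.update(exceptions.get(style, {}))
    return lut


unicode_math_lut = build_lut(STYLE_RANGES, STYLE_EXCEPTIONS)
```

After the expansion, every lookup is a plain dict access, so the per-character cost should match a hard-coded table; only module import pays for building it.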
}


class UnicodeMathFonts(TruetypeFonts):
IIUC, math fonts should have tables with various layout metrics. We currently have those hard-coded in the various FontsConstantsBase subclasses, and they are likely incorrect for an arbitrary math font.
So this will likely need to parse this data out of the font and implement at least get_axis_height that was added in #31046, get_xheight maybe using #31050, and get_quad from #31110. But it is likely that you will want to refactor some of those remaining uses of the constants so that they fetch the information from the fonts as well.
I fully agree. Doing this may involve some refactoring though, because the FontsConstantsBase subclass could not be determined purely from fontname but would be dynamically populated based on the loaded OpenType font.
That said, I made some experiments locally. Unfortunately, Freetype does not parse the MATH table. We could use fonttools, which is a hard dependency anyway.
There are several open questions about how to map the OpenType layout metrics to the legacy TeX-inspired variables used in mathtext. Does it make sense to postpone that to a separate PR and focus on the basics here?
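As a rough sketch of what reading such metrics through fonttools might involve: the helper names below are hypothetical, and the attribute layout (`font["MATH"].table.MathConstants`, with most entries wrapped in a record that carries a `.Value`) is my understanding of fontTools and should be verified against its documentation. How the resulting values map onto the existing mathtext constants is exactly the open question and is not addressed here.

```python
def load_math_constants(path):
    """Return (MathConstants, unitsPerEm) for an OpenType math font.

    Assumption: fontTools exposes the MATH table this way; verify before use.
    """
    from fontTools.ttLib import TTFont  # imported lazily, only needed here

    font = TTFont(path)
    return font["MATH"].table.MathConstants, font["head"].unitsPerEm


def math_constant(constants, name, units_per_em, fontsize):
    """Scale one MATH constant from font design units to `fontsize` units."""
    value = getattr(constants, name)
    # Value records wrap the number in `.Value`; plain integers do not.
    value = getattr(value, "Value", value)
    return value * fontsize / units_per_em
```

With a font on disk, this would be used along the lines of `constants, upem = load_math_constants(path)` followed by `math_constant(constants, "AxisHeight", upem, 12)`.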
While I would like it to happen, I'm not sure this will be ready for 3.11. So if we're aiming for 3.12, I think it's okay to spend some time getting everything worked out in separate PRs.
I have just rebased the branch, split the baseline images into a separate commit, and added logic to handle mathnormal from #31121 in the new From my perspective, it is ready for another review.
def __init__(self, default_font_prop: FontProperties, load_glyph_flags: LoadFlags):
    TruetypeFonts.__init__(self, default_font_prop, load_glyph_flags)
    prop = mpl.rcParams['mathtext.mathfont']  # type: ignore[index]
I think this will break the caching for MathTextParser._parse_cached, but I see it already doesn't really handle any of the other mathtext.* rcParams, so I guess that's something to work out at some point.
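The general problem can be shown with a toy example. This is not matplotlib's actual caching code; `config`, `parse_stale`, and `parse` are hypothetical stand-ins for rcParams and a cached parser. A memoized function that reads a global setting inside its body keeps returning results for the old setting, unless the setting is made part of the cache key.

```python
from functools import lru_cache

# Stand-in for rcParams; the key name is taken from the diff above.
config = {"mathtext.mathfont": "STIX Two Math"}


@lru_cache(maxsize=None)
def parse_stale(text):
    # The font is read inside the cached body, so it is not part of the key.
    return (text, config["mathtext.mathfont"])


@lru_cache(maxsize=None)
def _parse_keyed(text, mathfont):
    return (text, mathfont)


def parse(text):
    # Passing the setting as an argument makes it part of the cache key.
    return _parse_keyed(text, config["mathtext.mathfont"])
```

After a first call to each function, changing `config["mathtext.mathfont"]` affects `parse` but not `parse_stale`, which continues to serve the cached entry.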
lib/matplotlib/_mathtext.py
    'tt': 'tt',
    'sf': 'sfup',
    'bf': 'bfup',
    'bfit': 'bfup',
Typo or no such possibility? Probably could use a comment for the latter.
Bold italic digits are not defined in the Unicode standard. Actually, neither are regular italic digits (which I need to fix in the code).
In unicode-math, both are mapped to their upright variants.
This is related to a bigger open question. unicode-math maps \mathit (and \mathbf, \mathrm, \mathsf and \mathtt) to the corresponding non-mathematics fonts, which support all alphanumeric symbols and apply normal text kerning rules (and ligatures).
I have not defined such a mechanism here to keep it simple. Mapping mathrm is trivial. In principle one could load italic and bold versions with their corresponding options, too. Sans serif and typewriter would be a bigger stretch. I am not sure it is worth it, though.
Note that this only matters because we now have proper italic support in the other font classes.
I see two options:
- simply map to their upright variants
- implement loading the corresponding text fonts and try to apply unicode-math's logic
I would prefer option 1 and leave option 2 as a project for the future.
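Option 1 for digits could be sketched as follows. The `'tt'`, `'sf'`, `'bf'`, and `'bfit'` entries mirror the diff snippet above; the `'rm'` and `'it'` keys, the dictionary names, and `map_digit` are assumptions made for this illustration, not code from the PR.

```python
# Hypothetical sketch of the "fall back to upright digits" option.

# Requested style -> digit style that Unicode actually defines.
DIGIT_STYLE = {
    'rm': 'up',
    'it': 'up',      # no italic digits in Unicode: fall back to upright
    'tt': 'tt',
    'sf': 'sfup',
    'bf': 'bfup',
    'bfit': 'bfup',  # no bold italic digits in Unicode: fall back to bold upright
}

# Codepoint of DIGIT ZERO for each digit style.
DIGIT_ZERO = {
    'up': 0x30,       # ASCII digits
    'bfup': 0x1D7CE,  # MATHEMATICAL BOLD DIGIT ZERO
    'sfup': 0x1D7E2,  # MATHEMATICAL SANS-SERIF DIGIT ZERO
    'tt': 0x1D7F6,    # MATHEMATICAL MONOSPACE DIGIT ZERO
}


def map_digit(char, style):
    """Map an ASCII digit to the codepoint used for the requested style."""
    if len(char) != 1 or not "0" <= char <= "9":
        raise ValueError(f"not an ASCII digit: {char!r}")
    zero = DIGIT_ZERO[DIGIT_STYLE[style]]
    return chr(zero + ord(char) - 0x30)
```

A short comment next to the fallback entries, as in the sketch, would also answer the "typo or no such possibility" question directly in the source.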
Adds basic support for generic unicode OpenType mathematics fonts such as STIX Two Math or Cambria Math to be used within the mathtext engine.
PR summary
supersedes #31048
Add basic support for generic unicode OpenType mathematics fonts such as STIX Two Math, Cambria Math, DejaVu Math, etc.
Currently, mathematics text rendering through mathtext in matplotlib supports a hard-coded set of fonts (configured via mathtext.fontset). Its design presumably predates the specification of mathematics alphabets in the Unicode standard. While it is possible to configure custom fonts (mathtext.fontset: custom), this requires setting separate fonts for the upright, italic, fraktur, double-struck, etc. variants, which is fundamentally incompatible with the way modern mathematics fonts are designed.
Unicode defines mathematical alphanumeric symbols as unique codepoints (see https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols), in contrast to different fonts all defining different styles for the same ASCII characters/codepoints.
One relatively modern way to render mathematical formulas uses mathematics fonts such as STIX Two Math, Cambria Math, or Asana Math. For LaTeX, this is implemented in the unicode-math package.
Instead of choosing a font based on the style (as is currently done in matplotlib) to render the same codepoints, this approach maps alphanumeric characters to different codepoints based on the style and renders them from a single font.
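To make the "same letter, different codepoint" idea concrete, the standard library's `unicodedata` module can resolve the styled codepoints by name. The helper `styled` is a hypothetical name used only for this illustration and is not how the PR performs the mapping.

```python
import unicodedata


def styled(letter, style):
    """Return the mathematical alphanumeric symbol for an ASCII letter.

    `style` is the style part of the Unicode character name,
    for example "BOLD", "ITALIC", or "DOUBLE-STRUCK".
    """
    case = "CAPITAL" if letter.isupper() else "SMALL"
    return unicodedata.lookup(f"MATHEMATICAL {style} {case} {letter.upper()}")


bold_A = styled("A", "BOLD")      # U+1D400
italic_a = styled("a", "ITALIC")  # U+1D44E
```

Name-based lookup fails for the handful of letters that Unicode encodes outside the regular blocks, such as italic small h, which is one reason a dedicated table with exceptions is needed.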
Shortcomings of the status quo:
Changes
This change implements basic functionality to use any installed Unicode OpenType mathematics font in mathtext in a portable way. Currently, this can be enabled by setting the rcparams
I could think of different ways to configure this, though.
Internally, I have implemented a separate class UnicodeMathFonts(TruetypeFonts) so as not to interfere with the existing fontsets.
Running the tests currently requires STIX Two Math to be installed on the system. For that reason, I have added it to the test data. One may think about vendoring STIX Two Math or DejaVu Math via mpl-data instead.
Examples
PR checklist