Lint: Update headers and checks per current guidance & provide helpful feedback by CAM-Gerlach · Pull Request #2484 · python/peps

CAM-Gerlach · 2022-03-30T06:37:27Z

Over the past >year, we've made significant progress in programmatically parsing the PEP headers, using them to intelligently display more useful, informative and readable output in the rendered PEPs, conforming them to our modern guidance, automatically checking their format and (particularly now with #2475) making it easy for other tools to consume and enrich them.

As a final step toward these overarching goals, this PR:

Adds the previously-missing automatic checks for all the remaining headers, so errors in them that could affect rendering and human- and machine-readability no longer pass silently
Updates the existing checks to reflect the current format specified by PEP 1/12 and expected by current and future tooling
Conforms the modest number of remaining existing PEPs to ensure the headers render correctly and are machine-readable, and
Provides much more helpful, specific and user-friendly descriptive check text.
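As a rough illustration of the kind of header check described above, here is a minimal Python sketch. It is not the implementation in this PR: the pattern, the function name `check_created_header` and the message wording are all hypothetical, and it only assumes the `dd-mmm-yyyy` date format that PEP 1 specifies for the Created header.

```python
import re

# Hypothetical pattern (not taken from this PR): PEP 1 specifies
# dd-mmm-yyyy dates for the Created header, e.g. "Created: 30-Mar-2022".
CREATED_PATTERN = re.compile(
    r"^Created: (0[1-9]|[12][0-9]|3[01])-"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-"
    r"(199[0-9]|20[0-9][0-9])$"
)


def check_created_header(lines):
    """Return (line_number, message) pairs for malformed Created headers."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            break  # The header block ends at the first blank line
        if line.startswith("Created:") and not CREATED_PATTERN.match(line):
            message = "'Created' must be a 'DD-MMM-YYYY' date, got: " + repr(line)
            problems.append((number, message))
    return problems
```

A check along these lines reports the offending line and says what was expected, which is the sort of targeted feedback the last item in the list is aiming for.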

Overall, this PR's enhancements to our existing checks:

Helps PEP authors, by giving them near-instant, automatic and targeted feedback about any syntax issues, locally or on CI, without having to wait for a full rebuild or a human review
Helps PEP editors by freeing them from the need to manually inspect and fix PEP headers
Avoids any edge cases in our rendering pipeline
Helps both readers and tools by ensuring the output is easy for both humans and machines to read, and
Opens the door to more "smart" processing in the future (e.g. automatic Discussions-To generation) to reduce PEP author/editor workload while making the displayed output more useful and readable.

To note, this PR does not make any new headers required, invalidate any existing established header formats nor require any of the newer ones; the new checks only trigger on the formats.

In addition to the above, this resolves the linting side of #2402 and supports the improvements in #2475 and #2467.

hugovk

I still have some concerns about adding email addresses without permission, and replacing user at example.com with user@example.com, but everything else is fine, and I won't block merging over it. Thanks!

CAM-Gerlach · 2022-04-16T23:22:28Z

> I still have some concerns about adding email addresses without permission,

Well again, the only email addresses added in the current version of the PR were those for the same core devs who already chose to include them in the headers of their other PEPs, and which are all already listed in the Authors Index, so it is only making things consistent between PEPs. Also, I'd initially thought from PEP 12 that the email address was a required element and had the checks enforce that for new PEPs, which is why I'd initially made the change, but I then realized after cross-referencing PEP 1 that it wasn't, and updated the checks accordingly.

Much the same is true of user at example.com -> user@example.com: it's already obfuscated automatically, having it in the source makes authors (like myself at the time) think they have to do it manually, and it modestly complicates the processing and checks. But likewise, I quickly realized a relatively small simplification of the code was not worth the potential for more trouble for authors, so I ensured the checks and processing code allow it.

In both cases, there is little benefit in conforming the existing source headers when neither is actually required; it's just that, likewise, reverting them would also be a fair amount of work for no clear benefit, since the emails are already listed and obfuscated. But as it seems it's the one thing that's blocking this PR, I guess I'll just bite the bullet and spend the time doing it.

CAM-Gerlach · 2022-04-17T02:40:20Z

All done: this PR now neither adds any existing emails nor conforms them to the standard address format. I also fixed what would be a conflict with #2531, clarified the check feedback a bit further, and made a couple of other minor refinements.

hugovk

I appreciate all your work here, thank you!

CAM-Gerlach · 2022-04-17T23:02:01Z

Actually, I noticed there's a significant problem with allowing manual obfuscation as it stands. Glancing at the _mask_email() source code as part of the trailing comma oversight you noted on #2531 and I then fixed in #2467, I notice that the automatic obfuscation is a fair bit more effective than just replacing @ with a literal at; rather, it uses the raw HTML entity codes for some of the characters, significantly raising the bar for scrapers.

However, since emails with manual obfuscation applied don't get converted into reference nodes, the automatic obfuscation logic doesn't work, so these emails are actually substantially less obfuscated than they would otherwise be (which PEP authors are almost certainly unaware of, and I wasn't aware of either until I dug deep into the code and actually tested it). Picking on myself here, from PEP 0, compare these two:

<tr class="row-odd"><td>Gerlach, C.A.M.</td>
<td>cam.gerlach at gerlach.cam</td>
</tr>

versus, e.g.

<tr class="row-even"><td>van Rossum, Guido (GvR)</td>
<td>guido&#32;&#97;t&#32;python.org</td>
</tr>
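To make the difference concrete, here is a minimal Python sketch that reproduces the entity-encoded form shown in the second row. It is not the actual `_mask_email()` source; the function name `mask_email_sketch` is hypothetical, and the only thing taken from the output above is the `&#32;&#97;t&#32;` replacement for `@`.

```python
import html


def mask_email_sketch(address):
    """Obfuscate a standard email address like the second HTML row above."""
    local_part, separator, domain = address.partition("@")
    if not separator:
        # Not a standard address (e.g. already "user at example.com"),
        # so this sketch leaves it untouched, as in the first HTML row.
        return address
    # "&#32;" is a space and "&#97;" is "a", so a browser displays " at ".
    return local_part + "&#32;&#97;t&#32;" + domain


masked = mask_email_sketch("guido@python.org")
rendered = html.unescape(masked)  # What a reader sees in the browser
```

Both forms display as `name at domain` to a human reader, but only the standard `@` form goes through the entity encoding; a manually obfuscated address passes through with its plain-text ` at ` intact.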

I attempted to add support for also properly masking manually obfuscated emails, but after some testing realized it would require some pretty significant code changes due to how things are currently structured, as well as some hacky and potentially unreliable heuristics. Furthermore, I realized this also complicates not showing the email addresses at all, or showing them only as abbreviations, as discussed in #2514, since whether or not the email is actually processed as an email changes the doctree structure and node types.

Therefore, I conclude it would be best to re-revert that part of the previous change: conform the small minority of emails that were manually obfuscated to standard email syntax, and restore the linter check for them. That way, they are correctly processed and masked by the header transform code (and anything else that needs to mask/obfuscate/elide them), which ensures that the various automatic measures to protect authors' emails work consistently.
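For illustration only, a linter check for manually obfuscated addresses could look something like the Python sketch below. The pattern and the name `find_manually_obfuscated` are hypothetical rather than taken from this PR, and the sketch only covers the `Name <user at example.com>` form, not the older `Email (author)` format.

```python
import re

# Hypothetical pattern: a "user at domain.tld" address inside angle
# brackets, i.e. one where "@" was replaced by a literal " at " by hand.
MANUAL_OBFUSCATION = re.compile(r"<[^<>@\s]+ at [^<>@\s]+\.[^<>@\s]+>")


def find_manually_obfuscated(author_header):
    """Return every manually obfuscated address found in an Author header."""
    return MANUAL_OBFUSCATION.findall(author_header)
```

For example, `find_manually_obfuscated("Author: C.A.M. Gerlach <cam.gerlach at gerlach.cam>")` returns the bracketed address, while a header using `<guido@python.org>` yields an empty list.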

CAM-Gerlach · 2022-04-19T07:44:10Z

Actually, upon giving this more thought, making the author emails abbrs instead of literal text, as discussed in #2514, still requires fairly involved string-munging anyway, due to having to parse and transform the older Email (author) format. So while it does make things a little more complicated, it's not that much worse than parsing a whole different syntax.

As such, I suggest we just go ahead and merge this PR as-is with the manual obfuscation still untouched, and then I can address properly masking manually "obfuscated" emails along with formatting them consistently as abbrs in a followup PR. It would also be pretty easy to improve the obfuscation further by choosing different Unicode lookalike characters for the spaces and letters, which, coupled with being embedded in the abbr and using raw character codes, should make spam harvesting virtually impossible, far more so than the common and well-known replacement of @ with at.

CAM-Gerlach · 2022-04-20T09:52:51Z

Since it seems I've satisfied the immediate concerns that were raised and the two PEP editors that previously reviewed have ✔️ed, this PR has been open for a while and it seems there aren't any further objections, and it is blocking some further discussed and agreed changes (abbr for emails, making Content-Type optional, etc.), I'll go ahead and finally merge this now.

the-knights-who-say-ni added the CLA signed label Mar 30, 2022

CAM-Gerlach self-assigned this Mar 30, 2022

CAM-Gerlach added the lint Linter-related work and linting fixes on PEPs label Mar 30, 2022

CAM-Gerlach marked this pull request as ready for review March 30, 2022 06:41

CAM-Gerlach mentioned this pull request Apr 16, 2022

Style: Reduce space consumed by headers and improve alignment #2533

Merged

hugovk reviewed Apr 16, 2022

hugovk approved these changes Apr 17, 2022

CAM-Gerlach mentioned this pull request Apr 17, 2022

PEP 639: Update header, footer, link, reference and code block syntax #2531

Merged

CAM-Gerlach added 17 commits April 19, 2022 23:53

Lint: Disable a couple potentially problematic/unneeded hooks

c826909

Lint: Refine regex for existing Content-Type and PEP references checks

e03b73c

Lint: Refine regex for Python-Version check and fix a couple deviations

8df4c4d

Lint: Refine regex for Created check and fix a couple deviations

2c98989

Lint: Refine regex for Resolution check and fix a few deviations

e75643e

Lint: Add new check for Post-History header

7abaaf3

Lint: Add check for Discussions-To header link

ad6ba56

Lint: Add basic check for PEP title presense & excessive length

dd30fff

Lint: Add check for PEP/BDFL-Delegate header

309ec3c

Lint: Add check for Sponsor header

b3fef66

Lint: Add check for legacy Author field format

eff7423

Lint: Add check for RST Author field

a829dff

Lint: Use more helpful, user-oriented descriptions for header checks

90add11

PEP 245: Add historical note and archive links to wiki page/email lsit

3712bba

PEP 628: Add direct link to resolution in acceptance message

afd0a40

Lint: Explictly allow all valid RFC 2822 line continuations in headers

b51cde8

Lint: Tweak Post-History and Resolution checks to accept fragments

a894d22

CAM-Gerlach mentioned this pull request Apr 21, 2022

PEP 11: Add Discussions section #2544

Merged

CAM-Gerlach mentioned this pull request May 8, 2022

Decouple and unify PEP header processing for rendering, PEP 0, JSON, RSS and linting #2587

Open

erlend-aasland mentioned this pull request May 13, 2022

pep8/greppable exception messages erlend-aasland/peps#1

Closed

erlend-aasland mentioned this pull request Jun 27, 2022

pep 687/mark as accepted erlend-aasland/peps#2

Closed

CAM-Gerlach merged 17 commits into python:main from CAM-Gerlach:lint-update-header-checks

9 participants
