doc: explain that gfi is for training and add no AI policy #31142
timhoffm merged 1 commit into matplotlib:main
jklymak left a comment
You have already explained what a good first issue is in the first paragraph. It is redundant to explain here. I'd just say that AI agents are not to post and link the policy.
If you feel the first paragraph does not adequately explain what a good first issue is, then you could modify that.
The aim of this new paragraph is to explain the purpose of good first issues - that they're for onboarding - rather than what a good first issue is (a tightly scoped low priority issue). I put it at the end b/c I don't think new contributors necessarily need to understand the organizational reasoning for gfis, but I think we should document the purpose for the sake of explaining the policy. |
cdb292f to 0940dbc
|
This isn’t true of all GFIs: we often also add the “medium difficulty” label, and explain that the issue is suitable for someone new to Matplotlib but not new to development in general. |
Not sure how this contradicts that the issue's purpose is more so for onboarding than b/c the project feels it's something that really needs to get done? Unless you mean that I should be more specific "opening and seeing through a pull request in Matplotlib"? |
|
I think I’m getting hung up on what the purpose of the GFI is
So maybe something like “helping people get started” rather than “training on the process…” Maybe we should also switch out priority/importance for “urgency”. When I was new here I worked on #24148, which I would say was definitely important (CI would eventually have broken without it) but not urgent. Telling someone “here is something unimportant we’d like you to work on” feels like giving them busy work, though maybe there is a cultural/pond difference in how we interpret the terms. I appreciate that I am somewhat contradicting the fact that I approved #31131, and apologies if I am being a bit incoherent in general. It has been a busy week and my brain is a bit fried! |
|
I am also wondering if we should revisit the idea of renaming the label 🤔 |
melissawm left a comment
I kind of agree with @rcomer on the wording, and I think we are using the "good first issue" label for two kinds of issues: good for onboarding new people, and easy/small issues. Both can happen at the same time, but they are different things.
I'd advocate for a label like "onboarding issue" instead of "good first issue" - it would maybe signal a bit better what the issue is for, without actually requiring that it be an easy issue (and we can use the "difficulty: easy" label for those). Onboarding issues can even be small projects, but serving primarily as an onboarding opportunity for a contributor that is motivated enough.
I know we could potentially achieve the same goal with the "good first issue" + "difficulty" labels, but I do think the gfi labels are, at this point, kinda contaminated to be honest. If we want to keep the gfi label to be consistent with other projects, then I'd maybe advocate for a slight rewording of the current proposal.
doc/devel/contribute.rst
Outdated
| Good first issues are a technical solution to the social problem of onboarding new
| contributors to the repository; we label tasks good first issues because we think they
| are useful for training on the process of opening and seeing through a pull request, not
| because we think they are important technical issues that must be resolved. Therefore,
| pull requests that use AI tools to fix issues labeled as "good first issues" will be
| closed.
I kind of agree that saying these issues are "not important" may give folks the wrong idea, that this is just about busy work. How about something like:
- Good first issues are a technical solution to the social problem of onboarding new
- contributors to the repository; we label tasks good first issues because we think they
- are useful for training on the process of opening and seeing through a pull request, not
- because we think they are important technical issues that must be resolved. Therefore,
- pull requests that use AI tools to fix issues labeled as "good first issues" will be
- closed.
+ Good first issues are a technical solution to the social problem of onboarding new
+ contributors to the repository; we label tasks "good first issues" because we think they
+ are useful for training contributors on the process of opening and seeing through a pull request. Good first issues are also not high-priority, and having the issue fixed is only a secondary goal. Therefore,
+ pull requests that use AI tools to fix issues labeled as "good first issues" will be
+ closed.
|
Perhaps this whole section could be much more succinctly stated: "The Maintainers label Good First Issues because they are self-contained, are usually uncontroversial and with a clear solution, and are a good way to onboard new contributors to the process of creating a Matplotlib Pull Request. Good First Issues are not an appropriate venue for bots or agents (see our AI policy)." I'd not digress into the differences between a "Medium", "Hard" etc. |
I agree they're not in scope for GFI, but I'd break out this scale into a subsection of issues b/c it applies broadly there.
Arguably that's gsoc, but I think community practice has pushed towards gfis being onboarding issues (for example, the LLVM docs), so I think being clearer in purpose about gfis & decoupling leveling will get us to roughly the same place as adding a new tag. |
2f79bee to 10f0f9b
|
Ok so I decoupled leveling from GFI - new contributors should be reading the general issue guidance anyway - and tried to condense everything else into what I think is the working definition and purpose of gfi. |
571d493 to 8732a98
I really like this reframing. |
No problem, context influences reviews. Changed wording in bot from low priority to not urgent. |
I think this is how projects in other communities are already using gfi (Apache, LLVM, Kubernetes). Part of the point of this PR is to more explicitly document that this is the context in which the good first issue tag should be applied.
Part of this is also that while I love the term onboarding, I think it's very much internal HR speak rather than outward-facing, accessible language. I think "onboarding" is the criterion that maintainers should use to classify it, but that should translate into "good first issue" for the new contributor trying to find it. Like what would be an example of a good first issue that's not a good onboarding task, especially with leveling decoupled from gfi? |
|
I think all good first issues are good onboarding tasks but the problem is that a lot of would-be contributors (and now bots) read "good first issue" as "easily solvable" which won't always be the case. Unless we change how we're using the label - I haven't really followed what we mean by decoupling the difficulty levels, discussed above. To get away from the HR-speak, maybe something like "Newcomer Friendly" or "Newcomer Suitable"? |
|
I'm not attached to "onboarding" at all, other wordings also work here. Also none of my comments here should be blocking anyway so I think we should merge as-is and if this comes up again later we can always revisit 😄 |
Just that we should be evaluating the difficulty of an issue separately from whether it's a good first task. They're kind of intermingled b/c we rank difficulty based on conceptual understanding of matplotlib, but there are also plenty of technically easy tasks that are ill-suited for new contributors b/c of the social side (for example this PR).
I think different folks have been using the label in different ways, but I wanna move to us consistently using it in the onboarding context b/c that seems to be how the community at large is using it. Which is why I'm so hesitant about introducing a new label we plan to use in the same way most folks use the old label. |
|
I don’t think the difficulty label is about how well you know Matplotlib. Often “medium difficulty” is used together with GFI to say you don’t need to know Matplotlib internals, but you do need experience in something more general like testing frameworks, or because the code you’re going to need to study is a bit involved. I’m going to tag @tacaswell here because I think he most often adds that combination of labels. I think this project is already incredibly good at attracting inexperienced contributors who make a PR for their course credit and don’t stick around. Changing the definition of GFI to helping people with the PR process seems like it is going to codify that those are the people we want. If a more experienced person happens to take an interest in this project they will likely not be worried about the process of making a PR, but some way to signal which issues are more accessible to them would be helpful. Of course those people will happen by far less often, but when they do they will likely make stronger contributions. |
|
As for the issue name, last time we discussed changing it, it was about making it less discoverable so we don’t get several people all wanting to work on it at once. Making it less discoverable may also help with the recent problem of AI spam. This would of course mean we would be doing something different from other OSS projects. Perhaps @melissawm can speak to how the name change worked out for Numpy, as that was a few years ago now. |
I think this intersects a little w/ our reluctance to close bad PRs. I'm probably more guilty than most, but we really should be closing the ones where, two interactions in, it's clear they don't understand what they're doing, and redirecting those folks to doing something more productive for the library. Also been thinking of GitHub's new limit PRs feature and asked for an "allow" list version b/c I think that could be a different way to get at this - folks have to show some understanding of the issue and solution b/f we allow 'em to make a PR. We could also have a manual version of this where we just close PRs that don't show understanding. |
|
This is a multi-faceted topic and there will not be the one solution. Random thoughts:
|
tried to make this the more explicit entry point in #31163. thankfully @melissawm's back to help w/ the ncms 😄
Same. I think we've avoided assignment 'cause process overhead, but we've basically ended up implementing a worse implicit system where nobody knows what to prioritize and everyone is burned out. I think even in #29686 , most folks (at least grudgingly) conceded to an assignment system b/c it's explicit and enforceable. |
Updates since this comment make me more comfortable with that - i.e. removing the part about "the process of making a PR". I think #23548 is an example where discussion about exactly what should be done is still needed, but it's GFI because you don't need to know anything about the Matplotlib code to work on it. |
This seems to lay out the what #23548 (comment) and I think that's actually what's tripping up the contributors - they're seeing that comment and not reading through the later discussion that modifies it. Also I think what's ambiguous here is more the how than the what. |
712dfff to ebfb7b1
|
Thanks all and especially @story645 - I think this wording is great. For the second discussion happening here, which is how do we more efficiently triage new contributions, there are some things we can consider.
I would like to suggest moving this discussion to discourse, maybe, or another issue so we can unblock this PR. |
|
1 and 3 are things we already do (the calls have just had too much other stuff lately for triage) we can maybe open an issue about 2? |
timhoffm left a comment
It’s almost there. Some wording and clarification.
dc6b8a9 to de439a3
| - It has less clearly defined tasks, which require some independent
|   exploration, making suggestions, or follow-up discussions to clarify a good
|   path to resolve the issue.
I deleted this b/c I think this means there's not enough info yet to evaluate difficulty and the issue needs more triage:
graph TD;
A(is the issue) -->unresolvable-->C(explain why <br /> technically infeasible <br /> or not aligned with <br /> values/mission/scope/etc);
A-->resolved--> B(explain why, <br /> link to docs, API, etc) ;
A-->resolvable-->D(work w/ issue author <br /> on an implementable <br /> code/docs solution)
A-->E(not sure)-->F(ask for more information) -->A;
This is in the same vein as the "not ready for contribution" label @melissawm proposed in #31142 (comment)
|
|
||
  This issue is suited to new contributors because it does not require
- understanding of the Matplotlib internals. This is a low priority task
+ understanding of the Matplotlib internals. This is a not urgent task
- understanding of the Matplotlib internals. This is a not urgent task
+ understanding of the Matplotlib internals. This is a nonurgent task
jklymak left a comment
The sentence L268 forward could be simplified....
doc/devel/contribute.rst
Outdated
| Matplotlib, and do not need urgent resolution. Pull requests that are :ref:`AI generated <generative_ai>`
| will be closed because good first issues are intended to onboard newcomers with a
| genuine interest in improving Matplotlib in the hopes they will continue to participate
| in our development community.
Maybe:
- Matplotlib, and do not need urgent resolution. Pull requests that are :ref:`AI generated <generative_ai>`
- will be closed because good first issues are intended to onboard newcomers with a
- genuine interest in improving Matplotlib in the hopes they will continue to participate
- in our development community.
+ Matplotlib, and do not need urgent resolution. Pull requests to Good First Issues that are :ref:`AI generated <generative_ai>`
+ will be closed. These issues are reserved to provide hands-on experience for new contributors.
I like the part about hoping they will continue to participate...
OK, so maybe make it a concluding sentence to the whole paragraph, rather than tack it onto the AI sentence.
But it's specifically why we don't allow AI here. I want to be explicit about the reasoning b/c this PR was largely motivated by my wanting to explain why our gfi policy is no AI.
Maybe I can invert things though so that it doesn't feel so tacked on.
Also, the "continuing to participate" part is there specifically because of @timhoffm's review #31142 (comment)
|
coverage failure definitely unrelated to PR |
doc/devel/contribute.rst
Outdated
| Matplotlib, and do not need urgent resolution. Good first issues are intended to onboard
| newcomers with a genuine interest in improving Matplotlib in the hopes that they will
| continue to participate in our development community; therefore, pull requests that are
| :ref:`AI generated <generative_ai>` will be closed because
timhoffm left a comment
This is good, apart from the unfinished sentence.
7310d60 to 0866b56
move difficulty under issues [ci doc]
Co-authored-by: Tim Hoffmann <[email protected]>
Co-authored-by: Ruth Comer <[email protected]>
PR summary
Jumping off #31131 and related, adds a paragraph to the contributing guide explaining that good first issues are for training rather than b/c they're an urgent technical need. Unsure if it should be broad "use AI tools" or more specific "use AI tools without human in the loop"
attn: @scottshambaugh @melissawm