Improve quote convention detection accuracy #235
ddaspit
left a comment
@ddaspit reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
Enkidu93
left a comment
@Enkidu93 reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @benjaminking)
machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py line 14 at r1 (raw file):
class QuotationMarkCounter:
Can you just use Counter and move this threshold to the PreliminaryQuotationMarkAnalyzer?
benjaminking
left a comment
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py line 14 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Can you just use Counter and move this threshold to the PreliminaryQuotationMarkAnalyzer?
Unfortunately, the total() method for Counter (which I would need to compute proportions) was only added in Python 3.10. I could keep track of the total separately, but it seems cleaner for now to use a separate class.
Plus there is something I like about having the proportion logic and threshold decoupled and encapsulated.
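For readers following the trade-off, here is a minimal sketch of what keeping the total alongside the counts could look like on Python versions before 3.10. The class name `ProportionCounter` and its methods are hypothetical and chosen for illustration; this is not the `QuotationMarkCounter` from the PR.

```python
from collections import Counter


class ProportionCounter:
    """Hypothetical illustration only, not the class from this PR."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        # Tracked by hand because Counter.total() requires Python 3.10+.
        self._total = 0

    def count(self, item: str) -> None:
        self._counts[item] += 1
        self._total += 1

    def proportion(self, item: str) -> float:
        # Guard against division by zero when nothing has been counted.
        if self._total == 0:
            return 0.0
        return self._counts[item] / self._total


counter = ProportionCounter()
for mark in ["\u201c", "\u201d", "\u201c", "\u201d", "'"]:
    counter.count(mark)
```

In this sketch the class knows only about counts and proportions; any threshold would live with the caller, which is one way to read the decoupling mentioned above.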
Enkidu93
left a comment
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @benjaminking)
machine/punctuation_analysis/preliminary_quotation_mark_analyzer.py line 14 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Unfortunately, the total() method for Counter (which I would need to compute proportions) was only added in Python 3.10. I could keep track of the total separately, but it seems cleaner for now to use a separate class. Plus there is something I like about having the proportion logic and threshold decoupled and encapsulated.
Sounds good. I wondered about total().
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #235 +/- ##
=======================================
Coverage 90.91% 90.92%
=======================================
Files 337 337
Lines 21519 21542 +23
=======================================
+ Hits 19564 19586 +22
- Misses 1955 1956 +1
☔ View full report in Codecov by Sentry.
This PR improves the accuracy of quote convention detection for projects that use quotation marks inconsistently, by ignoring quotation marks that occur infrequently. It responds to many in-progress translation projects having no quote convention detected.
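As a rough illustration of the idea of ignoring infrequent quotation marks, the sketch below filters a tally by proportion. The function name `frequent_marks`, the `min_proportion` parameter, the 0.05 cutoff, and the sample counts are all assumptions made for the example; the PR's actual threshold and data structures may differ.

```python
from collections import Counter
from typing import Set


def frequent_marks(counts: Counter, min_proportion: float) -> Set[str]:
    """Return the marks whose share of all observations meets the cutoff."""
    # sum(counts.values()) stands in for Counter.total() before Python 3.10.
    total = sum(counts.values())
    if total == 0:
        return set()
    return {mark for mark, n in counts.items() if n / total >= min_proportion}


# Made-up tallies: mostly curly double quotes, plus a few stray guillemets.
observed = Counter({"\u201c": 48, "\u201d": 47, "\u00ab": 3, "\u00bb": 2})
kept = frequent_marks(observed, min_proportion=0.05)  # 0.05 is an assumed cutoff
```

With these made-up numbers the guillemets fall below the cutoff, so only the curly double quotes would be passed on to later detection.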