Make mininode_lock non-reentrant #19178
Conversation
There's no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.
glozow
left a comment
Concept ACK, I just learned a lot about locks haha. I have a comment about timeouts because opening up the potential for deadlock scares me 🥶
```diff
 # This lock should be acquired in the thread running the test logic to synchronize
 # access to any data shared with the P2PInterface or P2PConnection.
-mininode_lock = threading.RLock()
+mininode_lock = threading.Lock()
```
Seems like Lock will try to acquire indefinitely by default (I'm looking at these docs).
I tried out some re-acquiring behavior with this code and it hangs forever as expected... Just in case someone like me does some dumb deadlocky stuff, would it be a good idea to make mininode_lock use something like `lock.acquire(timeout=60)` and/or `if lock.locked(): raise_assertion_error`?
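For reference, a minimal sketch of the difference being discussed: a plain `Lock` hangs if the same thread re-acquires it, while a timeout turns that hang into a detectable failure, and an `RLock` allows the re-acquisition entirely.

```python
import threading

# A non-reentrant Lock deadlocks if the owning thread tries to
# re-acquire it; a timeout on acquire() fails fast instead of hanging.
lock = threading.Lock()
lock.acquire()
acquired = lock.acquire(timeout=0.1)  # False: would otherwise block forever
assert not acquired
lock.release()

# An RLock, by contrast, lets the owning thread re-acquire freely,
# which is exactly the behavior this PR removes.
rlock = threading.RLock()
rlock.acquire()
assert rlock.acquire(timeout=0.1)  # succeeds: same thread already owns it
rlock.release()
rlock.release()
```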
Actually I have a question because I'm not 100% sure how the test_runner works - does it allot a certain amount of time for each test? It looks like it would run infinitely? How would it affect Travis if 1 test is in deadlock?
Travis is running our ci system, which calls the test_runner helper:
```shell
DOCKER_EXEC LD_LIBRARY_PATH=$DEPENDS_DIR/$HOST/lib ${TEST_RUNNER_ENV} test/functional/test_runner.py --ci $MAKEJOBS --tmpdirprefix "${BASE_SCRATCH_DIR}/test_runner/" --ansi --combinedlogslen=4000 ${TEST_RUNNER_EXTRA} --quiet --failfast
```

If a test takes a long time, it will show up in the log as `Remaining jobs: [foo_bar.py]`.
If a test never finishes, the virtual machine that the ci system runs on will eventually reboot and mark the run as failed.
> would it be a good idea to make mininode_lock use something like `lock.acquire(timeout=60)` and/or `if lock.locked(): raise_assertion_error`
We use the with <lock>: syntax everywhere, which doesn't allow a blocking or timeout argument. I don't think there's a way to set a default time on a lock.
I think if you did accidentally introduce a deadlock, you'd figure it out pretty easily. If your script was hanging, I think the best way would be to implement some way to attach pdb to the running process (eg https://stackoverflow.com/a/25329467/933705 or https://stackoverflow.com/a/147114/933705) and then inspect the stack trace.
(in fact, I think this would be generally useful, so perhaps someone should implement it 🙂 )
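One standard-library way to get a stack trace out of a hung test, along the lines of the Stack Overflow answers above, is `faulthandler`: registering a signal handler lets you dump every thread's traceback from outside the process. This is a sketch of the idea, not something the framework currently does (and `SIGUSR1` is POSIX-only).

```python
import faulthandler
import os
import signal

# After this, `kill -USR1 <pid>` makes the process print every Python
# thread's traceback to stderr, showing where a deadlocked test is stuck.
faulthandler.register(signal.SIGUSR1)
print("stack-dump handler installed for pid", os.getpid())
```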
Ok this makes sense, thanks! I was worried I could accidentally stall Travis hahaha. Having some debugging help sounds useful, let me go look for a someone...
```python
# save txid
self.tx_invs_received[i.hash] += 1

super().on_inv(message)
```
Good catch, I think this was from Amiti + me both adding it in separate PRs.
I wonder if we should actually change the Lock to be a condition variable: https://docs.python.org/3/library/threading.html#condition-objects. A lot of what we do with the lock is polling for changes on objects, so having a notify() function that wakes the blocked thread could be much nicer.
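To illustrate the suggestion, here is a minimal sketch of what a `Condition`-based version could look like; the names (`mininode_cv`, `received`, `wait_for_message`) are hypothetical, not the framework's API.

```python
import threading

# A Condition guards the shared state; waiters block on notify()
# instead of polling in a sleep loop.
mininode_cv = threading.Condition()
received = []

def on_message(msg):
    # Producer side (network thread): record the message, wake waiters.
    with mininode_cv:
        received.append(msg)
        mininode_cv.notify_all()

def wait_for_message(timeout=5):
    # Consumer side (test thread): block until the predicate holds.
    with mininode_cv:
        assert mininode_cv.wait_for(lambda: received, timeout=timeout)

t = threading.Thread(target=on_message, args=("inv",))
t.start()
wait_for_message()
t.join()
```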
Concept ACK. Seems like a good first step for future cleanups. (e.g. making the lock a private member and then replacing the polling loops with cvs)
gillichu
left a comment
Concept ACK, I learned more about how locks are built into the Bitcoin testing framework too. I'm not familiar with whether polling is a resource-consuming activity (busy-waiting) in the current implementation, but in general use cases I know CVs are a good replacement.
```python
        super().on_inv(message)

    def get_invs(self):
        with mininode_lock:
```
I'm a bit confused on why get_invs needs to grab the lock, but on_inv doesn't? I understand that in the case of wait_for_broadcast, self.wait_until already acquires the lock, but I just wanted to confirm that on_inv already expects to be holding the lock on call.
Yes. All of the `on_<msgtype>()` methods already hold the lock, since they're called by the `on_message()` method (specifically, the `getattr(self, 'on_' + msgtype)(message)` line).
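A stripped-down sketch of that dispatch pattern (not the real framework class, just the shape of it) shows why the handlers must not re-acquire a non-reentrant lock:

```python
import threading

mininode_lock = threading.Lock()

class P2PInterface:
    """Minimal illustration: on_message() holds the lock while it
    dispatches, so on_<msgtype>() handlers run with it already held
    and must not try to acquire it again."""

    def on_message(self, msgtype, message):
        with mininode_lock:
            # Same dispatch shape as the framework's
            # getattr(self, 'on_' + msgtype)(message) line.
            getattr(self, "on_" + msgtype)(message)

    def on_inv(self, message):
        # Runs with mininode_lock already held by on_message();
        # re-acquiring here would deadlock a plain Lock.
        assert mininode_lock.locked()
        self.last_inv = message

p = P2PInterface()
p.on_message("inv", ["tx1"])
```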
Shower thought: The lock could even be removed completely without a cv replacement if all code that previously required the lock was executed in the network thread. Not sure if this makes sense at all, so consider it a "shower thought".
ACK 6206838 😃
6206838 [tests] Make mininode_lock non-reentrant (John Newbery)
c67c1f2 [tests] Don't call super twice in P2PTxInvStore.on_inv() (John Newbery)
9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast() (John Newbery)
edae607 [tests] Only acquire lock once in p2p_compactblocks.py (John Newbery)

Pull request description:

There's no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.

ACKs for top commit:
  MarcoFalke: ACK 6206838 😃
  jonatack: ACK 6206838

Tree-SHA512: dcbc19e6c986970051705789be0ff7bec70c69cf76d5b468c2ba4cb732883ad512b1de5c3206c2eca41fa3f1c4806999df4cabbf67fc3c463bb817458e59a19c
Summary: This is a backport of Core [[bitcoin/bitcoin#18807 | PR18807]] [1/9] bitcoin/bitcoin@dab298d

Note: the 3rd commit of this PR will not be backported, as it added a duplicate line later removed in [[bitcoin/bitcoin#19178 | PR19178]]

Test Plan: proof-reading
Reviewers: #bitcoin_abc, Fabien
Reviewed By: #bitcoin_abc, Fabien
Differential Revision: https://reviews.bitcoinabc.org/D9028
Summary:
bitcoin/bitcoin@edae607 [tests] Only acquire lock once in p2p_compactblocks.py
bitcoin/bitcoin@9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast()
bitcoin/bitcoin@c67c1f2 [tests] Make mininode_lock non-reentrant

Commit c67c1f2c032a8efa141d776a7e5be58f052159ea, removing a double call to the parent method in `P2PTxInvStore.on_inv`, is not relevant for Bitcoin ABC, as it fixes a bug that was never introduced in our codebase. We skipped that commit while backporting core#18807.

To make it work, the following additional change is needed for Bitcoin ABC: Don't acquire mininode_lock twice in abc_p2p_avalanche.py. `on_avahello`, `on_avapoll` and `on_avaresponse` are called indirectly via `mininode.P2PInterface.on_message`, with the lock already acquired.

This is a backport of [[bitcoin/bitcoin#19178 | core#19178]]

Test Plan: `ninja check-functional`
Reviewers: #bitcoin_abc, deadalnix
Reviewed By: #bitcoin_abc, deadalnix
Subscribers: deadalnix
Differential Revision: https://reviews.bitcoinabc.org/D9430