Make mininode_lock non-reentrant #19178
Conversation
There's no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.
glozow
left a comment
Concept ACK, I just learned a lot about locks haha. I have a comment about timeouts because opening up the potential for deadlock scares me 🥶
```diff
 # This lock should be acquired in the thread running the test logic to synchronize
 # access to any data shared with the P2PInterface or P2PConnection.
-mininode_lock = threading.RLock()
+mininode_lock = threading.Lock()
```
Seems like Lock will try to acquire indefinitely by default (I'm looking at these docs).
I tried out some re-acquiring behavior with this code and it hangs forever as expected... Just in case someone like me does some dumb deadlocky stuff, would it be a good idea to make mininode_lock use something like `lock.acquire(timeout=60)` and/or `if lock.locked(): raise_assertion_error`?
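For reference, a minimal sketch of the difference being discussed: a plain `Lock` hangs if the same thread re-acquires it, while a timeout turns that hang into a detectable failure, and an `RLock` allows the re-acquisition entirely.

```python
import threading

# A non-reentrant Lock deadlocks if the owning thread tries to
# re-acquire it; a timeout on acquire() fails fast instead of hanging.
lock = threading.Lock()
lock.acquire()
acquired = lock.acquire(timeout=0.1)  # False: would otherwise block forever
assert not acquired
lock.release()

# An RLock, by contrast, lets the owning thread re-acquire freely,
# which is exactly the behavior this PR removes.
rlock = threading.RLock()
rlock.acquire()
assert rlock.acquire(timeout=0.1)  # succeeds: same thread already owns it
rlock.release()
rlock.release()
```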
Actually I have a question because I'm not 100% sure how the test_runner works - does it allot a certain amount of time for each test? It looks like it would run infinitely? How would it affect Travis if 1 test is in deadlock?
Travis is running our ci system, which calls the test_runner helper:
```shell
DOCKER_EXEC LD_LIBRARY_PATH=$DEPENDS_DIR/$HOST/lib ${TEST_RUNNER_ENV} test/functional/test_runner.py --ci $MAKEJOBS --tmpdirprefix "${BASE_SCRATCH_DIR}/test_runner/" --ansi --combinedlogslen=4000 ${TEST_RUNNER_EXTRA} --quiet --failfast
```

If a test takes a long time, it will show up in the log as `Remaining jobs: [foo_bar.py]`.
If a test never finishes, the virtual machine that the ci system runs on will eventually reboot and mark the run as failed.
> would it be a good idea to make mininode_lock use something like `lock.acquire(timeout=60)` and/or `if lock.locked(): raise_assertion_error`
We use the with <lock>: syntax everywhere, which doesn't allow a blocking or timeout argument. I don't think there's a way to set a default time on a lock.
I think if you did accidentally introduce a deadlock, you'd figure it out pretty easily. If your script was hanging, I think the best way would be to implement some way to attach pdb to the running process (eg https://stackoverflow.com/a/25329467/933705 or https://stackoverflow.com/a/147114/933705) and then inspect the stack trace.
(in fact, I think this would be generally useful, so perhaps someone should implement it 🙂 )
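One standard-library way to get a stack trace out of a hung test, along the lines of the Stack Overflow answers above, is `faulthandler`: registering a signal handler lets you dump every thread's traceback from outside the process. This is a sketch of the idea, not something the framework currently does (and `SIGUSR1` is POSIX-only).

```python
import faulthandler
import os
import signal

# After this, `kill -USR1 <pid>` makes the process print every Python
# thread's traceback to stderr, showing where a deadlocked test is stuck.
faulthandler.register(signal.SIGUSR1)
print("stack-dump handler installed for pid", os.getpid())
```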
Ok this makes sense, thanks! I was worried I could accidentally stall Travis hahaha. Having some debugging help sounds useful, let me go look for a someone...
```python
# save txid
self.tx_invs_received[i.hash] += 1

super().on_inv(message)
```
Good catch, I think this was from Amiti + me both adding it in separate PRs.
I wonder if we should actually change the Lock to be a condition variable: https://docs.python.org/3/library/threading.html#condition-objects. A lot of what we do with the lock is polling for changes on objects, so having a notify() function that wakes the blocked thread could be much nicer.
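To illustrate the suggestion, here is a minimal sketch of what a `Condition`-based version could look like; the names (`mininode_cv`, `received`, `wait_for_message`) are hypothetical, not the framework's API.

```python
import threading

# A Condition guards the shared state; waiters block on notify()
# instead of polling in a sleep loop.
mininode_cv = threading.Condition()
received = []

def on_message(msg):
    # Producer side (network thread): record the message, wake waiters.
    with mininode_cv:
        received.append(msg)
        mininode_cv.notify_all()

def wait_for_message(timeout=5):
    # Consumer side (test thread): block until the predicate holds.
    with mininode_cv:
        assert mininode_cv.wait_for(lambda: received, timeout=timeout)

t = threading.Thread(target=on_message, args=("inv",))
t.start()
wait_for_message()
t.join()
```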
Concept ACK. Seems like a good first step for future cleanups. (e.g. making the lock a private member and then replacing the polling loops with cvs)
gillichu
left a comment
Concept ACK, I learned more about how locks are built into the Bitcoin testing framework too. I'm not familiar with whether polling is a resource-consuming activity (busy-waiting) in the current implementation, but in general use cases I know CVs are a good replacement.
```python
        super().on_inv(message)

    def get_invs(self):
        with mininode_lock:
```
I'm a bit confused on why get_invs needs to grab the lock, but on_inv doesn't? I understand that in the case of wait_for_broadcast, self.wait_until already acquires the lock, but I just wanted to confirm that on_inv already expects to be holding the lock on call.
Yes. All of the `on_<msgtype>()` methods already hold the lock, since they're called by the `on_message()` method (specifically, the `getattr(self, 'on_' + msgtype)(message)` line).
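A stripped-down sketch of that dispatch pattern (not the real framework class, just the shape of it) shows why the handlers must not re-acquire a non-reentrant lock:

```python
import threading

mininode_lock = threading.Lock()

class P2PInterface:
    """Minimal illustration: on_message() holds the lock while it
    dispatches, so on_<msgtype>() handlers run with it already held
    and must not try to acquire it again."""

    def on_message(self, msgtype, message):
        with mininode_lock:
            # Same dispatch shape as the framework's
            # getattr(self, 'on_' + msgtype)(message) line.
            getattr(self, "on_" + msgtype)(message)

    def on_inv(self, message):
        # Runs with mininode_lock already held by on_message();
        # re-acquiring here would deadlock a plain Lock.
        assert mininode_lock.locked()
        self.last_inv = message

p = P2PInterface()
p.on_message("inv", ["tx1"])
```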
Shower thought: The lock could even be removed completely without a cv replacement if all code that previously required the lock was executed in the network thread. Not sure if this makes sense at all, so consider it a "shower thought".
ACK 6206838 😃
6206838 [tests] Make mininode_lock non-reentrant (John Newbery)
c67c1f2 [tests] Don't call super twice in P2PTxInvStore.on_inv() (John Newbery)
9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast() (John Newbery)
edae607 [tests] Only acquire lock once in p2p_compactblocks.py (John Newbery)

Pull request description:

There's no need for mininode_lock to be reentrant. Use a simpler non-recursive lock.

ACKs for top commit:
  MarcoFalke: ACK 6206838 😃
  jonatack: ACK 6206838

Tree-SHA512: dcbc19e6c986970051705789be0ff7bec70c69cf76d5b468c2ba4cb732883ad512b1de5c3206c2eca41fa3f1c4806999df4cabbf67fc3c463bb817458e59a19c
Summary: This is a backport of Core [[bitcoin/bitcoin#18807 | PR18807]] [1/9] bitcoin/bitcoin@dab298d

Note: the 3rd commit of this PR will not be backported, as it added a duplicate line later removed in [[bitcoin/bitcoin#19178 | PR19178]]

Test Plan: proof-reading
Reviewers: #bitcoin_abc, Fabien
Reviewed By: #bitcoin_abc, Fabien
Differential Revision: https://reviews.bitcoinabc.org/D9028
Summary:
bitcoin/bitcoin@edae607 [tests] Only acquire lock once in p2p_compactblocks.py
bitcoin/bitcoin@9d80762 [tests] Don't acquire mininode_lock twice in wait_for_broadcast()
bitcoin/bitcoin@c67c1f2 [tests] Make mininode_lock non-reentrant

Commit c67c1f2c032a8efa141d776a7e5be58f052159ea, removing a double call to the parent method in `P2PTxInvStore.on_inv`, is not relevant for Bitcoin ABC, as it fixes a bug that was never introduced in our codebase. We skipped that commit while backporting core#18807.

To make it work, the following additional change is needed for Bitcoin ABC: Don't acquire mininode_lock twice in abc_p2p_avalanche.py. `on_avahello`, `on_avapoll` and `on_avaresponse` are called indirectly via `mininode.P2PInterface.on_message`, with the lock already acquired.

This is a backport of [[bitcoin/bitcoin#19178 | core#19178]]

Test Plan: `ninja check-functional`
Reviewers: #bitcoin_abc, deadalnix
Reviewed By: #bitcoin_abc, deadalnix
Subscribers: deadalnix
Differential Revision: https://reviews.bitcoinabc.org/D9430