Skip to content

Fix issue 732 semaphore exceptions #766

Open
Hatem-707 wants to merge 2 commits intotaskflow:masterfrom
Hatem-707:fix-issue-732
Open

Fix issue 732 semaphore exceptions #766
Hatem-707 wants to merge 2 commits intotaskflow:masterfrom
Hatem-707:fix-issue-732

Conversation

@Hatem-707
Copy link
Copy Markdown
Contributor

#732

Summary:

I have added a DFS recursive method , _handle_broken_semaphores, to the tf::Node class to remove deadlocks that happen because of exceptions thrown down the chain. This is done via calling it on all preceding nodes stored in _edges vector. The method is similar to tf::Semaphore::_release taking a reference to a SmallVector of Node * and storing the waiter nodes' pointers so they can be rescheduled removing the deadlock. There's also an unordered set to handle loops. I also added a new state, _is_broken, to the tf::Semaphore class which prevents acquiring a semaphore that has had an exception until it's reset.

Tests and benchmarks:

  • for now i only added a tweaked version of the test case presented in the initial issue to the semaphore unittest.
  • Measuring the time of the unittests as suggested in the guidelines showed no measurable performance hit, nor do i expect on as most of the expensive work is done when handling exceptions.

Notes:

This is my first time contributing and I would welcome any kind of feedback and I am open to adding more test/ documentation to my changes.

@Hatem-707
Copy link
Copy Markdown
Contributor Author

Summary

I changed the way i check for exceptions when attempting to acquire semaphores to avoid accessing the node pointer which might have moved to the semaphore queue. Also update the second test I added to allow exclusive access to the variable captured by reference.

Testing

the changes were tested with the compile flag -fsanitize=thread using clang on an arch system and passed all unittests with no warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant