Extends scope of transactions to rollback when using dry_run=True#1481
Extends scope of transactions to rollback when using dry_run=True#1481duck-nukem wants to merge 1 commit intodjango-import-export:release-3-xfrom
Conversation
|
@pokken-magic would appreciate your input as you raised #1447 |
|
Thanks for submitting this PR - I'll review in detail when I get a chance.
Yes I think that if this goes in, it would be safer to go into v3 |
|
Happy to take a look at this one when it's based to 3 and passing tests. Looks very very promising though! Thanks for doing it. |
I think the PR is ready for another round of CI, fingers crossed all tests pass 🤞🏼 |
|
I think we'll need to take a step back and try to understand exactly what's going on before we can merge. From what I have read from previous issues, the purpose of this fix is to stop the If you have any more insight into how this works, or how exactly it fixes the issue I would be interested to understand in more detail. For reference, this is the implementation of def set_rollback(rollback, using=None):
"""
Set or unset the "needs rollback" flag -- for *advanced use* only.
When `rollback` is `True`, trigger a rollback when exiting the innermost
enclosing atomic block that has `savepoint=True` (that's the default). Use
this to force a rollback without raising an exception.
When `rollback` is `False`, prevent such a rollback. Use this only after
rolling back to a known-good state! Otherwise, you break the atomic block
and data corruption may occur.
"""
return get_connection(using).set_rollback(rollback) |
| # Ensure the rollback has worked properly, no instances were created. | ||
| self.assertFalse(Author.objects.exists()) | ||
|
|
||
| def test_rollback_auxiliary_on_commit_transactions_when_dry_run(self): |
There was a problem hiding this comment.
We probably need an additional test to ensure that rollback is not called when using_transactions is False. Likewise we need to test get_connection() called for all these conditions:
if dry_run or \
result.has_errors() or \
(rollback_on_validation_errors and result.has_validation_errors()):
get_connection(using=db_connection).set_rollback(True)
I think that's correct
What I think is happening is that the savepoints in the current implementation limit the scope of the rollback to the inserts made during the import preview: we check if the to-be-imported rows could be saved to the DB to ensure there are no integration errors etc, then we only rollback those inserts. In the meanwhile signals have already been dispatched for related (background) tasks, but they are exempt from the rollbacks, so they'll be executed as if we had actually imported any data. |
|
I added the following to @receiver(post_save, sender=Author)
def post_save_callback(sender, **kwargs):
print("Save called")I then imported the attached Author via the Example App. With this fix we would expect the call back to be called once, in fact it gets called twice. It looks like the change prevents the new row from being written to the db before the "confirm" step, but the signal gets sent twice. Can you reproduce? Have I missed something here? |
|
AFAIK, setting a flag on the instance would be the only way to prevent the |
|
Fwiw I am not sure this solves 1447 because it's a peculiarity of file fields not directly connected to the save handlers (I don't think anyway). But it seems like a good idea if it can be made to work. Nothing else to add beyond Matthew's findings. Because the issue w 1447 is ultimately due to serializing files, which our library doesn't support, it's possible I should just close that one tbh. |
|
@matthewhegarty I'm starting to think maybe this "fix" was too specific for our use-case as we usually expect signals to be called on For example: @receiver(post_save, sender=Author)
def post_save_callback(sender, **kwargs):
- print("Save called")
+ transaction.on_commit(lambda: print("Save called"))That's very specific and I guess that'd be too much to impose on the users of the library. I'll try to spend some time on this (thanks for pointing me to author.csv + the test django instance), but unfortunately covid is kicking my butt atm, so unsure when will I be able to pick this up again. In the meantime feel free to close this PR and I'll come back with another one if I can find a better solution. Just FYI I'm thinking about either temporarily disabling all signals via a context manager in django's save_base method line 865 This from a quick prototyping seems to have no major side-effects, but it feels a bit hack-y def save_instance(self, instance, is_create, using_transactions=True, dry_run=False):
"""
Takes care of saving the object to the database.
Objects can be created in bulk if ``use_bulk`` is enabled.
:param instance: The instance of the object to be persisted.
:param is_create: A boolean flag to indicate whether this is a new object
to be created, or an existing object to be updated.
:param using_transactions: A flag to indicate whether db transactions are used.
:param dry_run: A flag to indicate dry-run mode.
"""
self.before_save_instance(instance, using_transactions, dry_run)
if self._meta.use_bulk:
if is_create:
self.create_instances.append(instance)
else:
self.update_instances.append(instance)
else:
if not using_transactions and dry_run:
# we don't have transactions and we want to do a dry_run
pass
else:
+ original = copy(instance._meta.auto_created)
+ instance._meta.auto_created = True
instance.save()
+ instance._meta.auto_created = original
self.after_save_instance(instance, using_transactions, dry_run) |
|
Thanks Alex - will be interested to see if you can add to this in future, but will close for now. |
Problem
I believe the changes below address and fix the following:
#1099
#1447
#1078
and our similar problem we're facing internally (at work) which is exactly the same as #1078 (comment)
Solution
Instead of creating a
savepointbefore inserting rows and then rolling back to saidsavepoint(if applicable), I'm proposing to implicitly mark the outermost transaction to be rolled back ifdry_run=True and use_transactions=True. This should mark related functions awaiting executionon_committo be considered "rolled back".I'm not a 100% sure I fully understand why this happens, but during my investigation of this issue I've found that the manually created savepoint didn't exist in
connection.savepoint_ids, but rolling back to it was successful, while another savepoint was added toconnection.savepoint_ids(these were the auxiliary actions, like signal handlers, etc), but there was no related savepoint in the database. I suspect it might be somewhat due to threading, but I'm not experienced enough to be able to tell confidently if that's the case.Important! While I think this is more aligned with what end-users would expect to happen when rolling back, and existing tests still pass, this might be considered as a breaking change. Should this be based to the v3-release branch instead?
Acceptance Criteria
Have you written tests? ✅
Have you included screenshots of your changes if applicable? N/A
Did you document your changes? I sure hope so!
Thanks for the review in advance!