Skip to content

Extends scope of transactions to rollback when using dry_run=True#1481

Closed
duck-nukem wants to merge 1 commit intodjango-import-export:release-3-xfrom
duck-nukem:main
Closed

Extends scope of transactions to rollback when using dry_run=True#1481
duck-nukem wants to merge 1 commit intodjango-import-export:release-3-xfrom
duck-nukem:main

Conversation

@duck-nukem
Copy link
Copy Markdown

@duck-nukem duck-nukem commented Aug 5, 2022

Problem

I believe the changes below address and fix the following:
#1099
#1447
#1078

and our similar problem we're facing internally (at work) which is exactly the same as #1078 (comment)

Solution

Instead of creating a savepoint before inserting rows and then rolling back to said savepoint (if applicable), I'm proposing to implicitly mark the outermost transaction to be rolled back if dry_run=True and use_transactions=True. This should mark related functions awaiting execution on_commit to be considered "rolled back".

I'm not a 100% sure I fully understand why this happens, but during my investigation of this issue I've found that the manually created savepoint didn't exist in connection.savepoint_ids, but rolling back to it was successful, while another savepoint was added to connection.savepoint_ids (these were the auxiliary actions, like signal handlers, etc), but there was no related savepoint in the database. I suspect it might be somewhat due to threading, but I'm not experienced enough to be able to tell confidently if that's the case.

Important! While I think this is more aligned with what end-users would expect to happen when rolling back, and existing tests still pass, this might be considered as a breaking change. Should this be based to the v3-release branch instead?

Acceptance Criteria

Have you written tests? ✅
Have you included screenshots of your changes if applicable? N/A
Did you document your changes? I sure hope so!

Thanks for the review in advance!

@duck-nukem duck-nukem changed the title Adds a wider effect to transaction rollbacks Extends scope of transactions to rollback when using dry_run=True Aug 7, 2022
@matthewhegarty matthewhegarty requested a review from rpsands August 13, 2022 16:50
@matthewhegarty
Copy link
Copy Markdown
Contributor

@pokken-magic would appreciate your input as you raised #1447

@matthewhegarty
Copy link
Copy Markdown
Contributor

Thanks for submitting this PR - I'll review in detail when I get a chance.

Should this be based to the v3-release branch instead?

Yes I think that if this goes in, it would be safer to go into v3

@rpsands
Copy link
Copy Markdown

rpsands commented Aug 13, 2022

Happy to take a look at this one when it's based to 3 and passing tests. Looks very very promising though! Thanks for doing it.

@duck-nukem duck-nukem changed the base branch from main to release-3-x August 13, 2022 18:03
@duck-nukem
Copy link
Copy Markdown
Author

  • Fixed imports formatting
  • PR now points to release-3-x
  • Updated my fork's main to be release-3-x so we don't have to worry about merge conflicts in this PR

I think the PR is ready for another round of CI, fingers crossed all tests pass 🤞🏼

@matthewhegarty
Copy link
Copy Markdown
Contributor

I think we'll need to take a step back and try to understand exactly what's going on before we can merge.

From what I have read from previous issues, the purpose of this fix is to stop the post_save() signal from being called twice. Is that correct?

If you have any more insight into how this works, or how exactly it fixes the issue I would be interested to understand in more detail.

For reference, this is the implementation of django.db.transaction.set_rollback()

def set_rollback(rollback, using=None):
    """
    Set or unset the "needs rollback" flag -- for *advanced use* only.

    When `rollback` is `True`, trigger a rollback when exiting the innermost
    enclosing atomic block that has `savepoint=True` (that's the default). Use
    this to force a rollback without raising an exception.

    When `rollback` is `False`, prevent such a rollback. Use this only after
    rolling back to a known-good state! Otherwise, you break the atomic block
    and data corruption may occur.
    """
    return get_connection(using).set_rollback(rollback)

# Ensure the rollback has worked properly, no instances were created.
self.assertFalse(Author.objects.exists())

def test_rollback_auxiliary_on_commit_transactions_when_dry_run(self):
Copy link
Copy Markdown
Contributor

@matthewhegarty matthewhegarty Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need an additional test to ensure that rollback is not called when using_transactions is False. Likewise we need to test get_connection() called for all these conditions:

            if dry_run or \
                    result.has_errors() or \
                    (rollback_on_validation_errors and result.has_validation_errors()):
                get_connection(using=db_connection).set_rollback(True)

@duck-nukem
Copy link
Copy Markdown
Author

duck-nukem commented Aug 15, 2022

From what I have read from previous issues, the purpose of this fix is to stop the post_save() signal from being called twice. Is that correct?

I think that's correct

If you have any more insight into how this works, or how exactly it fixes the issue I would be interested to understand in more detail.

What I think is happening is that the savepoints in the current implementation limit the scope of the rollback to the inserts made during the import preview: we check if the to-be-imported rows could be saved to the DB to ensure there are no integration errors etc, then we only rollback those inserts. In the meanwhile signals have already been dispatched for related (background) tasks, but they are exempt from the rollbacks, so they'll be executed as if we had actually imported any data.

@matthewhegarty
Copy link
Copy Markdown
Contributor

I added the following to tests/core/admin.py:

@receiver(post_save, sender=Author)
def post_save_callback(sender, **kwargs):
    print("Save called")

I then imported the attached Author via the Example App. With this fix we would expect the call back to be called once, in fact it gets called twice. It looks like the change prevents the new row from being written to the db before the "confirm" step, but the signal gets sent twice. Can you reproduce? Have I missed something here?

author.csv

@matthewhegarty
Copy link
Copy Markdown
Contributor

AFAIK, setting a flag on the instance would be the only way to prevent the post_save signal being called twice.

@rpsands
Copy link
Copy Markdown

rpsands commented Aug 17, 2022

Fwiw I am not sure this solves 1447 because it's a peculiarity of file fields not directly connected to the save handlers (I don't think anyway). But it seems like a good idea if it can be made to work. Nothing else to add beyond Matthew's findings.

Because the issue w 1447 is ultimately due to serializing files, which our library doesn't support, it's possible I should just close that one tbh.

@duck-nukem
Copy link
Copy Markdown
Author

duck-nukem commented Aug 20, 2022

@matthewhegarty I'm starting to think maybe this "fix" was too specific for our use-case as we usually expect signals to be called on transaction.on_commit

For example:

@receiver(post_save, sender=Author)
def post_save_callback(sender, **kwargs):
-    print("Save called")
+    transaction.on_commit(lambda: print("Save called"))

That's very specific and I guess that'd be too much to impose on the users of the library.

I'll try to spend some time on this (thanks for pointing me to author.csv + the test django instance), but unfortunately covid is kicking my butt atm, so unsure when will I be able to pick this up again. In the meantime feel free to close this PR and I'll come back with another one if I can find a better solution.

Just FYI I'm thinking about either temporarily disabling all signals via a context manager in save_instance, or setting instance._meta.auto_created to True temporarily.

django's save_base method line 865
django's save_base method line 896

This from a quick prototyping seems to have no major side-effects, but it feels a bit hack-y

    def save_instance(self, instance, is_create, using_transactions=True, dry_run=False):
        """
        Takes care of saving the object to the database.

        Objects can be created in bulk if ``use_bulk`` is enabled.

        :param instance: The instance of the object to be persisted.
        :param is_create: A boolean flag to indicate whether this is a new object
        to be created, or an existing object to be updated.
        :param using_transactions: A flag to indicate whether db transactions are used.
        :param dry_run: A flag to indicate dry-run mode.
        """
        self.before_save_instance(instance, using_transactions, dry_run)
        if self._meta.use_bulk:
            if is_create:
                self.create_instances.append(instance)
            else:
                self.update_instances.append(instance)
        else:
            if not using_transactions and dry_run:
                # we don't have transactions and we want to do a dry_run
                pass
            else:
+               original = copy(instance._meta.auto_created)
+               instance._meta.auto_created = True
                instance.save()
+               instance._meta.auto_created = original
        self.after_save_instance(instance, using_transactions, dry_run)

@matthewhegarty
Copy link
Copy Markdown
Contributor

Thanks Alex - will be interested to see if you can add to this in future, but will close for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants