Skip to content

fix(array): make docarrays persist atexit#604

Closed
Jackmin801 wants to merge 2 commits intodocarray:mainfrom
Jackmin801:fix-array-529
Closed

fix(array): make docarrays persist atexit#604
Jackmin801 wants to merge 2 commits intodocarray:mainfrom
Jackmin801:fix-array-529

Conversation

@Jackmin801
Copy link
Copy Markdown
Contributor

Goals:

@Jackmin801
Copy link
Copy Markdown
Contributor Author

Of the new tests added, 1 of them is expected to fail.

DocArrays backed by Weaviate seem to have 2 references to them upon creation.
This means they cannot be garbage collected when they go out of scope, possibly creating a memory leak

@Jackmin801 Jackmin801 changed the title Fix array 529 fix(array) make docarrays persist atexit Oct 11, 2022
@Jackmin801 Jackmin801 changed the title fix(array) make docarrays persist atexit fix(array): make docarrays persist atexit Oct 11, 2022
@JoanFM
Copy link
Copy Markdown
Member

JoanFM commented Oct 11, 2022

Hey @Jackmin801 , please solve the commit linting problem to have the tests run

@Jackmin801
Copy link
Copy Markdown
Contributor Author

Ah ok. I purposely failed it cause I wasnt sure if I should be constantly running the CI. Seems like they are a bit expensive

@Jackmin801 Jackmin801 force-pushed the fix-array-529 branch 2 times, most recently from 6d60ce4 to f9fcba4 Compare October 11, 2022 08:05
@codecov
Copy link
Copy Markdown

codecov bot commented Oct 11, 2022

Codecov Report

Merging #604 (f9fcba4) into main (262b2d1) will decrease coverage by 9.18%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##             main     #604      +/-   ##
==========================================
- Coverage   84.91%   75.72%   -9.19%     
==========================================
  Files         133      133              
  Lines        6714     6723       +9     
==========================================
- Hits         5701     5091     -610     
- Misses       1013     1632     +619     
Flag Coverage Δ
docarray 75.72% <83.33%> (-9.19%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/storage/elastic/seqlike.py 89.58% <ø> (-4.42%) ⬇️
docarray/array/base.py 73.91% <80.00%> (+4.68%) ⬆️
docarray/array/storage/base/getsetdel.py 65.10% <100.00%> (-26.12%) ⬇️
docarray/math/evaluation.py 0.00% <0.00%> (-82.15%) ⬇️
docarray/array/mixins/embed.py 10.52% <0.00%> (-78.95%) ⬇️
docarray/array/mixins/evaluation.py 19.44% <0.00%> (-69.45%) ⬇️
docarray/array/mixins/io/pushpull.py 24.36% <0.00%> (-68.07%) ⬇️
docarray/array/mixins/io/binary.py 45.22% <0.00%> (-52.23%) ⬇️
docarray/array/mixins/text.py 50.00% <0.00%> (-50.00%) ⬇️
docarray/array/mixins/io/dataframe.py 40.00% <0.00%> (-46.67%) ⬇️
... and 40 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Copy Markdown
Member

@alaeddine-13 alaeddine-13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this logic, we will be update offset2id twice if we use the context manager.
What I propose is:

  1. rename _delete to _synced
  2. _synced is initially true
  3. _synced = False on append/extend/update/delete operations
  4. _synced = True when context manager exits
    I also think it's clearer to put just _save_offset2id (or any logic that cannot run during program exit) in the exit hook, not all of del.

@JoanFM what do you think ?

@JoanFM
Copy link
Copy Markdown
Member

JoanFM commented Oct 13, 2022

4. I also think it's clearer to put just _save_offset2id (or any logic that cannot run during program exit) in the exit hook, not all of del.

I like this idea

@Jackmin801
Copy link
Copy Markdown
Contributor Author

Jackmin801 commented Oct 13, 2022

After some discussion internally among the maintainers, it was decided DocArray will only support explicit sync and sync when exiting context manager.

There are four candidate places we can sync:

  1. [self.sync] When the user explicitly calls da.sync()
  2. [self.__exit__] When the user exits the DocArray’s context manager
  3. [self.__del__] When the variable gets garbage collected
  4. [atexit.register] When the program exits

DocArray will only support (1) and (2) -- explicit sync and sync when exiting context manager.

We will not support syncing when the variable gets garbage collected or when the program exits.
Because:

  1. Not supporting them would make it such that we do not need to keep track of a _synced flag. Which simplifies the logic.
  2. They are too implicit and difficult to predict. Using explicit sync and the pythonic context manager are intentional and have easy to understand, predictable behavior. Garbage collection and trying to sync on program exit can have edge cases that can make them tricky for users to use and would breed bad practice in their code.

@Jackmin801
Copy link
Copy Markdown
Contributor Author

We are no longer pursuing adding the behavior of persistence to atexit. The intended persistence behavior will be implemented in #625

@Jackmin801 Jackmin801 closed this Oct 14, 2022
@Jackmin801 Jackmin801 deleted the fix-array-529 branch October 14, 2022 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants