Skip to content

notebook image cleaning automation#159212

Merged
rebornix merged 44 commits intomainfrom
mlively/cleaner
Sep 9, 2022
Merged

notebook image cleaning automation#159212
rebornix merged 44 commits intomainfrom
mlively/cleaner

Conversation

@Yoyokrazy
Copy link
Collaborator

Notebook will automatically remove un-used image attachments from a cell's metadata, storing them in a cache (to allow for user to move text around and not lose image data).

Cache is structured with entries for each notebook uri, and then each cell fragment under those. top level uri entries will be deleted when notebook editors are closed. The entire cache is disposed of upon the extension closing with the workspace.

@Yoyokrazy Yoyokrazy self-assigned this Aug 25, 2022
@Yoyokrazy Yoyokrazy marked this pull request as ready for review August 30, 2022 17:46
@Yoyokrazy Yoyokrazy requested a review from mjbvz August 30, 2022 17:46
@vscodenpa vscodenpa added this to the September 2022 milestone Aug 30, 2022
Copy link
Collaborator

@mjbvz mjbvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitted a few suggestions

I'm having a hard time understanding what the code is doing. I think encapsulating the logic inside classes and functions would help make this more clear

);

context.subscriptions.push(vscode.workspace.onDidChangeNotebookDocument(e => {
delayTrigger.trigger(e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are editing quickly, this may end up firing after onDidCloseNotebookDocument or onDidRenameFiles. Is that case properly handled?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likely it won't be, I'm not sure exactly how I would go about handling that. Do you have any initial thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea: instead of passing e along in a closure, add it to a list. When the delay trigger is finally invoked, it should loop through items in the list. If close is called before that though, remove the entry from the list

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely understanding that as a solution, as when onDidCloseNotebookDocument is invoked, the notebook will already be closed and if a workspace edit is applied, the notebook would reopen dirty. I think this would work/there would be plenty of other solutions if an onWillClose listener was possible, but don't think there's any support for that at the moment.

Current fix is just to check if the notebook is closed at the very beginning of the clean function. It means that if someone is editing quickly they might not have their notebook cleaned, but not sure there is a way to hook in right before the notebook is closed.

@Yoyokrazy Yoyokrazy requested a review from amunger September 1, 2022 18:32
* Licensed under the MIT License. See License.txt in the project root for license information.
*--------------------------------------------------------------------------------------------*/

export function deepClone<T>(obj: T): T {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have more generally available versions of these functions? (and better tested?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amunger currently we don't, I borrowed this from the core.

@rebornix
Copy link
Member

rebornix commented Sep 8, 2022

@mjbvz I picked up this PR since @Yoyokrazy is out. By reading through the suggestions above, I have made following changes:

  • Diagnostic provider for missing attachments in markdown cells
    • Quick fix for removing invalid attachment references in markdown cell
  • Proper debouncer which captures markdown cells that are modified with delay. At the end of the delay, analyze all cells being captured for unused attachments
  • On extension activation, scan open notebook cell text documents, validate missing attachments
  • On notebook cell content change, capture the cell and request validation for missing and unused attachments
  • On file willRename, transfer the cache to new resource
  • On notebook document close, clear the cache and diagnostics

Along with changes in the clean up tool, also made required changes to the notebook diff editor to ensure the attachment metadata is presented in the diff editor.

I didn't try to tackle the race condition between notebook document change and document close. It might require introducing code action for notebook document save (which only exists in text document) as current approach of attachment cleanup is responding to cell document changed event, but not as a pre-change hook. We can discuss what's the best approach to tackle this. @mjbvz it would be great if you can take another look at this PR.

@rebornix rebornix requested a review from mjbvz September 8, 2022 17:58
@rebornix rebornix assigned rebornix and unassigned Yoyokrazy Sep 8, 2022
}
}
],
"configuration": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space to tab

const vscodeDiagnostics: vscode.Diagnostic[] = [];
for (const currDiagnostic of diagnostics) {
currDiagnostic.ranges.forEach(range => {
vscodeDiagnostics.push(new vscode.Diagnostic(range, `Attachment ${currDiagnostic.name} not available`, vscode.DiagnosticSeverity.Warning));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't set up nls yet in code.

@rebornix
Copy link
Member

rebornix commented Sep 9, 2022

@mjbvz thanks for the review, updated the code per your comments ;)

@rebornix rebornix requested a review from mjbvz September 9, 2022 03:14
},
{
"command": "ipynb.cleanInvalidImageAttachment",
"title": "Clean invalid image attachment reference"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the word "reference" here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using reference for the image attachment link in the cell document, and use attachment for the data stored in the metadata.

const fixes: vscode.CodeAction[] = [];

for (const diagnostic of context.diagnostics) {
switch (diagnostic.code) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is this loop just looking to see whether a single diagnostic of type missing_attachment exists?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be multiple diagnostics in the context (e.g., referencing the same invalid attachment multiple times).

@rebornix rebornix merged commit 42238bd into main Sep 9, 2022
@rebornix rebornix deleted the mlively/cleaner branch September 9, 2022 18:32
@github-actions github-actions bot locked and limited conversation to collaborators Oct 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants