Adding ridge regularization in Nonnegative Parafac HALS, with some needed documentation #603
cohenjer wants to merge 18 commits into tensorly:main
Conversation
add callback for returning loss; update nnls_hals to be documented
Codecov Report
❌ Patch coverage is

@@ Coverage Diff @@
##             main     #603      +/-   ##
==========================================
+ Coverage   88.07%   88.17%   +0.10%
==========================================
  Files         132      132
  Lines        7915     7985      +70
==========================================
+ Hits         6971     7041      +70
  Misses        944      944
Any comments @aarmey @JeanKossaifi?
Thanks @aarmey. I fixed the black issues; I had not updated black recently enough!
JeanKossaifi left a comment:
Thank you @cohenjer ! Looks great, it's super useful to have the ridge regularization option. I left some comments - mostly looking at minimizing code complexity where possible.
loss += ridge_coefficients[mode] ** 2 * tl.norm(
    factor
) ** 2 + sparsity_coefficients[mode] * tl.sum(tl.abs(factor))
callback(cp_tensor, rec_error, loss)
This is a very specific computation just to pass to a callback, no? What are the actual use cases? I'd almost rather just have a conditional here if needed.
if verbose:
    print("Received True from callback function. Exiting.")
break
if tol:
Same here, we are adding a lot of complexity compared to the previous one that was just
if tol:
factors_norm = cp_norm((weights, factors))
iprod = tl.sum(tl.sum(mttkrp * factors[-1], axis=0))
rec_error = (
tl.sqrt(tl.abs(norm_tensor**2 + factors_norm**2 - 2 * iprod))
/ norm_tensor
)
I'm not sure that extra complexity is needed, especially as it will make the code harder for others to read and maintain.
Another comment is that this conflates the objective and the reconstruction error - to me the reconstruction error should not include the ridge regularization, unlike the objective function.
Hi Jean,
To answer both your comments: indeed, I was a bit conflicted here, because the previous code only returned the reconstruction error, while the algorithm is designed to minimize the objective function (including regularization). I did not want to break the previous behavior, so I still return the reconstruction error, but I also return the loss (objective function), which I find quite useful.
Alternatively, we could return just the factors and the mttkrp, and neither the reconstruction error nor the loss, letting users compute these manually if they want them. That way the runtime would not necessarily be slower when a callback is used, and the code would be much simpler. We also already have an error-computation function in tensorly that uses the mttkrp, so it would not be hard to reuse.
I could add the code for a callback loss function in the documentation/examples.
To summarize:
- Either we simplify the callback by just passing the factors and the mttkrp. This makes it slightly harder for the user to compute the reconstruction error or the objective function, but we can provide an example.
- Or we pass the error and/or the objective function as callback arguments, as I did, in which case we could, as you suggest, return only the rec_error. The code would be very similar to the previous version in that case.
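To illustrate the first option: below is a hedged sketch, in plain numpy rather than the actual tensorly API, of how a user-side callback could recover the reconstruction error from just the factors and the last-mode mttkrp. The function names, the callback signature, and the `khatri_rao` helper are hypothetical, but the identity used is the standard one: ||X - X_hat||^2 = ||X||^2 + ||X_hat||^2 - 2 * <mttkrp, last factor>.

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product: row (i, j) of the result is A[i] * B[j]
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def rec_error_callback(factors, mttkrp, norm_tensor):
    """Hypothetical user callback: normalized reconstruction error
    from the factors and the last-mode MTTKRP, without rebuilding the tensor."""
    last = factors[-1]
    # ||X_hat||^2 via the Hadamard product of the factor Gram matrices
    gram = np.ones((last.shape[1], last.shape[1]))
    for f in factors[:-1]:
        gram *= f.T @ f
    factors_norm_sq = np.sum((last.T @ last) * gram)
    # <X, X_hat> = <mttkrp, last factor>
    iprod = np.sum(mttkrp * last)
    return np.sqrt(abs(norm_tensor**2 + factors_norm_sq - 2 * iprod)) / norm_tensor
```

This mirrors the existing `rec_error` computation quoted above, so passing only `(factors, mttkrp)` would not lose any information.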
> Another comment is that this conflates the objective and the reconstruction error - to me the reconstruction error should not include the ridge regularization, unlike the objective function.
This is actually exactly what I did, I think!
@JeanKossaifi I followed your recommendation and simplified all the callback calls to simply pass the unnormalized reconstruction error, along with the CP tensor, in the signature. This way the user can easily compute the loss function, and I believe it is consistent with the CP decomposition loss.
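With that design, a user callback can rebuild the full regularized objective from the CP tensor and the unnormalized reconstruction error. The sketch below is an illustration only: the exact callback signature, the 1/2 scaling of the data-fitting term, and the per-mode coefficient lists are assumptions, not the merged tensorly API.

```python
import numpy as np

# Hypothetical per-mode regularization coefficients (one entry per mode)
ridge = [0.0, 0.1, 0.1]
sparsity = [0.5, 0.0, 0.0]

def loss_callback(cp_tensor, rec_error):
    """Hypothetical user callback: rebuild the full objective
    (data fit + ridge + sparsity) from the unnormalized reconstruction error."""
    weights, factors = cp_tensor
    loss = 0.5 * rec_error**2  # data-fitting term (1/2 scaling assumed)
    for mode, factor in enumerate(factors):
        loss += ridge[mode] * np.linalg.norm(factor) ** 2      # l2 (ridge) penalty
        loss += sparsity[mode] * np.sum(np.abs(factor))        # l1 (sparsity) penalty
    return loss
```

Keeping the regularization terms on the user side is what keeps the reconstruction error and the objective cleanly separated, as discussed above.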
This PR adds the option to impose ridge regularization in nonnegative CP decomposition with the HALS algorithm. It uses the ridge coefficient already implemented in the HALS solver. Ridge regularization is very useful because it makes it possible to ask for sparsity on selected modes with l1 regularization while applying l2 regularization to the other modes. It also helps prune spurious components.
The PR also adds a callback interface (without removing the return-error API) to make it possible to track the loss function (which should account for regularization, including sparsity). The documentation of the function has also been improved to show the exact cost function minimized by the HALS algorithm.
This has been tested locally; I added a unit test for ridge- and sparsity-regularized nCPD.
I also corrected a typo from an earlier PR at line 320 of _nn_cp.py, where nn_modes should have been used instead of n_modes (sparsity is not implemented for unconstrained modes).
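As a rough illustration of where a ridge coefficient enters a HALS update, here is a toy sketch for the simpler nonnegative matrix case (X ≈ A Bᵀ) in plain numpy. This is not the tensorly implementation; the function name, scaling, and update schedule are illustrative assumptions. The key point is that the ridge coefficient lands in the denominator of each column update, which also shrinks (and can zero out) spurious components.

```python
import numpy as np

def hals_nmf_ridge(X, rank, ridge=0.1, n_iter=100, seed=0):
    """Toy ridge-regularized HALS for X ~= A @ B.T with A, B >= 0 (illustration only)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.random((m, rank))
    B = rng.random((n, rank))
    for _ in range(n_iter):
        # Alternate over the two factors; arrays are updated in place
        for M, other, Y in ((A, B, X), (B, A, X.T)):
            G = other.T @ other  # Gram matrix of the fixed factor
            P = Y @ other        # matrix analogue of the MTTKRP
            for r in range(rank):
                # Closed-form nonnegative update of column r, minimizing
                # ||residual - m_r o_r^T||^2 + ridge * ||m_r||^2
                num = P[:, r] - M @ G[:, r] + M[:, r] * G[r, r]
                M[:, r] = np.maximum(0.0, num / (G[r, r] + ridge))
    return A, B
```

The ridge term in the denominator also guards against division by zero when a component collapses, which is one practical reason to combine it with l1 penalties on other modes.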