Improve R Package Dependency Resolution and Script Execution Time #99
seantleonard merged 16 commits into master
… pkgs. Dependency calculation doesn't fetch LinkingTo packages for binary pkgs.
… check. Added tests for LinkingTo package inclusion for binary vs source.
With this fix, unneeded packages will no longer be installed, so client scripts that depend on those packages may start to fail without their authors realizing it. I think we should bump at least the sqlmlutils minor version and document the new behavior when we release this fix.
GarrettBeatty left a comment
Which test verifies that, when a binary and a source package share the same name, the binary one is chosen?
The test
Why is this change being made?
As mentioned in #95, script execution time can be slow. For R packages, the utility included LinkingTo packages when resolving dependencies for binary packages, even though a prebuilt binary does not need them: LinkingTo packages supply headers that are only used when compiling from source. LinkingTo packages (e.g. BH and Rcpp, which ship many header files) can have large file counts, and those files lengthen execution time because each one needs permissions applied when SQL Server uses the Launchpad service to instantiate an external script session contained in an AppContainer.
What does this change do?
When the utility calculates package dependencies, it gets type-specific URL paths (binary and source) for the configured CRAN repos and compiles a list of packages available to install. The binary and source package lists are then combined into one list, and duplicates are now resolved so that the binary package is kept when a package appears in both lists.
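The URL and merge steps described above can be sketched in base R. This is a minimal sketch, not the utility's actual code: the repo URL, the package rows, and the extra Type column are hypothetical, the binary type is assumed to be win.binary, and small hand-built matrices stand in for what available.packages(contriburl = ...) would return so the example runs offline.

```r
# Type-specific contrib URLs for a single repo (hypothetical mirror).
repo <- "https://cran.r-project.org"
binaryUrl <- contrib.url(repo, type = "win.binary")  # .../bin/windows/contrib/<R x.y>
sourceUrl <- contrib.url(repo, type = "source")      # .../src/contrib

# In a live session these URLs could be passed to
# available.packages(contriburl = ...). Hypothetical stand-ins are used here;
# the "Type" column exists only to make the merged result easy to inspect.
cols <- c("Package", "Version", "Type")
binaryPackages <- matrix(c("Rcpp",  "1.0.5", "binary",
                           "dplyr", "1.0.2", "binary"),
                         ncol = 3, byrow = TRUE, dimnames = list(NULL, cols))
sourcePackages <- matrix(c("Rcpp", "1.0.5",    "source",
                           "BH",   "1.72.0-3", "source"),
                         ncol = 3, byrow = TRUE, dimnames = list(NULL, cols))

# Binary rows go first, so they precede any same-named source rows.
combined <- rbind(binaryPackages, sourcePackages)

# duplicated() is TRUE for every occurrence after the first, so keeping
# !duplicated() retains the first (binary) row for each package name.
available <- combined[!duplicated(combined[, "Package"]), , drop = FALSE]
print(available)
```

Rcpp survives once, as its binary row, while BH, which exists only as source, is still kept. Swapping the rbind() arguments would keep the source row for Rcpp instead, which is why the argument order matters.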
binaryPackages is now the first argument in rbind(binaryPackages, sourcePackages). Because duplicated() flags every occurrence of a package name after the first (top to bottom), filtering the combined list with !duplicated() keeps the binary entry and removes the later source entry.
This change alone is sufficient when a user asks for only one package to be installed via sql_install.packages(). For multiple packages, the utility now iterates over each requested package and resolves its dependencies (including or excluding LinkingTo packages) based on whether that package is available as a binary or only as source.
By default, the which argument of tools::package_dependencies() includes LinkingTo packages in the dependency calculation (RDocumentation). The utility now determines whether each package is available as binary or only as source and populates which accordingly:
Binary package: which = c("Depends", "Imports")
Source-only package: which = c("Depends", "Imports", "LinkingTo")
How is this change tested?
Two tests were added to validate that only the appropriate packages are installed.
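The binary-versus-source distinction these tests exercise can be sketched against an offline package database. This is a minimal illustration under stated assumptions: mypkg is a hypothetical package that imports Rcpp and lists Rcpp and BH under LinkingTo, and resolveDependencies() with its binaryNames argument is a hypothetical helper, not the utility's code. tools::package_dependencies() and its db and which arguments are base R.

```r
# Hypothetical package database in the character-matrix shape returned by
# available.packages(): a "Package" column plus dependency fields.
db <- matrix(
  c("mypkg", NA, "Rcpp", "Rcpp, BH",
    "Rcpp",  NA, NA,     NA,
    "BH",    NA, NA,     NA),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Depends", "Imports", "LinkingTo"))
)

# Hypothetical helper: resolve each requested package separately, dropping
# LinkingTo (compile-time headers) when a prebuilt binary is available.
resolveDependencies <- function(pkgs, binaryNames, db) {
  deps <- lapply(pkgs, function(pkg) {
    fields <- if (pkg %in% binaryNames) {
      c("Depends", "Imports")
    } else {
      c("Depends", "Imports", "LinkingTo")
    }
    # Non-recursive lookup; the result is a list named by package.
    tools::package_dependencies(pkg, db = db, which = fields)[[pkg]]
  })
  unique(unlist(deps))
}

asBinary <- resolveDependencies("mypkg", binaryNames = "mypkg", db = db)
asSource <- resolveDependencies("mypkg", binaryNames = character(), db = db)
print(asBinary)
print(asSource)
```

Resolving each requested package on its own is what lets a binary package and a source-only package in the same request get different which values; a single which for the whole batch could not express that.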