Skip to content

Improve R Package Dependency Resolution and Script Execution Time#99

Merged
seantleonard merged 16 commits intomasterfrom
seleonar/slowExecution
Mar 21, 2022
Merged

Improve R Package Dependency Resolution and Script Execution Time#99
seantleonard merged 16 commits intomasterfrom
seleonar/slowExecution

Conversation

@seantleonard
Copy link
Copy Markdown
Contributor

@seantleonard seantleonard commented Feb 25, 2022

Why is this change being made?

Mentioned in #95, script execution time can be slow. For R packages, the utility included LinkingTo packages when resolving dependencies for binary packages. Consequentially, LinkingTo packages (i.e. BH and Rcpp where many header files exist) can have large files counts, which contribute to longer execution times. Longer because there many more files which need permissions applied when SQL Server uses the launchpad service to instantiate an external script session contained in an AppContainer.

What does this change do?

When the utility calculates package dependencies, it gets type-specific URL paths (binary and source) to the configured CRAN repos and compiles a list of available packages to install. The available binary and source package lists are combined into one list and are now joined such that the binary package is kept when a package exists as an entry in both the binary and source lists.

  • This is achieved by making binaryPackages the first argument in rbind(binaryPackages, sourcePackages) so that when the duplicated() function runs on the combined list, the first instance of a duplicate (top to bottom) is removed.

The above change is sufficient in isolation if a user only asks for one package to be installed via sql_install.packages(). Now, the utility will iterate over each package requested and properly resolve dependencies (whether to include LinkingTo package dependencies) based on whether the desired package is available as binary, or only as source.

The function tools::package_dependencies() reference argument which includes the LinkingTo packages in the dependency calculation by default (RDocumentation). The utility now determines whether a package is available as source or binary, and populates the which argument accordingly.

  • Binary: which = c("Depends", "Imports")
  • Source: which = c("Depends", "Imports", "LinkingTo")

How is this change tested?

Two tests are added to validate only the appropriate packages are installed.

  1. Binary Package install with LinkingTo dependency ensures no LinkingTo packages are installed if the package has an available binary.
  2. Source Package install with LinkingTo dependency ensures the LinkingTo package dependencies are resolved and are made available for reference when a package is built from source.

Copy link
Copy Markdown

@Aniruddh25 Aniruddh25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Aniruddh25
Copy link
Copy Markdown

With this fix, the unneeded packages will no longer be installed so the client scripts which might be depending on those packages might start to fail without them realizing it. So, I think we should update the sqlmlutils minor version at least and document the new behavior when we release it with this fix.

Copy link
Copy Markdown

@GarrettBeatty GarrettBeatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which tests verifies that if there are packages with same name that it chooses binary?

@seantleonard
Copy link
Copy Markdown
Contributor Author

seantleonard commented Mar 7, 2022

Which tests verifies that if there are packages with same name that it chooses binary?

The test Binary Package install with LinkingTo dependency tests this end to end. I've added a clarifying comment to the test case to describe how this test case is fulfilled.

'iptools' is available as source and binary. This test validates that the LinkingTo package 'BH' is not installed.
If 'BH' is installed, that means that the 'iptools' source package was chosen,
because LinkingTo packages are required when building from source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants