BuSLR knows about various packages that are useful for machine learning (originally speech and language processing), and how to go get them and build them. BuSLR supports two build systems:
- One is built around cmake's ExternalProject module. It uses make dependencies to handle package dependencies.
- The other is built around conda. It functions as a repository for conda's meta.yaml and build.sh metadata.
Not all packages are supported in both systems.
There is a balance to strike between what is handled here and what is left to the wider conda infrastructure and the Linux distribution packaging systems. In general, if something is normally in a Linux distribution (e.g., sox) then there's no point handling it here. If it's in conda then the same argument applies, but more subjectively: pytorch is better in conda, kaldi perhaps not.
Also, with conda, bear in mind that this is not a joke; the thing marked "another PIP?" does exist.
For the cmake build, clone the repo and do
cd buslr/local
cp Configure.example configure.sh # Edit if necessary
./configure.sh
make <package name>
The package is built in local and installed to local unless the appropriate
line in configure.sh is changed. You can set:
export PATH=<path-to-buslr>/local/bin:$PATH
to access the builds, or do source <path-to-buslr>/local/etc/buslrvars.sh to set other appropriate variables too. Set the INHIBIT line in configure.sh to inhibit building of packages for which you might have a system version (typically cuda or mkl).
For the conda build, clone the repo and do
cd buslr
conda build src/<package name>
As long as the conda-bld directory is on your channel list (it is indexed and
functions as a local channel), you can do this:
conda install <package name>
conda build purge
Many of the packages were initialised with this command
conda-skeleton pypi <name-of-pip-package>
It allows conda versions of PIP packages to be built, thus avoiding the problem of multiple PIPs and of conda being unaware of packages installed by PIP.
- HTS requires the HTK sources to be downloaded manually.
- SRILM also requires a manual download.
- Some packages (festival, kaldi, SRILM) don't really support a
make install. See the in-place build section below.
There is a directory for each package. Typically there are only
CMakeLists.txt and meta.yaml files, but there can also be patched or whole
files to be copied into the tree. In the case of HTS and SRILM, the manually
downloaded files are placed there too.
Following the man page for patch, patches can be generated by copying the
original file to <path-to>/<file>.org, modifying the file, then running
diff -Naur <path-to>/<old-file> <path-to>/<new-file>
This is typically run relative to a directory called
package/package-prefix/src/package. At patch time, cmake will cd to that
directory. The patch can be applied using
PATCH_COMMAND patch -p0 < ${CMAKE_CURRENT_SOURCE_DIR}/patch.txt
in the CMakeLists.txt file. A precedent for this is the sctk
package, which patches the installation directory of a deep makefile.
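As a minimal sketch of the single-file workflow above, the following shell session builds a patch and then applies it with -p0. The scratch directory, the file sub/Makefile and its contents are made up for illustration; only the diff and patch invocations are taken from the text.

```shell
set -e

# Scratch directory standing in for package/package-prefix/src/package (hypothetical)
work=$(mktemp -d)
cd "$work"
mkdir -p sub
printf 'PREFIX=/usr/local\n' > sub/Makefile

# 1. Keep a copy of the original next to the file to be changed
cp sub/Makefile sub/Makefile.org

# 2. Modify the file
printf 'PREFIX=/opt/buslr\n' > sub/Makefile

# 3. Record the change; diff exits with status 1 when the files differ,
#    so mask that to keep "set -e" from aborting
diff -Naur sub/Makefile.org sub/Makefile > patch.txt || true

# 4. Put the tree back into its pristine state (no .org file, original content)
mv sub/Makefile.org sub/Makefile

# 5. Apply the patch from the same directory that diff was run in
patch -p0 < patch.txt
cat sub/Makefile
```

Because diff was run relative to the directory that cmake changes into at patch time, the paths recorded in patch.txt need no stripping, hence -p0.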
If there are multiple patched files, it's better to run diff against a copy of the
whole directory. In this case, every path in the patch gains a leading directory component, so we need patch -p1.
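The directory case can be sketched the same way. The directory name pkg and the files in it are hypothetical; the point is only that the paths in the patch start with pkg.org/ and pkg/, which -p1 strips.

```shell
set -e

# Scratch area holding a made-up package tree
work=$(mktemp -d)
cd "$work"
mkdir -p pkg/src
printf 'one\n' > pkg/a.txt
printf 'two\n' > pkg/src/b.txt

# Copy the whole directory, then edit files in the working copy
cp -r pkg pkg.org
printf 'ONE\n' > pkg/a.txt
printf 'TWO\n' > pkg/src/b.txt

# Paths in the patch are prefixed with pkg.org/ and pkg/
diff -Naur pkg.org pkg > multi.patch || true

# Restore the pristine tree, then apply from inside it, stripping one component
rm -rf pkg
mv pkg.org pkg
cd pkg
patch -p1 < ../multi.patch
```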
If the package is git based then git can generate the patch using git diff.
It functions like the directory case, so use patch -p1. However, patching git
checkouts causes problems on updates; see irstlm.
Where a package doesn't even have a build system, a cmake file can be copied
directly into the tree. This approach is taken in sph2pipe.
Some packages don't have an install step. The native CMake install can work
well in these cases. CMake's install() command actually writes things to a
file called cmake_install.cmake. The trick is to use this file as the
INSTALL_COMMAND for these cases. The simplest precedent is libresample.
So, define this:
set(CMAKE_INSTALL_SCRIPT ${CMAKE_CURRENT_BINARY_DIR}/cmake_install.cmake)
add this command
INSTALL_COMMAND ${CMAKE_COMMAND} -P ${CMAKE_INSTALL_SCRIPT}
and specify the files using install(FILES <files> DESTINATION <where>).
Some packages, notably kaldi and the festvox family, don't really support being installed. For these, we set SOURCE_DIR to something at top level (rather than buried in the src tree) and set INSTALL_COMMAND true to suppress installation. true here is the Unix command that does nothing and returns 0 (success); empty strings don't survive the BuSLR_Add wrapper.
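The exit status of true can be checked directly in a shell. This is only a sanity check of the standard commands, not part of BuSLR, and the variable names are illustrative.

```shell
# true does nothing and reports success
true
status_true=$?

# false does nothing and reports failure (a non-zero status)
false
status_false=$?

echo "true -> $status_true, false -> $status_false"
```

A no-op that reports success is what lets the build carry on as though the install step had completed.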