tests added

gpertea · gpertea · commit 2b5074aca5a5 · 2021-11-05T16:01:27.000-04:00
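The README text changed in this commit describes how `--ref`, `-G` and `-e` fit together. A minimal dry-run sketch of those rules follows. It only prints the command line and never invokes StringTie. The helper name `stringtie_cmd` and the file names `sample.cram`, `genome.fa` and `annotation.gtf` are hypothetical placeholders, not part of the StringTie distribution.

```shell
#!/usr/bin/env bash
# Sketch only: prints the StringTie command that would be run, without running it.
# All file names used here are hypothetical placeholders.

# Usage: stringtie_cmd <alignments> <output.gtf> [reference.fa] [guides.gtf] [yes|no]
stringtie_cmd() {
  local aln=$1 out=$2 ref=${3:-} guides=${4:-} estimate_only=${5:-no}
  local cmd=(stringtie -o "$out")

  # --ref is optional, but the README recommends it for CRAM input.
  if [ -n "$ref" ]; then
    cmd+=(--ref "$ref")
  fi

  # -G supplies the reference transcripts used as guides.
  if [ -n "$guides" ]; then
    cmd+=(-G "$guides")
  fi

  # -e (expression estimation only) requires a guide annotation.
  if [ "$estimate_only" = "yes" ]; then
    if [ -z "$guides" ]; then
      echo "Error: -e requires a reference annotation (-G)" >&2
      return 1
    fi
    cmd+=(-e)
  fi

  # The alignment file goes last, after all options.
  cmd+=("$aln")
  echo "${cmd[*]}"
}

# Example: CRAM input with reference genome, guides, and estimation only.
stringtie_cmd sample.cram out.gtf genome.fa annotation.gtf yes
```

Building the command in an array keeps each argument separate until it is printed, so the same function could be adapted to execute the command rather than echo it.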
diff --git a/README.md b/README.md
@@ -210,9 +210,20 @@ as the 2nd input file for the `--mix` option).
 
 As explained above, the alignments must be sorted by coordinate before they can be used as input for StringTie.
 
-Optionally, a reference annotation file in GTF or GFF3 format can be provided to StringTie 
-using the `-G` option which can be used as 'guides' for the assembly process, or their expression levels
-can be directly estimated (without any assembly) when the `-e` option is given.
+When CRAM files are used as input, the original reference genomic sequence can be provided with the `--ref` option as
+a multi-FASTA file with the same chromosome sequences that were used when aligning the reads. The `--ref` option is
+optional but recommended, as StringTie can make use of some alignment quality data (mismatches) that, for CRAM files,
+can only be retrieved when the reference genome sequence is also provided. In particular, the assessment of junctions
+and their quality may be slightly affected when the `--ref` option is omitted.
+
+### Reference transcripts (guides)
+
+A reference annotation file in GTF or GFF3 format can be provided to StringTie
+using the `-G` option; its transcripts are then used as 'guides' for the assembly process.
+
+When the `-e` option is used (i.e. expression estimation only), the `-G` option is required;
+in that case StringTie will not attempt to assemble the read alignments and will instead
+only estimate the expression levels of the transcripts provided in the annotation file.
 
 Note that the reference transcripts should be fully covered by reads in order to be included
 in StringTie's output with the original ID of the reference transcript shown in the 
diff --git a/run_tests.sh b/run_tests.sh
@@ -1,34 +1,34 @@
 #!/usr/bin/env bash
 
 function unpack_test_data() {
-  t=test_data.tar.gz
+  t=tests.tar.gz
   if [ ! -f $t ]; then
     echo "Error: file $t not found!"
     exit 1
   fi
   echo "..unpacking test data.."
   echo
   tar -xzf $t
-  if [ ! -f test_data/human-chr19_P.gff ]; then
+  if [ ! -f tests/human-chr19_P.gff ]; then
      echo "Error: invalid test data archive?"
      exit 1
   fi
-  /bin/rm -f test_data.tar.gz
+  /bin/rm -f tests.tar.gz
 }
 
-#if [ ! -f test_data/human-chr19_P.gff ]; then
-  if [ -f test_data.tar.gz ]; then
+#if [ ! -f tests/human-chr19_P.gff ]; then
+  if [ -f tests.tar.gz ]; then
     #extract the tarball and rename the directory
-    echo "..Using existing ./test_data.tar.gz"
+    echo "..Using existing ./tests.tar.gz"
     unpack_test_data
   else
     echo "..Downloading test data.."
     #use curl to fetch the tarball from a specific github release or branch
-    curl -sLO https://github.com/gpertea/stringtie/raw/test_data/test_data.tar.gz
+    curl -sLO https://github.com/gpertea/stringtie/raw/test_data/tests.tar.gz
     unpack_test_data
   fi
 # fi
-cd test_data
+cd tests
 # array element format:
 # 
 arrins=("short_reads" "short_reads_and_superreads" "long_reads" "long_reads" \
diff --git a/run_tests_valgrind.sh b/run_tests_valgrind.sh
@@ -1,30 +1,30 @@
 #!/usr/bin/env bash
 
 function unpack_test_data() {
-  t=test_data.tar.gz
+  t=tests.tar.gz
   if [ ! -f $t ]; then
     echo "Error: file $t not found!"
     exit 1
   fi
   echo "..unpacking test data.."
   echo
   tar -xzf $t
-  if [ ! -f test_data/human-chr19_P.gff ]; then
+  if [ ! -f tests/human-chr19_P.gff ]; then
      echo "Error: invalid test data archive?"
      exit 1
   fi
-  #/bin/rm -f test_data.tar.gz
+  #/bin/rm -f tests.tar.gz
 }
 
-#if [ ! -f test_data/human-chr19_P.gff ]; then
-  if [ -d ./test_data ]; then
+#if [ ! -f tests/human-chr19_P.gff ]; then
+  if [ -d ./tests ]; then
     #extract the tarball and rename the directory
-    echo "..Using existing ./test_data"
+    echo "..Using existing ./tests"
     unpack_test_data
   else
     echo "..Downloading test data.."
     #use curl to fetch the tarball from a specific github release or branch
-    curl -sLO https://github.com/gpertea/stringtie/raw/test_data/test_data.tar.gz
+    curl -sLO https://github.com/gpertea/stringtie/raw/test_data/tests.tar.gz
     unpack_test_data
   fi
 # fi
diff --git a/tests/README.md b/tests/README.md
@@ -2,7 +2,7 @@
 
 The test data can be automatically retrieved by the `run_tests.sh` script included 
 with all source or binary distributions of StringTie, or downloaded separately from this url:
-https://github.com/gpertea/stringtie/raw/test_data/test_data.tar.gz
+https://github.com/gpertea/stringtie/raw/test_data/tests.tar.gz
 
 The `run_tests.sh` script will then run StringTie on these data sets and compare the output with the 
 precomputed, expected output for each case. If the output of each test matches the