Guidelines for Writing Evaluation Scripts

A typical evaluation script consists of four parts:

1. Read parameters, including the name of the model to be evaluated and the environment used for evaluation.
2. Set environment variables, such as the path where images are stored and the path to the model weights used during evaluation, and activate Conda or a similar environment.
3. Change the working directory to the root directory of the evaluation repository.
4. Execute the evaluation command.

For example, the script genai.sh follows this structure:

# Read parameters
MODEL_NAME=$1   # name of the model to evaluate
EVAL_ENV=$2     # evaluation environment (unused here: the interpreter path below is hardcoded)

# Set environment variables
RESULT_DIR="$PWD/output/${MODEL_NAME}"
VISION_TOWER="$PWD/eval_models/clip-vit-large-patch14-336"
T5_PATH="$PWD/eval_models/clip-flant5-xxl"
source utils/use_cuda.sh 11.8

# Change working directory (paths above are already absolute, so they stay valid)
genai_dir="$PWD/benchmarks/genai"
echo "genai directory: $genai_dir"
cd "$genai_dir" || exit 1

# Execute the evaluation command and save its output
META_DIR="eval_prompts/genai1600"
IMAGE_DIR="${RESULT_DIR}/genai/images"
VISION_TOWER="${VISION_TOWER}" "$CONDA_BASE/envs/yph-genai/bin/python" -m step2_run_model \
    --model_path "${T5_PATH}" \
    --image_dir "${IMAGE_DIR}" \
    --meta_dir "${META_DIR}" > "${RESULT_DIR}/genai/results.txt"
cat "${RESULT_DIR}/genai/results.txt"
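
The example above assigns $1 and $2 without checking that they were supplied, so a missing model name silently produces a path such as output//genai/images. A minimal sketch of a guard for the parameter-reading step is shown below. The function name parse_args and the usage message are hypothetical, not part of genai.sh; the path layout is copied from the example.

```shell
# Hypothetical helper (not part of genai.sh): validate the two positional
# parameters and derive the same paths the example script builds from them.
parse_args() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: genai.sh <model_name> <eval_env>" >&2
        return 1
    fi
    MODEL_NAME=$1
    EVAL_ENV=$2
    # Same layout as the example: results live under output/<model_name>.
    RESULT_DIR="$PWD/output/${MODEL_NAME}"
    IMAGE_DIR="${RESULT_DIR}/genai/images"
}
```

In a script, this could be called as parse_args "$@" || exit 1 before any environment variables are set, so that a wrong invocation fails before the working directory changes.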

You may of course modify the structure of the evaluation script as needed, provided the changes are reasonable.
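
One such modification, sketched here under assumptions, is to derive the Python interpreter from the environment parameter instead of hardcoding an environment name, since the example reads EVAL_ENV but never uses it. The function env_python is hypothetical; it assumes that environments live under $CONDA_BASE/envs, as the interpreter path in the example suggests.

```shell
# Hypothetical helper: print the interpreter path for a named Conda environment.
# Assumes environments are installed under $CONDA_BASE/envs, as in the example.
env_python() {
    # ${VAR:?msg} aborts with an error message if CONDA_BASE is unset or empty.
    echo "${CONDA_BASE:?CONDA_BASE is not set}/envs/$1/bin/python"
}
```

The evaluation command could then start with "$(env_python "$EVAL_ENV")" -m step2_run_model, keeping the rest of the script unchanged.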

