The evaluation function should be modified so that a user can pass a list of metrics, given either as metric names or as metric functions (together with their names), all of which are evaluated in a single call. Instead of a single value, the function should then return a dictionary that maps each metric name to its average score.
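The requested behaviour could look roughly like the sketch below. It is only an illustration under assumptions: the function name `evaluate`, its signature, the `(y_true, y_pred)` batch format, and the two example metrics are hypothetical and not taken from the existing code base. It also assumes that "average" means an unweighted mean over batches, and that a bare function is named after its `__name__`.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple, Union

# A metric takes (y_true, y_pred) and returns one score (assumed interface).
Metric = Callable[[Sequence[float], Sequence[float]], float]
# A metric is passed either as a bare function or as a (name, function) pair.
MetricSpec = Union[Metric, Tuple[str, Metric]]
Batch = Tuple[Sequence[float], Sequence[float]]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(batches: Iterable[Batch], metrics: Sequence[MetricSpec]) -> Dict[str, float]:
    """Hypothetical evaluation function: compute every metric on every batch
    and return the per-metric average over batches, keyed by metric name."""
    # Normalise the specs into (name, function) pairs.
    named = []
    for spec in metrics:
        if callable(spec):
            named.append((spec.__name__, spec))  # fall back to the function's name
        else:
            name, fn = spec
            named.append((name, fn))

    totals = {name: 0.0 for name, _ in named}
    n_batches = 0
    for y_true, y_pred in batches:
        n_batches += 1
        for name, fn in named:
            totals[name] += fn(y_true, y_pred)

    if n_batches == 0:
        raise ValueError("no batches to evaluate")

    # Unweighted mean over batches (an assumption; a sample-weighted mean
    # would be needed if batch sizes differ).
    return {name: total / n_batches for name, total in totals.items()}


batches = [([1.0, 2.0], [1.0, 4.0]), ([0.0, 0.0], [1.0, 1.0])]
scores = evaluate(batches, [mean_absolute_error, ("mse", mean_squared_error)])
print(scores)  # {'mean_absolute_error': 1.0, 'mse': 1.5}
```

Accepting both bare functions and `(name, function)` pairs keeps the call short for the common case while still letting the user choose the key under which a score is reported. Resolving metrics from plain string names would additionally need a name-to-function registry, which is left out here.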