Hi @junminghuang
If we don't get feedback, then we can assign somebody else when I am back.
I will add you to my PTO coverage issue just in case. Thanks!
Hi @tle_gitlab, feel free to use duo_chat.security_analyst.1 to test this pipeline. It contains 200 examples, and I think we have solved the problem of the LLM's lack of context.
Don't hesitate to use it; I'm curious to know your thoughts.
Note: unfortunately, there is no way to fetch the size of a specific dataset split via an API call.
Previously I was using read_dataset, but this function does not accept a dataset split name, so it always returns the size of the full dataset.
That is no longer useful when working with splits, so I had to do the counting myself.
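Counting the examples of one split yourself can look roughly like the sketch below. The thread does not show the actual client call, so `list_examples` is a hypothetical callable standing in for whatever lists a dataset's examples filtered by split name, and the in-memory `EXAMPLES` data (with its `splits` key) is purely illustrative.

```python
from typing import Any, Callable, Iterable, Optional


def count_examples(
    list_examples: Callable[[Optional[str]], Iterable[Any]],
    split: Optional[str] = None,
) -> int:
    """Count the examples returned for `split` (None means the whole dataset).

    `list_examples` is a hypothetical wrapper around the client call that
    lists a dataset's examples, optionally filtered by split name.
    """
    # Consume the (possibly lazy or paginated) iterable without building a list.
    return sum(1 for _ in list_examples(split))


# Hypothetical in-memory dataset: each example records the splits it belongs to.
EXAMPLES = [
    {"id": 1, "splits": ["train"]},
    {"id": 2, "splits": ["train"]},
    {"id": 3, "splits": ["test"]},
]


def fake_list_examples(split: Optional[str]) -> Iterable[dict]:
    # Yield every example when no split is given, otherwise only matching ones.
    return (ex for ex in EXAMPLES if split is None or split in ex["splits"])
```

Counting by iteration costs one pass over the split's examples, which is the trade-off for not having a size endpoint that understands splits.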
Hi @tle_gitlab, I just updated the MR one last time so that dataset_size is computed for the selected dataset split. Previously it always returned the size of the whole dataset, which was incorrect. I think it is now ready for review.
Fabrizio J. Piva (4b8394c6) at 13 Mar 10:45
Update dataset_size attribute computation to be compatible with splits
Thanks @tle_gitlab! I will mark this as resolved then.
Hi @erran
Fabrizio J. Piva (d568ddff) at 11 Mar 17:19
Fix linter
Fabrizio J. Piva (d6cd63c6) at 11 Mar 17:19
Add support for dataset split specification
Done, thank you
Fabrizio J. Piva (7416693b) at 11 Mar 17:08
Removed id in projects.get
Hi @tle_gitlab
@tle_gitlab do you think this is necessary?
@GitLabDuo the example I gave you is as follows:
import yaml
# `project` is a python-gitlab Project object
file = project.files.get(gt_file_name, ref="main")
file_bytes = file.decode()  # raw bytes of the file
content = yaml.safe_load(file_bytes.decode("utf-8"))
But that example processes a YAML file. To get the raw content instead, we just drop the yaml.safe_load() wrapper:
file = project.files.get(gt_file_name, ref="main")
file_bytes = file.decode()
content = file_bytes.decode("utf-8")
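The raw-content pattern above can be wrapped in a small helper, sketched below. `read_text_file` and the `Fake*` classes are hypothetical names for illustration; the fakes only mimic the two python-gitlab calls used above (`project.files.get(...)` and `ProjectFile.decode()`), so the helper can be exercised without a GitLab server.

```python
from typing import Any, Dict


def read_text_file(project: Any, file_path: str, ref: str = "main",
                   encoding: str = "utf-8") -> str:
    """Return the text of `file_path` at `ref` from a python-gitlab project."""
    file = project.files.get(file_path, ref=ref)
    file_bytes = file.decode()  # base64 payload decoded to raw bytes
    return file_bytes.decode(encoding)  # bytes -> str


# Hypothetical offline stand-ins that mimic only the calls used above.
class FakeProjectFile:
    def __init__(self, data: bytes) -> None:
        self._data = data

    def decode(self) -> bytes:
        return self._data


class FakeFileManager:
    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def get(self, file_path: str, ref: str) -> FakeProjectFile:
        return FakeProjectFile(self._files[file_path])


class FakeProject:
    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = FakeFileManager(files)
```

With a real server, `project` would come from something like `gitlab.Gitlab(url, private_token=...).projects.get(...)`. Keeping the YAML parsing out of the helper means the same call serves both the YAML case and the raw-content case.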
@GitLabDuo please check this example: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/cef/security_testing/data/extract.py#L282
I think what I did is correct
Thanks for the feedback, resolved.
Fabrizio J. Piva (edd32868) at 11 Mar 16:56
Update check for GitLab Rest client