Add DeBERTa-V2 submission — ConPara + all RAID attacks#108

Merged
liamdugan merged 3 commits into liamdugan:main from MohamedMady19:main
Apr 16, 2026

Conversation

@MohamedMady19

DeBERTa-ConPara-v2 Submission

Model: DeBERTa-v3-base (184M) + 30 MI-selected linguistic features
Training data: HC3Plus + M4 + MAGE + RAID none + all 11 RAID attack types (~767K samples, 50/50 balanced)
Key improvement over v1: Trained on all 11 adversarial attack types (paraphrase, synonym, homoglyph, whitespace, etc.) with balanced per-attack × per-domain sampling
Threshold: 0.80 (val balanced accuracy: 92.73%)
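To illustrate what balanced per-attack × per-domain sampling could look like, here is a minimal sketch. It is not the training code from this submission: the record fields (`attack`, `domain`, `id`), the `balanced_sample` helper, and the toy data are all hypothetical, and it simply draws the same number of examples from every (attack, domain) cell.

```python
import random
from collections import defaultdict


def balanced_sample(records, per_cell, seed=0):
    """Draw up to `per_cell` records from every (attack, domain) cell.

    `records` is a list of dicts with hypothetical "attack" and "domain" keys.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["attack"], rec["domain"])].append(rec)

    sample = []
    for key in sorted(cells):  # sorted for a deterministic cell order
        group = cells[key]
        k = min(per_cell, len(group))  # small cells contribute all they have
        sample.extend(rng.sample(group, k))
    return sample


# Toy, made-up data: three cells of unequal size.
records = (
    [{"attack": "homoglyph", "domain": "news", "id": i} for i in range(10)]
    + [{"attack": "homoglyph", "domain": "poetry", "id": i} for i in range(3)]
    + [{"attack": "whitespace", "domain": "news", "id": i} for i in range(5)]
)
sample = balanced_sample(records, per_cell=3)
print(len(sample))
```

Capping each cell at the same size keeps a large attack or domain from dominating the training mix; a real pipeline would also need to balance the human and machine classes.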

Submission files: predictions.json + metadata.json
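For readers unfamiliar with the metric quoted next to the 0.80 threshold, the sketch below shows how balanced accuracy at a fixed decision threshold can be computed. It assumes scores at or above the threshold are flagged as machine-generated (the `>=` convention is an assumption, as is the `balanced_accuracy` helper), and the scores and labels are toy values, not this submission's validation data.

```python
def balanced_accuracy(scores, labels, threshold=0.80):
    """Mean of the true-positive and true-negative rates at `threshold`.

    labels: 1 = machine-generated, 0 = human-written.
    Assumes both classes are present in `labels`.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        pred = 1 if score >= threshold else 0  # assumed decision rule
        if label == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


# Toy, made-up scores: three machine-generated and three human-written texts.
scores = [0.95, 0.85, 0.60, 0.10, 0.30, 0.90]
labels = [1, 1, 1, 0, 0, 0]
result = balanced_accuracy(scores, labels)
print(round(result, 4))
```

Because it averages the per-class rates, balanced accuracy is not inflated by whichever class happens to be larger in the validation split.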

@github-actions

It looks like this eval run failed. Please check the workflow logs to see what went wrong, then push a new commit to your PR to rerun the eval.

@liamdugan
Owner

Hey @MohamedMady19, it looks like your metadata file has a few entries that the bot doesn't expect. Can you edit the metadata file so that its fields exactly match those in the template_metadata.json file here?
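A quick local check along these lines can catch a key mismatch before pushing. This is only a sketch: the field names in `template` and `submission` are hypothetical placeholders rather than the real contents of template_metadata.json, and `diff_metadata_keys` is an illustrative helper, not part of the RAID tooling.

```python
def diff_metadata_keys(submission, template):
    """Return (extra, missing) top-level keys relative to the template."""
    extra = sorted(set(submission) - set(template))
    missing = sorted(set(template) - set(submission))
    return extra, missing


# Hypothetical field names, for illustration only. In practice both dicts
# would be loaded with json.load() from the two metadata files.
template = {"detector_name": "", "release_date": ""}
submission = {
    "detector_name": "DeBERTa-ConPara-v2",
    "release_date": "2026-04-15",
    "threshold": 0.80,  # an entry the template does not define
}

extra, missing = diff_metadata_keys(submission, template)
print("unexpected keys:", extra)
print("missing keys:", missing)
```

An empty result for both lists means the submission's top-level fields line up with the template; nested fields would need a recursive comparison.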

@github-actions

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

DeBERTa-ConPara-v2

Release date: 2026-04-15

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved an AUROC of 97.38 and a TPR of 97.11% at FPR=5% and 93.34% at FPR=1%.
Without adversarial attacks, it achieved AUROC of 97.33 and a TPR of 96.92% at FPR=5% and 93.26% at FPR=1%.
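As a reading aid for the "TPR at FPR" figures above, here is one common way to compute such a number: pick the lowest threshold whose false-positive rate on human-written text does not exceed the target, then measure the fraction of machine-generated text scored above it. This is a sketch under that assumption and not necessarily the exact procedure the RAID evaluation bot uses; the `tpr_at_fpr` helper and the scores are made up.

```python
def tpr_at_fpr(human_scores, machine_scores, target_fpr):
    """Return (threshold, tpr) for the lowest threshold with FPR <= target_fpr.

    A text is flagged as machine-generated when its score is strictly
    above the threshold (an assumed convention).
    """
    candidates = sorted(set(human_scores) | set(machine_scores))
    for t in candidates:  # ascending, so FPR is non-increasing
        fpr = sum(s > t for s in human_scores) / len(human_scores)
        if fpr <= target_fpr:
            tpr = sum(s > t for s in machine_scores) / len(machine_scores)
            return t, tpr
    raise ValueError("no scores provided")


# Toy, made-up detector scores.
human = [0.1, 0.2, 0.3, 0.4, 0.9]
machine = [0.5, 0.7, 0.95, 0.35]
threshold, tpr = tpr_at_fpr(human, machine, target_fpr=0.2)
print(threshold, tpr)
```

Fixing the false-positive rate first makes detectors comparable: each one is allowed the same rate of false accusations on human text, and only then is its recall on machine text measured.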

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!

@MohamedMady19
Author

Thank you. Yes, please feel free to merge it.

@liamdugan liamdugan merged commit 8f8ed29 into liamdugan:main Apr 16, 2026