Semgrep is a fast, open-source static analysis tool for finding bugs and detecting vulnerabilities. Unlike traditional tools that rely on regular expressions to detect secrets, Semgrep is more powerful. Its rule repository covers many programming languages, and you can scan code snippets by supplying them as input.
However, simple transformations of the vulnerable snippets are enough to evade its analysis. Our work therefore focuses on proposing strategies that evade the tool while keeping the code snippets vulnerable. Note that we target the Python rules. Our strategies fall into three categories:
- String Matching (SM)
- Constant Analysis (CA)
- Dataflow Analysis (DA)
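To make the idea behind the String Matching category concrete, here is a minimal sketch. It assumes a toy syntactic checker (`calls_hashlib_md5`, a hypothetical helper written for this illustration, not Semgrep itself) that looks for a direct `hashlib.md5(...)` call. The transformed snippet computes the same digest but no longer contains that syntactic shape. Real Semgrep rules are richer than this checker, so whether a given transformation evades a given rule has to be verified against the tool.

```python
import ast
import hashlib


def calls_hashlib_md5(source: str) -> bool:
    """Toy syntactic check (hypothetical): is there a direct hashlib.md5(...) call?"""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "md5"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "hashlib"
        ):
            return True
    return False


# Snippet in its original form: the call is spelled out literally.
ORIGINAL = 'import hashlib\ndigest = hashlib.md5(b"data").hexdigest()\n'

# Same behavior, but the attribute name is assembled at runtime,
# so the literal call shape `hashlib.md5(...)` is absent.
TRANSFORMED = (
    'import hashlib\n'
    'digest = getattr(hashlib, "md" + "5")(b"data").hexdigest()\n'
)

# Both forms produce the same digest, i.e. the weakness is preserved.
direct = hashlib.md5(b"data").hexdigest()
indirect = getattr(hashlib, "md" + "5")(b"data").hexdigest()

print(calls_hashlib_md5(ORIGINAL), calls_hashlib_md5(TRANSFORMED), direct == indirect)
```

The checker flags the original snippet and misses the transformed one, even though both compute the same MD5 digest.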
We also introduce LLMs into our work to generate transformed code more efficiently, as shown in Fig. 1. These models are trained on vast public code repositories, so they are adept at producing a wide variety of payloads that can bypass static analysis tools. However, we find that LLMs are sensitive to prompt quality: there is a large gap between the outputs they produce from good and bad prompts. To fully leverage the strengths of LLMs, we therefore need to design prompt templates carefully (Fig. 2).
Use Ctrl + F to find a rule and its corresponding strategy.
We provide strategies for all 253 vulnerabilities, and code transformation examples for some of them.
| Category | Rule ID | Our Strategies |
|---|---|---|
| cryptography | empty-aes-key | SM |
| cryptography | insecure-cipher-algorithm-arc4 | SM |
| cryptography | insecure-cipher-algorithm-blowfish | SM |
| cryptography | insecure-cipher-algorithm-idea | SM |
| cryptography | insecure-cipher-mode-ecb | SM |
| cryptography | insecure-hash-algorithm-md5 | SM |
| cryptography | insecure-hash-algorithm-sha1 | DA |
| cryptography | insufficient-dsa-key-size | CA |
| cryptography | insufficient-ec-key-size | CA |
| cryptography | insufficient-rsa-key-size | CA |
| cryptography | crypto-mode-without-authentication | SM |
| distributed | require-encryption | CA |
| airflow | formatted-string-bashoperator | DA |
| aws-lambda | dangerous-asyncio-create-exec | DA |
| aws-lambda | dangerous-asyncio-exec | DA |
| aws-lambda | dangerous-asyncio-shell | DA |
| aws-lambda | dangerous-spawn-process | DA |
| aws-lambda | dangerous-subprocess-use | DA |
| aws-lambda | dangerous-system-call | DA |
| aws-lambda | dynamodb-filter-injection | DA |
| aws-lambda | mysql-sqli | DA |
| aws-lambda | psycopg-sqli | DA |
| aws-lambda | pymssql-sqli | DA |
| aws-lambda | pymysql-sqli | DA |
| aws-lambda | sqlalchemy-sqli | DA |
| aws-lambda | tainted-code-exec | DA |
| aws-lambda | tainted-html-response | DA |
| aws-lambda | tainted-html-string | DA |
| aws-lambda | tainted-pickle-deserialization | DA |
| aws-lambda | tainted-sql-string | DA |
| jinja2 | incorrect-autoescape-disabled | DA |
| jinja2 | missing-autoescape-disabled | SM |
| jwt | jwt-python-exposed-data | DA |
| jwt | jwt-python-exposed-credentials | DA |
| jwt | jwt-python-hardcoded-secret | DA |
| jwt | jwt-python-none-alg | CA |
| jwt | unverified-jwt-decode | SM |
| pycryptodome | insecure-cipher-algorithm-blowfish | SM |
| pycryptodome | insecure-cipher-algorithm-des | SM |
| pycryptodome | insecure-cipher-algorithm-rc2 | SM |
| pycryptodome | insecure-cipher-algorithm-rc4 | SM |
| pycryptodome | insecure-cipher-algorithm-xor | SM |
| pycryptodome | insecure-cipher-algorithm-md2 | SM |
| pycryptodome | insecure-cipher-algorithm-md4 | SM |
| pycryptodome | insecure-cipher-algorithm-md5 | SM |
| pycryptodome | insecure-cipher-algorithm-sha1 | SM |
| pycryptodome | insufficient-dsa-key-size | CA |
| pycryptodome | insufficient-rsa-key-size | CA |
| pycryptodome | crypto-mode-without-authentication | SM |
| pymongo | mongo-client-bad-auth | SM |
| docker | docker-arbitrary-container-run | DA |
| sqlalchemy | sqlalchemy-execute-raw-query | DA |
| sqlalchemy | sqlalchemy-sql-injection | DA |
| sqlalchemy | avoid-sqlalchemy-text | DA |
| sh | string-concat | DA |
| requests | no-auth-over-http | CA |
| requests | disabled-cert-validation | CA |
| pyramid | pyramid-authtkt-cookie-httponly-unsafe-default | SM |
| pyramid | pyramid-authtkt-cookie-httponly-unsafe-value | CA |
| pyramid | pyramid-authtkt-cookie-samesite | CA |
| pyramid | pyramid-authtkt-cookie-secure-unsafe-default | SM |
| pyramid | pyramid-authtkt-cookie-secure-unsafe-value | CA |
| pyramid | pyramid-csrf-check-disabled | CA |
| pyramid | pyramid-csrf-origin-check-disabled-globally | CA |
| pyramid | pyramid-csrf-origin-check-disabled | CA |
| pyramid | pyramid-set-cookie-httponly-unsafe-default | SM |
| pyramid | pyramid-set-cookie-httponly-unsafe-value | CA |
| pyramid | pyramid-set-cookie-samesite-unsafe-default | SM |
| pyramid | pyramid-set-cookie-samesite-unsafe-value | CA |
| pyramid | pyramid-direct-use-of-response | DA |
| pyramid | pyramid-set-cookie-secure-unsafe-default | SM |
| pyramid | pyramid-set-cookie-secure-unsafe-value | CA |
| pyramid | pyramid-csrf-check-disabled-globally | CA |
| pyramid | pyramid-sqlalchemy-sql-injection | DA |
| django | missing-throttle-config | SM |
| django | class-extends-safestring | DA |
| django | context-autoescape-off | CA |
| django | direct-use-of-httpresponse | DA |
| django | filter-with-is-safe | SM |
| django | formathtml-fstring-parameter | DA |
| django | global-autoescape-off | CA |
| django | html-magic-method | SM |
| django | html-safe | DA |
| django | avoid-insecure-deserialization | DA |
| django | avoid-mark-safe | SM |
| django | no-csrf-exempt | SM |
| django | custom-expression-as-sql | DA |
| django | extends-custom-expression | SM |
| django | avoid-query-set-extra | DA |
| django | avoid-raw-sql | SM |
| django | django-secure-set-cookie | SM |
| django | unvalidated-password | DA |
| django | globals-misuse-code-execution | DA |
| django | user-eval-format-string | DA |
| django | user-eval | DA |
| django | user-exec-format-string | DA |
| django | user-exec | DA |
| django | command-injection-os-system | DA |
| django | subprocess-injection | DA |
| django | xss-html-email-body | DA |
| django | xss-send-mail-html-message | DA |
| django | path-traversal-file-name | DA |
| django | path-traversal-join | DA |
| django | path-traversal-open | DA |
| django | sql-injection-using-extra-where | DA |
| django | sql-injection-using-rawsql | DA |
| django | sql-injection-db-cursor-execute | DA |
| django | sql-injection-using-raw | DA |
| django | ssrf-injection-requests | DA |
| django | ssrf-injection-urllib | DA |
| django | csv-writer-injection | DA |
| django | mass-assignment | DA |
| django | open-redirect | DA |
| django | raw-html-format | DA |
| django | reflected-data-httpresponse | DA |
| django | reflected-data-httpresponsebadrequest | DA |
| django | request-data-fileresponse | DA |
| django | request-data-write | DA |
| django | tainted-sql-string | DA |
| django | tainted-url-host | DA |
| django | password-empty-string | SM |
| django | use-none-for-password-default | SM |
| django | globals-as-template-context | DA |
| django | hashids-with-django-secret | DA |
| django | locals-as-template-context | DA |
| django | nan-injection | DA |
| boto3 | hardcoded-token | DA |
| flask | make-response-with-unknown-content | SM |
| flask | avoid_app_run_with_bad_host | SM |
| flask | avoid_using_app_run_directly | DA |
| flask | debug-enabled | CA |
| flask | directly-returned-format-string | DA |
| flask | avoid_hardcoded_config_DEBUG | CA |
| flask | avoid_hardcoded_config_ENV | CA |
| flask | avoid_hardcoded_config_SECRET_KEY | CA |
| flask | avoid_hardcoded_config_TESTING | CA |
| flask | host-header-injection-python | DA |
| flask | render-template-string | DA |
| flask | secure-set-cookie | DA |
| flask | flask-wtf-csrf-disabled | CA |
| flask | csv-writer-injection | DA |
| flask | nan-injection | DA |
| flask | os-system-injection | DA |
| flask | path-traversal-open | DA |
| flask | raw-html-format | DA |
| flask | ssrf-requests | DA |
| flask | subprocess-injection | DA |
| flask | tainted-sql-string | DA |
| flask | tainted-url-host | DA |
| flask | eval-injection | DA |
| flask | exec-injection | DA |
| flask | direct-use-of-jinja2 | DA |
| flask | explicit-unescape-with-markup | DA |
| flask | dangerous-template-string | DA |
| flask | flask-api-method-string-format | DA |
| flask | hashids-with-flask-secret | DA |
| flask | insecure-deserialization | DA |
| flask | open-redirect | DA |
| flask | avoid_send_file_without_path_sanitization | DA |
| flask | unescaped-template-extension | DA |
| flask | response-contains-unsanitized-input | DA |
| lang | use-ftp-tls | DA |
| lang | request-session-http-in-with-context | DA |
| lang | request-session-with-http | DA |
| lang | request-with-http | DA |
| lang | no-set-ciphers | DA |
| lang | insecure-openerdirector-open-ftp | DA |
| lang | insecure-openerdirector-open | DA |
| lang | insecure-request-object-ftp | DA |
| lang | insecure-request-object | DA |
| lang | insecure-urlopen-ftp | DA |
| lang | insecure-urlopen | DA |
| lang | insecure-urlopener-open-ftp | DA |
| lang | insecure-urlopener-open | DA |
| lang | insecure-urlopener-retrieve-ftp | DA |
| lang | insecure-urlopener-retrieve | DA |
| lang | insecure-urlretrieve-ftp | DA |
| lang | insecure-urlretrieve | DA |
| lang | listen-eval | SM |
| lang | python-logger-credential-disclosure | DA |
| lang | avoid-bind-to-all-interfaces | SM |
| lang | disabled-cert-validation | CA |
| lang | http-not-https-connection | SM |
| lang | paramiko-exec-command | DA |
| lang | aiopg-sqli | DA |
| lang | asyncpg-sqli | DA |
| lang | pg8000-sqli | DA |
| lang | psycopg-sqli | DA |
| lang | multiprocessing-recv | DA |
| lang | dangerous-annotations-usage | DA |
| lang | dangerous-asyncio-create-exec-audit | DA |
| lang | dangerous-asyncio-create-exec-tainted-env-args | DA |
| lang | dangerous-asyncio-exec-audit | DA |
| lang | dangerous-asyncio-exec-tainted-env-args | DA |
| lang | dangerous-asyncio-shell-audit | DA |
| lang | dangerous-asyncio-shell-tainted-env-args | DA |
| lang | dangerous-interactive-code-run-audit | DA |
| lang | dangerous-interactive-code-run-tainted-env-args | DA |
| lang | dangerous-os-exec-audit | DA |
| lang | dangerous-os-exec-tainted-env-args | DA |
| lang | dangerous-spawn-process-audit | DA |
| lang | dangerous-spawn-process-tainted-env-args | DA |
| lang | dangerous-subinterpreters-run-string-audit | DA |
| lang | dangerous-subinterpreters-run-string-tainted-env-args | DA |
| lang | dangerous-subprocess-use-audit | DA |
| lang | dangerous-subprocess-use-tainted-env-args | DA |
| lang | dangerous-system-call-audit | DA |
| lang | dangerous-system-call-tainted-env-args | DA |
| lang | dangerous-testcapi-run-in-subinterp-audit | DA |
| lang | dangerous-testcapi-run-in-subinterp-tainted-env-args | DA |
| lang | dynamic-urllib-use-detected | DA |
| lang | eval-detected | SM |
| lang | exec-detected | SM |
| lang | formatted-sql-query | DA |
| lang | ftplib | DA |
| lang | hardcoded-password-default-argument | SM |
| lang | httpsconnection-detected | DA |
| lang | insecure-file-permissions | DA |
| lang | mako-templates-detected | DA |
| lang | marshal-usage | DA |
| lang | md5-used-as-password | DA |
| lang | non-literal-import | DA |
| lang | paramiko-implicit-trust-host-key | SM |
| lang | python-reverse-shell | SM |
| lang | regex-dos | SM |
| lang | ssl-wrap-socket-is-deprecated | DA |
| lang | subprocess-shell-true | CA |
| lang | system-wildcard-detected | DA |
| lang | telnetlib | DA |
| lang | weak-ssl-version | SM |
| lang | avoid-jsonpickle | DA |
| lang | avoid-pyyaml-load | SM |
| lang | avoid-unsafe-ruamel | SM |
| lang | avoid-cPickle | DA |
| lang | avoid-dill | DA |
| lang | avoid-pickle | DA |
| lang | avoid-shelve | DA |
| lang | dangerous-interactive-code-run | DA |
| lang | dangerous-globals-use | DA |
| lang | dangerous-os-exec | DA |
| lang | dangerous-spawn-process | DA |
| lang | dangerous-subinterpreters-run-string | DA |
| lang | dangerous-subprocess-use | DA |
| lang | dangerous-system-call | DA |
| lang | dangerous-testcapi-run-in-subinterp | DA |
| lang | insecure-hash-algorithm-md5 | SM |
| lang | insecure-hash-algorithm-sha1 | SM |
| lang | insecure-hash-function | SM |
| lang | unverified-ssl-context | DA |
| lang | use-defused-xml-parse | DA |
| lang | use-defused-xml | SM |
| lang | use-defused-xmlrpc | SM |
| lang | use-defusedcsv | DA |
Our objective is to generate robust payloads that can serve as training data for fine-tuning code generation models. To validate the effectiveness of our transformation strategies against a broader spectrum of static analysis tools, we chose specific CWEs as our test cases.
We select 15 vulnerabilities, which are listed as follows:
| Category | Strategy | Rule ID | CWE ID |
|---|---|---|---|
| flask | CA | flask-wtf-csrf-disabled | CWE-352 |
| lang | CA | disabled-cert-validation | CWE-295 |
| cryptography/pycryptodome | CA | insufficient-dsa-key-size | CWE-326 |
| flask | CA | debug-enabled | CWE-489 |
| pyramid | CA | pyramid-csrf-check-disabled | CWE-352 |
| flask | DA | direct-use-of-jinja2 | CWE-79 |
| django | DA | user-exec-format-string | CWE-95 |
| django | DA | sql-injection-db-cursor-execute | CWE-89 |
| lang | DA | avoid-pickle | CWE-502 |
| flask | DA | response-contains-unsanitized-input | CWE-79 |
| django | DA | path-traversal-join | CWE-22 |
| cryptography/lang | SM | insecure-hash-algorithm-md5 | CWE-327 |
| lang | SM | ssl-wrap-socket-is-deprecated | CWE-326 |
| lang | SM | paramiko-implicit-trust-host-key | CWE-322 |
| lang | SM | regex_dos | CWE-1333 |
| lang | SM | avoid-bind-to-all-interfaces | CWE-200 |
In addition to Semgrep, we have selected several outstanding static analysis tools for evaluation. The following presents the tools utilized in our experiments:
LLMs are widely believed to be powerful tools for detecting potential vulnerabilities in code snippets. How, then, can detection by LLMs be evaded?
We again leverage LLMs, applying selected obfuscation techniques to further transform the code that already bypasses traditional tools in the first stage:
- Runtime Code Execution
- Dynamic Built-in Function
- Name Mangling
- Encode/Decode
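
The list above only names the techniques. As a rough illustration of how two of them (Encode/Decode and Dynamic Built-in Function) can combine, the sketch below resolves a built-in by a name that is stored base64-encoded. It deliberately uses the harmless built-in `len`; `resolve_builtin` and `ENCODED_NAME` are hypothetical names introduced for this example, and this is one interpretation of the techniques, not necessarily the exact transformation used in our experiments.

```python
import base64
import builtins

# Encode/Decode: the identifier is kept in encoded form,
# so the plain-text name does not appear at the call site.
ENCODED_NAME = base64.b64encode(b"len").decode("ascii")


def resolve_builtin(encoded_name: str):
    """Dynamic built-in lookup: decode the name, then fetch it from `builtins`."""
    name = base64.b64decode(encoded_name).decode("ascii")
    return getattr(builtins, name)


# The resolved object is the ordinary built-in, just reached indirectly.
length_of = resolve_builtin(ENCODED_NAME)
result = length_of([1, 2, 3])
print(length_of is len, result)
```

The behavior is identical to calling `len` directly; only the way the function is named and reached changes.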

