Semgrep Rules And Our Evasive Strategies

Our Motivation

Semgrep is a fast, open-source static analysis tool for finding bugs and detecting vulnerabilities. Unlike traditional tools that use regular expressions to detect secrets, Semgrep is more powerful. Its rule repository covers many programming languages, and you can scan your own code snippets against those rules.

However, simple transformations of a vulnerable snippet are often enough to evade the analysis. Our work therefore focuses on proposing strategies that evade the tool while keeping the code snippets vulnerable. Note that we target the Python rules only.

Program Analysis

  • String Matching (SM)
  • Constant Analysis (CA)
  • Dataflow Analysis (DA)
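The README does not define these three categories. One plausible reading, stated here as an assumption, is that each label names the kind of analysis a rule relies on: matching the call as written (SM), resolving constant values (CA), or tracking data from a source to a sink (DA). The sketch below illustrates that reading with three toy checkers built on Python's `ast` module. The function names, the `key_size` threshold, and the choice of `input()` and `os.system` as source and sink are all hypothetical, and none of this is Semgrep's actual implementation.

```python
import ast


def string_match(source):
    """Toy SM check: flag a call written literally as hashlib.md5(...)."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "md5"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "hashlib"):
            return True
    return False


def _int_constants(tree):
    """Collect assignments of the form NAME = <int literal> anywhere in the source."""
    consts = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, int)):
            consts[node.targets[0].id] = node.value.value
    return consts


def constant_check(source, minimum=2048):
    """Toy CA check: flag key_size=<value> below `minimum`, resolving simple constants."""
    tree = ast.parse(source)
    consts = _int_constants(tree)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if kw.arg != "key_size":
                continue
            if isinstance(kw.value, ast.Constant) and isinstance(kw.value.value, int):
                size = kw.value.value
            elif isinstance(kw.value, ast.Name):
                size = consts.get(kw.value.id)  # None if the name is not a known constant
            else:
                size = None
            if size is not None and size < minimum:
                return True
    return False


def dataflow_check(source):
    """Toy DA check: flag os.system(x) when x derives from input() (top-level statements only)."""
    tainted = set()
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            value = stmt.value
            from_source = (isinstance(value, ast.Call)
                           and isinstance(value.func, ast.Name)
                           and value.func.id == "input")
            from_tainted = isinstance(value, ast.Name) and value.id in tainted
            if from_source or from_tainted:
                tainted.add(stmt.targets[0].id)
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            if (isinstance(call.func, ast.Attribute) and call.func.attr == "system"
                    and any(isinstance(a, ast.Name) and a.id in tainted for a in call.args)):
                return True
    return False
```

Under this reading, a transformation that defeats one check need not defeat the others. Rewriting how a call is spelled changes what `string_match` sees, but it does not hide a constant from `constant_check` or a tainted variable from `dataflow_check`.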

Prompt Design

As a novel element of our work, we use LLMs to generate transformed code more efficiently, as shown in Fig. 1. These models are trained on vast public code repositories, so they are adept at producing a wide variety of payloads that can bypass static analysis tools. However, we find that LLMs are sensitive to prompt quality: their outputs differ greatly depending on whether they are given good or bad prompts. To fully leverage the strengths of LLMs, we therefore need to design prompt templates carefully (Fig. 2).

Fig. 1

Fig. 2

Evasive Strategies

Use Ctrl + F to find the rule you need and its corresponding strategy.

We provide a strategy for each of the 253 vulnerabilities, and code transformation examples for some of them.

Category Rule ID Strategy
cryptography empty-aes-key SM
cryptography insecure-cipher-algorithm-arc4 SM
cryptography insecure-cipher-algorithm-blowfish SM
cryptography insecure-cipher-algorithm-idea SM
cryptography insecure-cipher-mode-ecb SM
cryptography insecure-hash-algorithm-md5 SM
cryptography insecure-hash-algorithm-sha1 DA
cryptography insufficient-dsa-key-size CA
cryptography insufficient-ec-key-size CA
cryptography insufficient-rsa-key-size CA
cryptography crypto-mode-without-authentication SM
distributed require-encryption CA
airflow formatted-string-bashoperator DA
aws-lambda dangerous-asyncio-create-exec DA
aws-lambda dangerous-asyncio-exec DA
aws-lambda dangerous-asyncio-shell DA
aws-lambda dangerous-spawn-process DA
aws-lambda dangerous-subprocess-use DA
aws-lambda dangerous-system-call DA
aws-lambda dynamodb-filter-injection DA
aws-lambda mysql-sqli DA
aws-lambda psycopg-sqli DA
aws-lambda pymssql-sqli DA
aws-lambda pymysql-sqli DA
aws-lambda sqlalchemy-sqli DA
aws-lambda tainted-code-exec DA
aws-lambda tainted-html-response DA
aws-lambda tainted-html-string DA
aws-lambda tainted-pickle-deserialization DA
aws-lambda tainted-sql-string DA
jinja2 incorrect-autoescape-disabled DA
jinja2 missing-autoescape-disabled SM
jwt jwt-python-exposed-data DA
jwt jwt-python-exposed-credentials DA
jwt jwt-python-hardcoded-secret DA
jwt jwt-python-none-alg CA
jwt unverified-jwt-decode SM
pycryptodome insecure-cipher-algorithm-blowfish SM
pycryptodome insecure-cipher-algorithm-des SM
pycryptodome insecure-cipher-algorithm-rc2 SM
pycryptodome insecure-cipher-algorithm-rc4 SM
pycryptodome insecure-cipher-algorithm-xor SM
pycryptodome insecure-cipher-algorithm-md2 SM
pycryptodome insecure-cipher-algorithm-md4 SM
pycryptodome insecure-cipher-algorithm-md5 SM
pycryptodome insecure-cipher-algorithm-sha1 SM
pycryptodome insufficient-dsa-key-size CA
pycryptodome insufficient-rsa-key-size CA
pycryptodome crypto-mode-without-authentication SM
pymongo mongo-client-bad-auth SM
docker docker-arbitrary-container-run DA
sqlalchemy sqlalchemy-execute-raw-query DA
sqlalchemy sqlalchemy-sql-injection DA
sqlalchemy avoid-sqlalchemy-text DA
sh string-concat DA
requests no-auth-over-http CA
requests disabled-cert-validation CA
pyramid pyramid-authtkt-cookie-httponly-unsafe-default SM
pyramid pyramid-authtkt-cookie-httponly-unsafe-value CA
pyramid pyramid-authtkt-cookie-samesite CA
pyramid pyramid-authtkt-cookie-secure-unsafe-default SM
pyramid pyramid-authtkt-cookie-secure-unsafe-value CA
pyramid pyramid-csrf-check-disabled CA
pyramid pyramid-csrf-origin-check-disabled-globally CA
pyramid pyramid-csrf-origin-check-disabled CA
pyramid pyramid-set-cookie-httponly-unsafe-default SM
pyramid pyramid-set-cookie-httponly-unsafe-value CA
pyramid pyramid-set-cookie-samesite-unsafe-default SM
pyramid pyramid-set-cookie-samesite-unsafe-value CA
pyramid pyramid-direct-use-of-response DA
pyramid pyramid-set-cookie-secure-unsafe-default SM
pyramid pyramid-set-cookie-secure-unsafe-value CA
pyramid pyramid-csrf-check-disabled-globally CA
pyramid pyramid-sqlalchemy-sql-injection DA
django missing-throttle-config SM
django class-extends-safestring DA
django context-autoescape-off CA
django direct-use-of-httpresponse DA
django filter-with-is-safe SM
django formathtml-fstring-parameter DA
django global-autoescape-off CA
django html-magic-method SM
django html-safe DA
django avoid-insecure-deserialization DA
django avoid-mark-safe SM
django no-csrf-exempt SM
django custom-expression-as-sql DA
django extends-custom-expression SM
django avoid-query-set-extra DA
django avoid-raw-sql SM
django django-secure-set-cookie SM
django unvalidated-password DA
django globals-misuse-code-execution DA
django user-eval-format-string DA
django user-eval DA
django user-exec-format-string DA
django user-exec DA
django command-injection-os-system DA
django subprocess-injection DA
django xss-html-email-body DA
django xss-send-mail-html-message DA
django path-traversal-file-name DA
django path-traversal-join DA
django path-traversal-open DA
django sql-injection-using-extra-where DA
django sql-injection-using-rawsql DA
django sql-injection-db-cursor-execute DA
django sql-injection-using-raw DA
django ssrf-injection-requests DA
django ssrf-injection-urllib DA
django csv-writer-injection DA
django mass-assignment DA
django open-redirect DA
django raw-html-format DA
django reflected-data-httpresponse DA
django reflected-data-httpresponsebadrequest DA
django request-data-fileresponse DA
django request-data-write DA
django tainted-sql-string DA
django tainted-url-host DA
django password-empty-string SM
django use-none-for-password-default SM
django globals-as-template-context DA
django hashids-with-django-secret DA
django locals-as-template-context DA
django nan-injection DA
boto3 hardcoded-token DA
flask make-response-with-unknown-content SM
flask avoid_app_run_with_bad_host SM
flask avoid_using_app_run_directly DA
flask debug-enabled CA
flask directly-returned-format-string DA
flask avoid_hardcoded_config_DEBUG CA
flask avoid_hardcoded_config_ENV CA
flask avoid_hardcoded_config_SECRET_KEY CA
flask avoid_hardcoded_config_TESTING CA
flask host-header-injection-python DA
flask render-template-string DA
flask secure-set-cookie DA
flask flask-wtf-csrf-disabled CA
flask csv-writer-injection DA
flask nan-injection DA
flask os-system-injection DA
flask path-traversal-open DA
flask raw-html-format DA
flask ssrf-requests DA
flask subprocess-injection DA
flask tainted-sql-string DA
flask tainted-url-host DA
flask eval-injection DA
flask exec-injection DA
flask direct-use-of-jinja2 DA
flask explicit-unescape-with-markup DA
flask dangerous-template-string DA
flask flask-api-method-string-format DA
flask hashids-with-flask-secret DA
flask insecure-deserialization DA
flask open-redirect DA
flask avoid_send_file_without_path_sanitization DA
flask unescaped-template-extension DA
flask response-contains-unsanitized-input DA
lang use-ftp-tls DA
lang request-session-http-in-with-context DA
lang request-session-with-http DA
lang request-with-http DA
lang no-set-ciphers DA
lang insecure-openerdirector-open-ftp DA
lang insecure-openerdirector-open DA
lang insecure-request-object-ftp DA
lang insecure-request-object DA
lang insecure-urlopen-ftp DA
lang insecure-urlopen DA
lang insecure-urlopener-open-ftp DA
lang insecure-urlopener-open DA
lang insecure-urlopener-retrieve-ftp DA
lang insecure-urlopener-retrieve DA
lang insecure-urlretrieve-ftp DA
lang insecure-urlretrieve DA
lang listen-eval SM
lang python-logger-credential-disclosure DA
lang avoid-bind-to-all-interfaces SM
lang disabled-cert-validation CA
lang http-not-https-connection SM
lang paramiko-exec-command DA
lang aiopg-sqli DA
lang asyncpg-sqli DA
lang pg8000-sqli DA
lang psycopg-sqli DA
lang multiprocessing-recv DA
lang dangerous-annotations-usage DA
lang dangerous-asyncio-create-exec-audit DA
lang dangerous-asyncio-create-exec-tainted-env-args DA
lang dangerous-asyncio-exec-audit DA
lang dangerous-asyncio-exec-tainted-env-args DA
lang dangerous-asyncio-shell-audit DA
lang dangerous-asyncio-shell-tainted-env-args DA
lang dangerous-interactive-code-run-audit DA
lang dangerous-interactive-code-run-tainted-env-args DA
lang dangerous-os-exec-audit DA
lang dangerous-os-exec-tainted-env-args DA
lang dangerous-spawn-process-audit DA
lang dangerous-spawn-process-tainted-env-args DA
lang dangerous-subinterpreters-run-string-audit DA
lang dangerous-subinterpreters-run-string-tainted-env-args DA
lang dangerous-subprocess-use-audit DA
lang dangerous-subprocess-use-tainted-env-args DA
lang dangerous-system-call-audit DA
lang dangerous-system-call-tainted-env-args DA
lang dangerous-testcapi-run-in-subinterp-audit DA
lang dangerous-testcapi-run-in-subinterp-tainted-env-args DA
lang dynamic-urllib-use-detected DA
lang eval-detected SM
lang exec-detected SM
lang formatted-sql-query DA
lang ftplib DA
lang hardcoded-password-default-argument SM
lang httpsconnection-detected DA
lang insecure-file-permissions DA
lang mako-templates-detected DA
lang marshal-usage DA
lang md5-used-as-password DA
lang non-literal-import DA
lang paramiko-implicit-trust-host-key SM
lang python-reverse-shell SM
lang regex-dos SM
lang ssl-wrap-socket-is-deprecated DA
lang subprocess-shell-true CA
lang system-wildcard-detected DA
lang telnetlib DA
lang weak-ssl-version SM
lang avoid-jsonpickle DA
lang avoid-pyyaml-load SM
lang avoid-unsafe-ruamel SM
lang avoid-cPickle DA
lang avoid-dill DA
lang avoid-pickle DA
lang avoid-shelve DA
lang dangerous-interactive-code-run DA
lang dangerous-globals-use DA
lang dangerous-os-exec DA
lang dangerous-spawn-process DA
lang dangerous-subinterpreters-run-string DA
lang dangerous-subprocess-use DA
lang dangerous-system-call DA
lang dangerous-testcapi-run-in-subinterp DA
lang insecure-hash-algorithm-md5 SM
lang insecure-hash-algorithm-sha1 SM
lang insecure-hash-function SM
lang unverified-ssl-context DA
lang use-defused-xml-parse DA
lang use-defused-xml SM
lang use-defused-xmlrpc SM
lang use-defusedcsv DA
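
As an alternative to searching by hand, the table can be loaded into a small lookup structure. The following is a minimal sketch that assumes the rows are available as whitespace-separated text in the same three-column layout. Only a handful of rows are copied here, and the helper names (`parse_rows`, `strategies_for`, `strategy_counts`) are hypothetical.

```python
from collections import Counter

# A few rows copied from the table above; the full table has 253 rows.
ROWS = """\
cryptography insecure-cipher-algorithm-blowfish SM
pycryptodome insecure-cipher-algorithm-blowfish SM
cryptography insufficient-rsa-key-size CA
django nan-injection DA
flask nan-injection DA
flask debug-enabled CA
"""


def parse_rows(text):
    """Map (category, rule_id) -> strategy for each non-empty row."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        category, rule_id, strategy = line.split()
        table[(category, rule_id)] = strategy
    return table


TABLE = parse_rows(ROWS)


def strategies_for(rule_id):
    """Return {category: strategy} for every category that defines `rule_id`."""
    return {cat: strat for (cat, rid), strat in TABLE.items() if rid == rule_id}


def strategy_counts():
    """Count how many of the loaded rows use each strategy label."""
    return Counter(TABLE.values())
```

The table is keyed by the (category, rule ID) pair rather than by rule ID alone because some rule IDs, such as `nan-injection` and `insecure-cipher-algorithm-blowfish`, appear under more than one category.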

Also Evade Other Analysis Tools?

Our objective is to generate robust payloads that can serve as training data for fine-tuning code generation models. To validate our transformation strategies against a broader range of static analysis tools, we chose specific CWEs as test cases.

We select 15 vulnerabilities, which are listed as follows:

Category Strategy Rule ID CWE ID
flask CA flask-wtf-csrf-disabled CWE-352
lang CA disabled-cert-validation CWE-295
cryptography/pycryptodome CA insufficient-dsa-key-size CWE-326
flask CA debug-enabled CWE-489
pyramid CA pyramid-csrf-check-disabled CWE-352
flask DA direct-use-of-jinja2 CWE-79
django DA user-exec-format-string CWE-95
django DA sql-injection-db-cursor-execute CWE-89
lang DA avoid-pickle CWE-502
flask DA response-contains-unsanitized-input CWE-79
django DA path-traversal-join CWE-22
cryptography/lang SM insecure-hash-algorithm-md5 CWE-327
lang SM ssl-wrap-socket-is-deprecated CWE-326
lang SM paramiko-implicit-trust-host-key CWE-322
lang SM regex_dos CWE-1333
lang SM avoid-bind-to-all-interfaces CWE-200

In addition to Semgrep, we selected several other well-regarded static analysis tools for evaluation. The tools used in our experiments are as follows:

Attempt to Evade Detection of LLMs

LLMs are widely believed to be powerful tools for detecting potential vulnerabilities in code snippets. But how can detection by LLMs be evaded?

Starting from code that already bypassed the traditional tools in the first stage, we use LLMs to apply selected obfuscation techniques:

  • Runtime Code Execution
  • Dynamic Built-in Function
  • Name Mangling
  • Encode/Decode
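
The README lists these techniques without defining them. The sketch below shows the generic Python mechanism each name usually refers to, applied to deliberately harmless operations. It is an illustration under that assumption, not the transformations used in this work. In particular, "Name Mangling" is read here as Python's double-underscore attribute mangling, although it could also mean plain identifier renaming. All variable and class names are hypothetical.

```python
import base64
import builtins

# Runtime code execution: the expression exists only as a string until eval runs it.
expr = "sum" + "([1, 2, 3])"
runtime_result = eval(expr)

# Dynamic built-in function: the built-in is looked up by a name computed at run time.
func_name = "".join(["l", "e", "n"])
dynamic_len = getattr(builtins, func_name)
dynamic_result = dynamic_len("abc")


# Name mangling: inside a class body, Python rewrites __value to _Box__value.
class Box:
    def __init__(self):
        self.__value = 42  # actually stored under the attribute name _Box__value


mangled_result = Box()._Box__value

# Encode/decode: a literal is kept in encoded form and recovered only at run time.
encoded = base64.b64encode(b"hello").decode("ascii")
decoded_result = base64.b64decode(encoded).decode("ascii")
```

In each case the name or value a reader would look for never appears literally at the point where it is used, so it can only be recovered by evaluating the surrounding code.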