EvasionStrategies

Semgrep Rules And Our Evasive Strategies

Our Motivation

Semgrep is a fast, open-source static analysis tool for finding bugs and detecting vulnerabilities. Unlike traditional tools that rely on regular expressions to detect secrets, Semgrep is more powerful. Its rule repository covers many programming languages, and you can feed it code snippets to scan.

However, a few simple transformations of a vulnerable snippet are enough to evade the analysis. Our work therefore focuses on proposing strategies that evade the tool while keeping the code snippets vulnerable. Note that we target the Python rules.

Program Analysis

String Matching (SM)
Constant Analysis (CA)
Dataflow Analysis (DA)
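
The difference between the first two analysis levels can be illustrated with a minimal sketch. It assumes that SM refers to rules that match a call exactly as it is written and that CA refers to rules that also resolve simple constants; DA is not covered here. The two checkers are toys built on Python's `ast` module, not Semgrep's implementation, and `generate_key` is a hypothetical function name chosen for illustration.

```python
import ast


def flags_md5_call(source: str) -> bool:
    """Toy string-matching rule: report calls written literally as hashlib.md5(...)."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "md5"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "hashlib"
        ):
            return True
    return False


def flags_small_key(source: str, minimum: int = 2048) -> bool:
    """Toy constant rule: report generate_key(N) when N is a literal,
    or a name bound directly to a literal, smaller than `minimum`."""
    tree = ast.parse(source)

    # Record names assigned a literal value (flow-insensitive, single target only).
    constants = {}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
        ):
            constants[node.targets[0].id] = node.value.value

    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "generate_key"  # hypothetical function name
            and node.args
        ):
            arg = node.args[0]
            value = None
            if isinstance(arg, ast.Constant):
                value = arg.value
            elif isinstance(arg, ast.Name):
                value = constants.get(arg.id)
            if isinstance(value, int) and value < minimum:
                return True
    return False


direct = "import hashlib\nhashlib.md5(b'x')\n"
indirect = "import hashlib\ngetattr(hashlib, 'md5')(b'x')\n"
propagated = "size = 1024\ngenerate_key(size)\n"
computed = "size = 512 * 2\ngenerate_key(size)\n"

print(flags_md5_call(direct), flags_md5_call(indirect))
print(flags_small_key(propagated), flags_small_key(computed))
```

The toy string-matching rule only sees the call when it is spelled out, and the toy constant rule only resolves a name that is bound directly to a literal. A real analyzer may resolve more than these toys do.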

Prompt Design

A novel aspect of our work is the use of LLMs to generate transformed code more efficiently, as shown in Fig. 1. Because these models are trained on vast public code repositories, they are adept at producing a wide variety of payloads that can bypass static analysis tools. However, we find that LLMs are sensitive to prompt quality: their outputs differ greatly between good and bad prompts. To fully leverage the strengths of LLMs, we therefore need to design the prompt templates carefully (Fig. 2).

Evasive Strategies

You can use Ctrl + F to search for the rule and the corresponding strategy you need.

We provide a strategy for all 253 vulnerabilities, and code transformation examples for some of them.

Category	Rule ID	Our Strategies
cryptography	empty-aes-key	SM
cryptography	insecure-cipher-algorithm-arc4	SM
cryptography	insecure-cipher-algorithm-blowfish	SM
cryptography	insecure-cipher-algorithm-idea	SM
cryptography	insecure-cipher-mode-ecb	SM
cryptography	insecure-hash-algorithm-md5	SM
cryptography	insecure-hash-algorithm-sha1	DA
cryptography	insufficient-dsa-key-size	CA
cryptography	insufficient-ec-key-size	CA
cryptography	insufficient-rsa-key-size	CA
cryptography	crypto-mode-without-authentication	SM
distributed	require-encryption	CA
airflow	formatted-string-bashoperator	DA
aws-lambda	dangerous-asyncio-create-exec	DA
aws-lambda	dangerous-asyncio-exec	DA
aws-lambda	dangerous-asyncio-shell	DA
aws-lambda	dangerous-spawn-process	DA
aws-lambda	dangerous-subprocess-use	DA
aws-lambda	dangerous-system-call	DA
aws-lambda	dynamodb-filter-injection	DA
aws-lambda	mysql-sqli	DA
aws-lambda	psycopg-sqli	DA
aws-lambda	pymssql-sqli	DA
aws-lambda	pymysql-sqli	DA
aws-lambda	sqlalchemy-sqli	DA
aws-lambda	tainted-code-exec	DA
aws-lambda	tainted-html-response	DA
aws-lambda	tainted-html-string	DA
aws-lambda	tainted-pickle-deserialization	DA
aws-lambda	tainted-sql-string	DA
jinja2	incorrect-autoescape-disabled	DA
jinja2	missing-autoescape-disabled	SM
jwt	jwt-python-exposed-data	DA
jwt	jwt-python-exposed-credentials	DA
jwt	jwt-python-hardcoded-secret	DA
jwt	jwt-python-none-alg	CA
jwt	unverified-jwt-decode	SM
pycryptodome	insecure-cipher-algorithm-blowfish	SM
pycryptodome	insecure-cipher-algorithm-des	SM
pycryptodome	insecure-cipher-algorithm-rc2	SM
pycryptodome	insecure-cipher-algorithm-rc4	SM
pycryptodome	insecure-cipher-algorithm-xor	SM
pycryptodome	insecure-hash-algorithm-md2	SM
pycryptodome	insecure-hash-algorithm-md4	SM
pycryptodome	insecure-hash-algorithm-md5	SM
pycryptodome	insecure-hash-algorithm-sha1	SM
pycryptodome	insufficient-dsa-key-size	CA
pycryptodome	insufficient-rsa-key-size	CA
pycryptodome	crypto-mode-without-authentication	SM
pymongo	mongo-client-bad-auth	SM
docker	docker-arbitrary-container-run	DA
sqlalchemy	sqlalchemy-execute-raw-query	DA
sqlalchemy	sqlalchemy-sql-injection	DA
sqlalchemy	avoid-sqlalchemy-text	DA
sh	string-concat	DA
requests	no-auth-over-http	CA
requests	disabled-cert-validation	CA
pyramid	pyramid-authtkt-cookie-httponly-unsafe-default	SM
pyramid	pyramid-authtkt-cookie-httponly-unsafe-value	CA
pyramid	pyramid-authtkt-cookie-samesite	CA
pyramid	pyramid-authtkt-cookie-secure-unsafe-default	SM
pyramid	pyramid-authtkt-cookie-secure-unsafe-value	CA
pyramid	pyramid-csrf-check-disabled	CA
pyramid	pyramid-csrf-origin-check-disabled-globally	CA
pyramid	pyramid-csrf-origin-check-disabled	CA
pyramid	pyramid-set-cookie-httponly-unsafe-default	SM
pyramid	pyramid-set-cookie-httponly-unsafe-value	CA
pyramid	pyramid-set-cookie-samesite-unsafe-default	SM
pyramid	pyramid-set-cookie-samesite-unsafe-value	CA
pyramid	pyramid-direct-use-of-response	DA
pyramid	pyramid-set-cookie-secure-unsafe-default	SM
pyramid	pyramid-set-cookie-secure-unsafe-value	CA
pyramid	pyramid-csrf-check-disabled-globally	CA
pyramid	pyramid-sqlalchemy-sql-injection	DA
django	missing-throttle-config	SM
django	class-extends-safestring	DA
django	context-autoescape-off	CA
django	direct-use-of-httpresponse	DA
django	filter-with-is-safe	SM
django	formathtml-fstring-parameter	DA
django	global-autoescape-off	CA
django	html-magic-method	SM
django	html-safe	DA
django	avoid-insecure-deserialization	DA
django	avoid-mark-safe	SM
django	no-csrf-exempt	SM
django	custom-expression-as-sql	DA
django	extends-custom-expression	SM
django	avoid-query-set-extra	DA
django	avoid-raw-sql	SM
django	django-secure-set-cookie	SM
django	unvalidated-password	DA
django	globals-misuse-code-execution	DA
django	user-eval-format-string	DA
django	user-eval	DA
django	user-exec-format-string	DA
django	user-exec	DA
django	command-injection-os-system	DA
django	subprocess-injection	DA
django	xss-html-email-body	DA
django	xss-send-mail-html-message	DA
django	path-traversal-file-name	DA
django	path-traversal-join	DA
django	path-traversal-open	DA
django	sql-injection-using-extra-where	DA
django	sql-injection-using-rawsql	DA
django	sql-injection-db-cursor-execute	DA
django	sql-injection-using-raw	DA
django	ssrf-injection-requests	DA
django	ssrf-injection-urllib	DA
django	csv-writer-injection	DA
django	mass-assignment	DA
django	open-redirect	DA
django	raw-html-format	DA
django	reflected-data-httpresponse	DA
django	reflected-data-httpresponsebadrequest	DA
django	request-data-fileresponse	DA
django	request-data-write	DA
django	tainted-sql-string	DA
django	tainted-url-host	DA
django	password-empty-string	SM
django	use-none-for-password-default	SM
django	globals-as-template-context	DA
django	hashids-with-django-secret	DA
django	locals-as-template-context	DA
django	nan-injection	DA
boto3	hardcoded-token	DA
flask	make-response-with-unknown-content	SM
flask	avoid_app_run_with_bad_host	SM
flask	avoid_using_app_run_directly	DA
flask	debug-enabled	CA
flask	directly-returned-format-string	DA
flask	avoid_hardcoded_config_DEBUG	CA
flask	avoid_hardcoded_config_ENV	CA
flask	avoid_hardcoded_config_SECRET_KEY	CA
flask	avoid_hardcoded_config_TESTING	CA
flask	host-header-injection-python	DA
flask	render-template-string	DA
flask	secure-set-cookie	DA
flask	flask-wtf-csrf-disabled	CA
flask	csv-writer-injection	DA
flask	nan-injection	DA
flask	os-system-injection	DA
flask	path-traversal-open	DA
flask	raw-html-format	DA
flask	ssrf-requests	DA
flask	subprocess-injection	DA
flask	tainted-sql-string	DA
flask	tainted-url-host	DA
flask	eval-injection	DA
flask	exec-injection	DA
flask	direct-use-of-jinja2	DA
flask	explicit-unescape-with-markup	DA
flask	dangerous-template-string	DA
flask	flask-api-method-string-format	DA
flask	hashids-with-flask-secret	DA
flask	insecure-deserialization	DA
flask	open-redirect	DA
flask	avoid_send_file_without_path_sanitization	DA
flask	unescaped-template-extension	DA
flask	response-contains-unsanitized-input	DA
lang	use-ftp-tls	DA
lang	request-session-http-in-with-context	DA
lang	request-session-with-http	DA
lang	request-with-http	DA
lang	no-set-ciphers	DA
lang	insecure-openerdirector-open-ftp	DA
lang	insecure-openerdirector-open	DA
lang	insecure-request-object-ftp	DA
lang	insecure-request-object	DA
lang	insecure-urlopen-ftp	DA
lang	insecure-urlopen	DA
lang	insecure-urlopener-open-ftp	DA
lang	insecure-urlopener-open	DA
lang	insecure-urlopener-retrieve-ftp	DA
lang	insecure-urlopener-retrieve	DA
lang	insecure-urlretrieve-ftp	DA
lang	insecure-urlretrieve	DA
lang	listen-eval	SM
lang	python-logger-credential-disclosure	DA
lang	avoid-bind-to-all-interfaces	SM
lang	disabled-cert-validation	CA
lang	http-not-https-connection	SM
lang	paramiko-exec-command	DA
lang	aiopg-sqli	DA
lang	asyncpg-sqli	DA
lang	pg8000-sqli	DA
lang	psycopg-sqli	DA
lang	multiprocessing-recv	DA
lang	dangerous-annotations-usage	DA
lang	dangerous-asyncio-create-exec-audit	DA
lang	dangerous-asyncio-create-exec-tainted-env-args	DA
lang	dangerous-asyncio-exec-audit	DA
lang	dangerous-asyncio-exec-tainted-env-args	DA
lang	dangerous-asyncio-shell-audit	DA
lang	dangerous-asyncio-shell-tainted-env-args	DA
lang	dangerous-interactive-code-run-audit	DA
lang	dangerous-interactive-code-run-tainted-env-args	DA
lang	dangerous-os-exec-audit	DA
lang	dangerous-os-exec-tainted-env-args	DA
lang	dangerous-spawn-process-audit	DA
lang	dangerous-spawn-process-tainted-env-args	DA
lang	dangerous-subinterpreters-run-string-audit	DA
lang	dangerous-subinterpreters-run-string-tainted-env-args	DA
lang	dangerous-subprocess-use-audit	DA
lang	dangerous-subprocess-use-tainted-env-args	DA
lang	dangerous-system-call-audit	DA
lang	dangerous-system-call-tainted-env-args	DA
lang	dangerous-testcapi-run-in-subinterp-audit	DA
lang	dangerous-testcapi-run-in-subinterp-tainted-env-args	DA
lang	dynamic-urllib-use-detected	DA
lang	eval-detected	SM
lang	exec-detected	SM
lang	formatted-sql-query	DA
lang	ftplib	DA
lang	hardcoded-password-default-argument	SM
lang	httpsconnection-detected	DA
lang	insecure-file-permissions	DA
lang	mako-templates-detected	DA
lang	marshal-usage	DA
lang	md5-used-as-password	DA
lang	non-literal-import	DA
lang	paramiko-implicit-trust-host-key	SM
lang	python-reverse-shell	SM
lang	regex-dos	SM
lang	ssl-wrap-socket-is-deprecated	DA
lang	subprocess-shell-true	CA
lang	system-wildcard-detected	DA
lang	telnetlib	DA
lang	weak-ssl-version	SM
lang	avoid-jsonpickle	DA
lang	avoid-pyyaml-load	SM
lang	avoid-unsafe-ruamel	SM
lang	avoid-cPickle	DA
lang	avoid-dill	DA
lang	avoid-pickle	DA
lang	avoid-shelve	DA
lang	dangerous-interactive-code-run	DA
lang	dangerous-globals-use	DA
lang	dangerous-os-exec	DA
lang	dangerous-spawn-process	DA
lang	dangerous-subinterpreters-run-string	DA
lang	dangerous-subprocess-use	DA
lang	dangerous-system-call	DA
lang	dangerous-testcapi-run-in-subinterp	DA
lang	insecure-hash-algorithm-md5	SM
lang	insecure-hash-algorithm-sha1	SM
lang	insecure-hash-function	SM
lang	unverified-ssl-context	DA
lang	use-defused-xml-parse	DA
lang	use-defused-xml	SM
lang	use-defused-xmlrpc	SM
lang	use-defusedcsv	DA
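
As an alternative to searching by hand, the table can be processed with a short script. The following is a minimal sketch, assuming the table is available as tab-separated text with the header row shown above; the function names `parse_rows`, `tally_strategies`, and `lookup` are hypothetical helpers, and the sample holds only four rows copied from the table.

```python
from collections import Counter


def parse_rows(table_text):
    """Split the tab-separated table into (category, rule_id, strategy) tuples, skipping the header."""
    rows = []
    for line in table_text.strip().splitlines()[1:]:
        category, rule_id, strategy = line.split("\t")
        rows.append((category, rule_id, strategy))
    return rows


def tally_strategies(rows):
    """Count how many rules are assigned to each strategy (SM, CA, DA)."""
    return Counter(strategy for _, _, strategy in rows)


def lookup(rows, rule_id):
    """Return every (category, strategy) pair for a rule ID; the same ID can appear in several categories."""
    return [(category, strategy) for category, rid, strategy in rows if rid == rule_id]


# Four rows copied from the table above; replace with the full table text as needed.
SAMPLE = "\n".join([
    "Category\tRule ID\tOur Strategies",
    "cryptography\tempty-aes-key\tSM",
    "cryptography\tinsufficient-rsa-key-size\tCA",
    "django\ttainted-sql-string\tDA",
    "flask\ttainted-sql-string\tDA",
])

rows = parse_rows(SAMPLE)
print(tally_strategies(rows))
print(lookup(rows, "tainted-sql-string"))
```

Returning a list from `lookup` is deliberate, because rule IDs such as tainted-sql-string occur under more than one category.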

Also Evade Other Analysis Tools?

Our objective is to generate robust payloads that can serve as training data for fine-tuning code generation models. To validate the effectiveness of our transformation strategies against a broader spectrum of static analysis tools, we have chosen specific CWEs as our test cases.

We select 15 vulnerabilities, which are listed as follows:

Category	Strategy	Rule ID	CWE ID
flask	CA	flask-wtf-csrf-disabled	CWE-352
lang	CA	disabled-cert-validation	CWE-295
cryptography/pycryptodome	CA	insufficient-dsa-key-size	CWE-326
flask	CA	debug-enabled	CWE-489
pyramid	CA	pyramid-csrf-check-disabled	CWE-352
flask	DA	direct-use-of-jinja2	CWE-79
django	DA	user-exec-format-string	CWE-95
django	DA	sql-injection-db-cursor-execute	CWE-89
lang	DA	avoid-pickle	CWE-502
flask	DA	response-contains-unsanitized-input	CWE-79
django	DA	path-traversal-join	CWE-22
cryptography/lang	SM	insecure-hash-algorithm-md5	CWE-327
lang	SM	ssl-wrap-socket-is-deprecated	CWE-326
lang	SM	paramiko-implicit-trust-host-key	CWE-322
lang	SM	regex_dos	CWE-1333
lang	SM	avoid-bind-to-all-interfaces	CWE-200

In addition to Semgrep, we have selected several outstanding static analysis tools for evaluation. The following presents the tools utilized in our experiments:

Attempt to Evade Detection of LLMs

LLMs are widely believed to be powerful tools for detecting potential vulnerabilities in code snippets. But how can detection by LLMs be evaded?

We take the code that already bypasses traditional tools in the first stage and use LLMs to transform it further with the following selected obfuscation techniques.

Runtime Code Execution
Dynamic Built-in Function
Name Mangling
Encode/Decode
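
The four technique names above are not defined further here, so the following minimal sketch assumes their common meanings in Python and applies each one to a harmless call, `len(text)`. The function names are hypothetical, and "name mangling" is assumed to mean renaming identifiers to uninformative ones.

```python
import base64
import builtins


def plain(text):
    """Reference version: the call is visible by name."""
    return len(text)


def via_runtime_execution(text):
    """Runtime code execution: the expression is only a string until eval runs it."""
    return eval("len(text)", {"len": len, "text": text})


def via_dynamic_builtin(text):
    """Dynamic built-in function: the function is looked up by name at run time."""
    return getattr(builtins, "len")(text)


def _0x1(_0x2):
    """Name mangling (assumed meaning): identifiers carry no hint of their purpose."""
    _0x3 = len
    return _0x3(_0x2)


def via_encoding(text):
    """Encode/decode: the function name is stored base64-encoded and decoded before use."""
    name = base64.b64decode("bGVu").decode()  # decodes to "len"
    return getattr(builtins, name)(text)


sample = "abc"
results = [
    plain(sample),
    via_runtime_execution(sample),
    via_dynamic_builtin(sample),
    _0x1(sample),
    via_encoding(sample),
]
print(results)
```

All five variants compute the same value. Only the surface form that a reader or detector sees differs.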
