Fix identifier case handling for multi-byte characters. by NotHimmel · Pull Request #1148 · IvorySQL/IvorySQL

NotHimmel · 2026-01-07T09:42:44Z

Summary by CodeRabbit

Bug Fixes
- Fixed multi‑byte character handling in identifier case conversions to prevent corruption of non‑ASCII names.
- Improved Oracle‑compatibility startup normalization so uppercase identifiers and usernames are detected and normalized more reliably.
Refactor
- Consolidated encoding‑aware string case detection and transformation utilities for consistent behavior across components.
Chores
- Removed a legacy identifier transformation component from the default build.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-07T09:43:02Z

📝 Walkthrough

Walkthrough

Replaced legacy port-based identifier case utilities with Oracle-aware, multi-byte-safe helpers; updated scanner, lexer, initdb, and backend startup to use new APIs; removed old port implementation and adjusted build and headers.

Changes

Cohort / File(s)	Summary
Port removal & build `src/port/identifier_alpha_transfor.c`, `src/port/Makefile`, `src/port/meson.build`	Removed legacy identifier_alpha_transfor.c and its build inclusion.
Oracle frontend utilities `src/oracle_fe_utils/ora_string_utils.c`, `src/include/oracle_fe_utils/ora_string_utils.h`	Added encoding-aware functions `down_character`, `upper_character`, `is_all_lower`, `is_all_upper` that preserve multi-byte sequences via pg_encoding_mblen.
Parser scanner support `src/backend/parser/scansup.c`, `src/include/parser/scansup.h`	Made `downcase_identifier`/`upcase_identifier` multi-byte-safe; added `identifier_is_all_lower` and `identifier_is_all_upper`; added `<stdbool.h>` and multi-byte traversal.
Oracle scanner tokenization `src/backend/oracle_parser/ora_scan.l`	Switched call from `is_all_upper(...)` to `identifier_is_all_upper(...)` for delimited-identifier handling.
TCOP startup changes `src/backend/tcop/backend_startup.c`	Replaced `is_all_upper` with `identifier_is_all_upper` and `down_character` uses with `downcase_identifier(..., false, false)`; added `parser/scansup.h` include.
initdb & Oracle username handling `src/bin/initdb/initdb.c`	Included `oracle_fe_utils/ora_string_utils.h`; adapted `is_all_upper`/`down_character` usage to include encoding parameter; moved some initialization calls.
Public header cleanup `src/include/port.h`, `src/include/fe_utils/string_utils.h`	Removed old case-handling declarations from `port.h`; minor whitespace cleanup in `string_utils.h`.

Sequence Diagram(s)

sequenceDiagram
    participant Lexer as Oracle Lexer (ora_scan.l)
    participant Scanner as scansup helpers
    participant OraUtils as oracle_fe_utils

    Lexer->>Scanner: provide identifier token (ident,len)
    Scanner->>Scanner: call identifier_is_all_upper(ident,len)
    alt identifier is all upper
        Scanner->>OraUtils: call down_character(ident,len,encoding)
        OraUtils-->>Scanner: return lowercased, multi-byte-safe string
        Scanner-->>Lexer: return transformed identifier
    else not all upper
        Scanner-->>Lexer: return original identifier
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

refactor case conversion feature #932 — touches case-conversion and scanner/helper refactors overlapping scansup/identifier changes.

Suggested reviewers

jiaoshuntian
OreoYang

Poem

🐰 I hopped through bytes both wide and small,
I guard each multi-byte so none should fall.
Helpers moved where Oracle eyes can spy,
Case-safe and careful — hop, transform, and try!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main objective of the pull request: fixing identifier case handling for multi-byte characters, which is reflected throughout the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 84.62% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai · 2026-01-07T09:43:58Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #1149

@NotHimmel

Docstrings generation was requested by @NotHimmel. * #1148 (comment) The following files were modified: * `src/backend/parser/scansup.c` * `src/backend/tcop/backend_startup.c` * `src/bin/initdb/initdb.c` * `src/oracle_fe_utils/ora_string_utils.c`

coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In @src/backend/parser/scansup.c:
- Line 213: Fix the typo in the function comment blocks where "Detemine" is
misspelled: change "Detemine whether the letters in the string are all lowercase
letters" to "Determine whether the letters in the string are all lowercase
letters" in both comment blocks around the functions in
src/backend/parser/scansup.c (the comment at line ~213 and the similar comment
at line ~243).

In @src/backend/tcop/backend_startup.c:
- Line 47: Remove the duplicate include directive for "parser/scansup.h": find
the second occurrence of #include "parser/scansup.h" (the one duplicated at line
47) and delete it so only the original include remains; ensure no other
references rely on a second inclusion and run a quick build to confirm
preprocessing is unaffected.

In @src/bin/initdb/initdb.c:
- Around line 3706-3714: The block that lowercases Oracle usernames has a
use-after-free: remove the save-and-free pattern around username since
down_character modifies the buffer in-place; specifically, in the conditional
that checks database_mode == DB_ORACLE, username != NULL, caseswitchmode !=
NORMAL and is_all_upper(...), call down_character(username, strlen(username),
encodingid) without assigning its return to a new pointer or freeing username
(do not call free on username); this matches down_character's in-place behavior
as implemented in ora_string_utils.c and prevents making username a dangling
pointer.

In @src/oracle_fe_utils/ora_string_utils.c:
- Around line 205-237: The function upper_character modifies its input buffer
in-place but its signature declares src as const char*, causing a
const-correctness violation; change the parameter type from const char *src to
char *src (matching the fix in down_character) and update any callers if needed
to pass a mutable buffer, ensuring pg_encoding_mblen and in-place toupper usage
remain valid.
- Around line 174-203: The function down_character currently takes a const char
*src but casts away const and mutates the buffer; change the signature to char
*down_character(char *src, int len, int encoding) (remove const) and eliminate
the (char *) cast inside the function so it operates on the provided mutable
pointer, and update the matching prototype in ora_string_utils.h; ensure any
callers pass mutable buffers (or make copies) so no read-only memory is
modified.

🧹 Nitpick comments (5)

src/backend/parser/scansup.c (3)
19-19: Remove redundant include.

The <stdbool.h> header is unnecessary since postgres.h already provides the bool type and related definitions.
♻️ Proposed fix
 #include <ctype.h>
-#include <stdbool.h>
 
215-239: Review the multi-byte and case-checking logic.

The function skips multi-byte sequences and only checks single-byte alphabetic characters for lowercase. Consider the following:

Line 219: Inconsistent spacing before the function name declaration (extra space after bool).

Logic: The function returns true for strings with multi-byte characters as long as all single-byte alphabetic characters are lowercase. Confirm this matches the intended semantics for Oracle identifier case handling.
♻️ Style fix for spacing
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
244-268: Review the multi-byte and case-checking logic.

Similar observations as identifier_is_all_lower:

Line 245: Inconsistent spacing before the function name declaration.

Logic: The function returns true for strings with multi-byte characters as long as all single-byte alphabetic characters are uppercase.
♻️ Style fix for spacing
- bool
- identifier_is_all_upper(const char *ident, int len)
+bool
+identifier_is_all_upper(const char *ident, int len)
src/oracle_fe_utils/ora_string_utils.c (2)
242-267: Document the behavior for strings without alphabetic characters.

The function returns true for strings containing no alphabetic characters (e.g., "123", "___"). While this might be intentional (vacuous truth: all zero alphabetic characters are lowercase), it should be documented to avoid confusion.

Consider adding a comment:
/*
 * Determine whether all alphabetic letters in the string are lowercase.
 * Returns true if there are no alphabetic characters.
 * Multi-byte characters are skipped (treated as non-alphabetic).
 */
272-297: Document the behavior for strings without alphabetic characters.

Same issue as is_all_lower: the function returns true for strings containing no alphabetic characters. Document this behavior with a clear comment:
/*
 * Determine whether all alphabetic letters in the string are uppercase.
 * Returns true if there are no alphabetic characters.
 * Multi-byte characters are skipped (treated as non-alphabetic).
 */

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5acc6bb and 8c54e80.

📒 Files selected for processing (12)

src/backend/oracle_parser/ora_scan.l
src/backend/parser/scansup.c
src/backend/tcop/backend_startup.c
src/bin/initdb/initdb.c
src/include/fe_utils/string_utils.h
src/include/oracle_fe_utils/ora_string_utils.h
src/include/parser/scansup.h
src/include/port.h
src/oracle_fe_utils/ora_string_utils.c
src/port/Makefile
src/port/identifier_alpha_transfor.c
src/port/meson.build

💤 Files with no reviewable changes (5)

src/include/port.h
src/port/identifier_alpha_transfor.c
src/port/meson.build
src/include/fe_utils/string_utils.h
src/port/Makefile

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/include/parser/scansup.h
src/backend/tcop/backend_startup.c
src/include/oracle_fe_utils/ora_string_utils.h
src/backend/parser/scansup.c
src/backend/oracle_parser/ora_scan.l
src/oracle_fe_utils/ora_string_utils.c
src/bin/initdb/initdb.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: meson_build (ubuntu-latest)
GitHub Check: contrib_regression
GitHub Check: pg_regression (ubuntu-latest)
GitHub Check: build (ubuntu-latest)
GitHub Check: oracle_pg_regression (ubuntu-latest)
GitHub Check: oracle_regression (ubuntu-latest)

🔇 Additional comments (12)

src/backend/oracle_parser/ora_scan.l (2)

50-50: LGTM: Minor formatting cleanup.

Removed trailing space after the include directive.

949-970: Unable to verify the review comment due to unavailability of the referenced file and functions in the current repository state. The file src/backend/oracle_parser/ora_scan.l and functions identifier_is_all_upper and is_all_upper could not be found.

src/include/parser/scansup.h (1)

32-33: LGTM: New public API for identifier case validation.

These functions extend the scanner support interface with multi-byte aware identifier case checks, aligning with the PR objective to fix multi-byte character handling.

src/backend/parser/scansup.c (2)

111-120: Verify multi-byte handling preserves character integrity.

Similar to downcase_identifier, ensure that strncpy followed by selective character modification doesn't corrupt multi-byte sequences.

60-75: Multi-byte handling is correctly implemented with proper safeguards in place.

The code properly detects multi-byte sequences using pg_mblen() and skips them during case transformation. The approach is sound: strncpy() copies the entire input unchanged, then the loop safely modifies only single-byte characters while preserving multi-byte boundaries. Encoding awareness is handled via pg_database_encoding_max_length(), and tolower()/toupper() are only applied when both the encoding is single-byte and the character has the high bit set. This same pattern is consistently applied across similar functions in the file (upcase_identifier, and other case transformation routines).

src/backend/tcop/backend_startup.c (2)

838-840: Consistent function migration looks correct.

The same pattern of replacing is_all_upper with identifier_is_all_upper and down_character with downcase_identifier(..., false, false) is applied consistently in the INTERCHANGE case handling blocks for both database_name and user_name.

Also applies to: 860-862, 891-893

806-810: Remove reference to non-existent down_character function.

The function signatures are correct and the code usage is appropriate. However, the review comment incorrectly references down_character as the previous function name. This function does not exist in the codebase. The actual function declarations in src/include/parser/scansup.h confirm:

downcase_identifier(const char *ident, int len, bool warn, bool truncate) — correctly used with warn=false, truncate=false

identifier_is_all_upper(const char *ident, int len) — correctly called with two arguments

The parameters false, false are appropriate since database names should not be truncated or trigger warnings. Remove the mention of down_character and the claim about "changes from" it, as this function does not exist and the review comment's historical premise is inaccurate.

src/include/oracle_fe_utils/ora_string_utils.h (1)

45-48: LGTM: New encoding-aware string utility functions.

These Oracle-specific utilities now accept an encoding parameter to handle multi-byte characters safely, aligning with the PR objective. The functions replace the previous port-level implementations.

src/bin/initdb/initdb.c (3)

85-85: LGTM: New include added for Oracle string utilities.

The include is necessary for the new is_all_upper and down_character functions used in the username transformation logic.

1703-1703: LGTM: Bootstrap command updated with case switch mode.

The command correctly passes the caseswitchmode parameter to the bootstrap backend using the -C flag with proper integer formatting.

3698-3704: The early initialization order is correct.

All four functions are safe to call at lines 3698-3704. They depend exclusively on variables set during command-line parsing (e.g., lc_ctype, encoding, database_mode), not on any logic between setup_bin_paths() and their current position. Moving setup_locale_encoding() before the Oracle username transformation is necessary and correct—encodingid must be available before is_all_upper() and down_character() use it at lines 3709 and 3712.

src/oracle_fe_utils/ora_string_utils.c (1)

29-29: LGTM: Include added for multi-byte character handling.

The mb/pg_wchar.h include is necessary for pg_encoding_mblen() used in the new case-handling functions.

coderabbitai · 2026-01-07T09:48:19Z

 #include "utils/ps_status.h"
 #include "utils/timeout.h"
 #include "utils/varlena.h"
+#include "parser/scansup.h"


⚠️ Potential issue | 🟡 Minor

Remove duplicate include directive.

Line 47 duplicates the include from line 30. This causes redundant preprocessing and should be removed.

🔧 Proposed fix

#include "utils/varlena.h" -#include "parser/scansup.h"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

#include "parser/scansup.h"

#include "utils/varlena.h"

🤖 Prompt for AI Agents

In @src/backend/tcop/backend_startup.c at line 47, Remove the duplicate include directive for "parser/scansup.h": find the second occurrence of #include "parser/scansup.h" (the one duplicated at line 47) and delete it so only the original include remains; ensure no other references rely on a second inclusion and run a quick build to confirm preprocessing is unaffected.

coderabbitai · 2026-01-07T09:48:19Z

+char *
+down_character(const char *src, int len, int encoding)
+{
+	char	*s;
+	char	*res;
+	int		i;
+	Assert(src != NULL);
+	Assert(len >= 0);
+
+	s = (char *) src;
+	res = s;
+
+	/* transform */
+	for (i = 0; i < len ; i++)
+	{
+		int mblen = pg_encoding_mblen(encoding, s);
+
+		if (mblen > 1)
+		{
+			s += mblen;
+			i += (mblen - 1);
+			continue;
+		}
+
+		*s = tolower(*s);
+		s++;
+	}
+
+	return res;
+}


⚠️ Potential issue | 🔴 Critical

CRITICAL: Const-correctness violation and misleading signature.

The function signature declares const char *src but immediately casts it to char * and modifies the input in-place. This is undefined behavior if the caller passes a string literal or read-only memory.

Issues:

Line 183 casts away const: s = (char *) src;

Line 198 modifies the "const" input: *s = tolower(*s);

This violates const-correctness and can lead to crashes or security issues if the input is in read-only memory.

🐛 Proposed fix

/* * transform upper to lower */ char * -down_character(const char *src, int len, int encoding) +down_character(char *src, int len, int encoding) { char *s; char *res; int i; Assert(src != NULL); Assert(len >= 0); - s = (char *) src; + s = src; res = s;

Remove const from the parameter since the function modifies the input in-place. Also update the corresponding header declaration in src/include/oracle_fe_utils/ora_string_utils.h.

🤖 Prompt for AI Agents

In @src/oracle_fe_utils/ora_string_utils.c around lines 174 - 203, The function down_character currently takes a const char *src but casts away const and mutates the buffer; change the signature to char *down_character(char *src, int len, int encoding) (remove const) and eliminate the (char *) cast inside the function so it operates on the provided mutable pointer, and update the matching prototype in ora_string_utils.h; ensure any callers pass mutable buffers (or make copies) so no read-only memory is modified.

coderabbitai · 2026-01-07T09:48:19Z

+/*
+ * transform lower to upper
+*/
+char *
+upper_character(const char *src, int len, int encoding)
+{
+	char	*s;
+	char	*res;
+	int		i;
+	Assert(src != NULL);
+	Assert(len >= 0);
+
+	s = (char *) src;
+	res = s;
+
+	/* transform */
+	for (i = 0; i < len ; i++)
+	{
+		int mblen = pg_encoding_mblen(encoding, s);
+
+		if (mblen > 1)
+		{
+			s += mblen;
+			i += (mblen - 1);
+			continue;
+		}
+
+		*s = toupper(*s);
+		s++;
+	}
+
+	return res;
+}


⚠️ Potential issue | 🔴 Critical

CRITICAL: Same const-correctness violation as down_character.

This function has the identical const-correctness issue as down_character (lines 174-203). The parameter should be char *src instead of const char *src since the function modifies the input in-place.

🐛 Proposed fix

/* * transform lower to upper */ char * -upper_character(const char *src, int len, int encoding) +upper_character(char *src, int len, int encoding) { char *s; char *res; int i; Assert(src != NULL); Assert(len >= 0); - s = (char *) src; + s = src; res = s;

🤖 Prompt for AI Agents

In @src/oracle_fe_utils/ora_string_utils.c around lines 205 - 237, The function upper_character modifies its input buffer in-place but its signature declares src as const char*, causing a const-correctness violation; change the parameter type from const char *src to char *src (matching the fix in down_character) and update any callers if needed to pass a mutable buffer, ensuring pg_encoding_mblen and in-place toupper usage remain valid.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/backend/parser/scansup.c (1)

111-112: Duplicate comment line.

Lines 111 and 112 are identical—appears to be a copy-paste error.

📝 Suggested fix

 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and

🧹 Nitpick comments (3)

src/backend/parser/scansup.c (3)
19-19: Redundant include.

postgres.h already provides boolean types via its internal headers. The <stdbool.h> include is unnecessary in PostgreSQL/IvorySQL code.
♻️ Suggested fix
 #include "postgres.h"
 
 #include <ctype.h>
-#include <stdbool.h>
 
 #include "mb/pg_wchar.h"
225-249: Add input validation and fix formatting.

This function lacks the Assert checks present in downcase_identifier and upcase_identifier. Also, there's a leading space before bool on line 225.
♻️ Suggested fix
-/*
- *  Detemine whether the letters in the string are all lowercase letters
- */
- bool
- identifier_is_all_lower(const char *ident, int len)
+/*
+ *  Determine whether the letters in the string are all lowercase letters
+ */
+bool
+identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
 	const char* s;
+
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;
254-278: Add input validation and fix formatting.

Same issues as identifier_is_all_lower: missing Assert checks and leading space before bool.
♻️ Suggested fix
-/*
- *  Detemine whether the letters in the string are all uppercase letters
- */
- bool
- identifier_is_all_upper(const char *ident, int len)
+/*
+ *  Determine whether the letters in the string are all uppercase letters
+ */
+bool
+identifier_is_all_upper(const char *ident, int len)
 {
 	int i;
 	const char* s;
+
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c54e80 and ce1614c.

📒 Files selected for processing (1)

src/backend/parser/scansup.c

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/backend/parser/scansup.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: pg_regression (ubuntu-latest)
GitHub Check: oracle_pg_regression (ubuntu-latest)
GitHub Check: build (ubuntu-latest)
GitHub Check: meson_build (ubuntu-latest)
GitHub Check: oracle_regression (ubuntu-latest)
GitHub Check: contrib_regression

🔇 Additional comments (4)

src/backend/parser/scansup.c (4)

53-54: Good defensive assertions.

Adding precondition checks for NULL pointer and non-negative length is a sound practice for API robustness.

68-80: Multi-byte handling looks correct.

Pre-copying with memcpy and then skipping multi-byte sequences ensures non-ASCII characters are preserved while ASCII characters undergo case transformation. The index arithmetic correctly advances past the full multi-byte sequence.

102-103: Consistent assertions with downcase_identifier.

118-130: Multi-byte handling consistent with downcase_identifier.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @src/bin/initdb/initdb.c:
- Around line 3707-3715: The code currently assigns lowerusername = username
then calls down_character(username, ...) which operates in-place and returns the
same pointer, and then free(lowerusername) causes a use-after-free on username;
remove the free(lowerusername) call (or alternatively change down_character to
allocate a new buffer and document that behavior) so that username continues to
reference valid memory; update the block around down_character/username (and the
temporary lowerusername variable) to either omit the temporary and free or to
ensure ownership semantics are clear (prefer simply removing free(lowerusername)
since down_character is in-place).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce1614c and c35e499.

📒 Files selected for processing (1)

src/bin/initdb/initdb.c

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: build (ubuntu-latest)
GitHub Check: meson_build (ubuntu-latest)
GitHub Check: pg_regression (ubuntu-latest)
GitHub Check: oracle_regression (ubuntu-latest)
GitHub Check: oracle_pg_regression (ubuntu-latest)
GitHub Check: contrib_regression

🔇 Additional comments (3)

src/bin/initdb/initdb.c (3)

85-85: LGTM!

The new header provides the multi-byte encoding-aware string utilities (down_character, is_all_upper) needed for the identifier case handling changes.

1703-1704: LGTM!

The bootstrap command now correctly passes the identifier case switch mode to ensure consistent case handling during database initialization.

3698-3704: Correct initialization ordering for encoding-aware identifier handling.

Moving these initialization calls before the Oracle username transformation ensures that encodingid is properly set by setup_locale_encoding() before it's used by is_all_upper() and down_character(). This is essential for the multi-byte character support.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @src/backend/parser/scansup.c:
- Line 234: Fix the spelling mistake in the comment for the function
identifier_is_all_lower by changing "Detemine" to "Determine"; make the
identical correction in the comment for identifier_is_all_upper as well so both
function comments read "Determine whether the letters in the string are all
lowercase/uppercase letters" respectively, leaving code and punctuation
unchanged.

🧹 Nitpick comments (4)

src/backend/parser/scansup.c (4)
19-19: Remove redundant include.

The postgres.h header (line 16) already provides the bool type and related definitions. Including <stdbool.h> is unnecessary and should be removed per PostgreSQL coding conventions.
♻️ Proposed fix
-#include <stdbool.h>
236-260: Style inconsistencies with PostgreSQL conventions.

Both new functions have minor style issues:

Extra space before function name (lines 236, 265)

Pointer declaration style should be const char *s not const char* s (lines 240, 269)
♻️ Proposed fix for identifier_is_all_lower
-	int i;
-	const char* s;
+	int			i;
+	const char *s;
Apply similar formatting fixes to identifier_is_all_upper.
Also applies to: 265-289

236-260: Add input validation assertions.

Unlike downcase_identifier and upcase_identifier (lines 53-54, 102-103), these new helper functions lack input validation. Consider adding assertions to guard against NULL pointers and negative lengths for consistency.
♻️ Proposed fix
  bool
  identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
 	const char *s;
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;
Apply the same assertions to identifier_is_all_upper.
Also applies to: 265-289

236-260: Consider caching the encoding check for efficiency.

Unlike downcase_identifier and upcase_identifier which cache pg_database_encoding_max_length() == 1 before the loop (lines 57, 106), these new functions call pg_mblen on every iteration regardless of encoding. For single-byte encodings, this adds unnecessary overhead.
⚡ Proposed optimization
  bool
  identifier_is_all_lower(const char *ident, int len)
 {
 	int			i;
 	const char *s;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
 	s = ident;
 
 	for (i = 0; i < len; i++)
 	{
-		int mblen = pg_mblen(s);
-
-		if (mblen > 1)
+		if (!enc_is_single_byte)
 		{
-			s += mblen;
-			i += (mblen - 1);
-			continue;
+			int mblen = pg_mblen(s);
+
+			if (mblen > 1)
+			{
+				s += mblen;
+				i += (mblen - 1);
+				continue;
+			}
 		}
 
 		if (isalpha(*s) && isupper(*s))
Apply similar optimization to identifier_is_all_upper.
Also applies to: 265-289

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c35e499 and 1ba7c89.

📒 Files selected for processing (1)

src/backend/parser/scansup.c

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Learnt from: rophy
Repo: IvorySQL/IvorySQL PR: 998
File: contrib/ivorysql_ora/sql/ora_dbms_output.sql:211-212
Timestamp: 2025-12-26T07:43:29.003Z
Learning: In contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c (lines 447-449), buffer_size values below 2000 are silently clamped to DBMS_OUTPUT_MIN_BUFFER_SIZE (2000), matching Oracle behavior. This is not an error condition.

Learnt from: rophy
Repo: IvorySQL/IvorySQL PR: 998
File: contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c:82-146
Timestamp: 2025-12-03T06:50:27.755Z
Learning: In contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c, the behavior where ENABLE() clears the existing buffer (rather than preserving it like Oracle does) is an intentional known difference tracked in issue #26 at https://github.com/rophy/IvorySQL/issues/26 (medium impact, open).

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/backend/parser/scansup.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: meson_build (ubuntu-latest)
GitHub Check: contrib_regression
GitHub Check: oracle_pg_regression (ubuntu-latest)
GitHub Check: build (ubuntu-latest)
GitHub Check: oracle_regression (ubuntu-latest)
GitHub Check: pg_regression (ubuntu-latest)

🔇 Additional comments (3)

src/backend/parser/scansup.c (3)

53-54: LGTM: Good defensive programming.

The input validation assertions appropriately guard against NULL pointers and negative lengths, improving code robustness.

Also applies to: 102-103

68-80: LGTM: Multi-byte character handling is correct.

The approach of copying the input first with memcpy and then selectively modifying only ASCII characters (while skipping multi-byte sequences via pg_mblen) correctly preserves multi-byte identifiers.

Also applies to: 118-130

152-160: LGTM: Sensible optimization.

The early-exit logic for identifiers containing '/' or '.' avoids unnecessary case transformations for path-like identifiers, improving efficiency.

coderabbitai · 2026-01-08T06:41:22Z

 }
+
+/*
+ *  Detemine whether the letters in the string are all lowercase letters


⚠️ Potential issue | 🟡 Minor

Fix typo in comments.

"Detemine" should be "Determine" in both function comments.

📝 Proposed fix

/* - * Detemine whether the letters in the string are all lowercase letters + * Determine whether the letters in the string are all lowercase letters */

Apply the same fix to the comment at line 263 for identifier_is_all_upper.

Also applies to: 263-263

🤖 Prompt for AI Agents

In @src/backend/parser/scansup.c at line 234, Fix the spelling mistake in the comment for the function identifier_is_all_lower by changing "Detemine" to "Determine"; make the identical correction in the comment for identifier_is_all_upper as well so both function comments read "Determine whether the letters in the string are all lowercase/uppercase letters" respectively, leaving code and punctuation unchanged.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/bin/initdb/initdb.c (1)
3706-3715: Document the slash check for special username formats.

The encoding-aware username transformation looks correct. However, the new condition !strchr(username, '/') at line 3710 lacks documentation. This check likely prevents transformation of special username formats (e.g., Kerberos principals like user/instance@REALM), but without a comment, its purpose is unclear and could lead to confusion.
📝 Add explanatory comment
 	/* Oracle compatibility: transform uppercase usernames to lowercase. */
 	if (database_mode == DB_ORACLE && username != NULL
 		&& caseswitchmode != NORMAL
 		&& is_all_upper(username, strlen(username), encodingid)
+		/* Skip transformation for special username formats like Kerberos principals */
 		&& !strchr(username, '/'))
 	{
Otherwise, the encoding-aware function calls with encodingid are the correct fix for multi-byte character handling.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ba7c89 and 396c2dc.

📒 Files selected for processing (2)

src/backend/parser/scansup.c
src/bin/initdb/initdb.c

🚧 Files skipped from review as they are similar to previous changes (1)

src/backend/parser/scansup.c

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/bin/initdb/initdb.c

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: oracle_regression (ubuntu-latest)
GitHub Check: oracle_pg_regression (ubuntu-latest)
GitHub Check: meson_build (ubuntu-latest)
GitHub Check: build (ubuntu-latest)
GitHub Check: contrib_regression
GitHub Check: pg_regression (ubuntu-latest)

🔇 Additional comments (3)

src/bin/initdb/initdb.c (3)

85-85: LGTM!

The new include is necessary for the encoding-aware string utilities used in the Oracle username transformation.

1703-1703: LGTM!

The formatting adjustment to the bootstrap command assembly is fine.

3698-3704: LGTM!

Moving the initialization calls earlier in main() ensures encodingid is properly set before the Oracle username transformation logic. This is necessary for the encoding-aware string utilities to function correctly.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/backend/parser/scansup.c (1)

111-112: Duplicated comment line.

Line 112 is a duplicate of line 111. Remove the duplicate.

Proposed fix

 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and

🤖 Fix all issues with AI agents

In @src/backend/parser/scansup.c:
- Around line 262-289: Fix the typo in the comment for identifier_is_all_upper
("Detemine" → "Determine") and add defensive assertions at the start of the
function: Assert(ident != NULL) and Assert(len >= 0); keep these assertions
consistent with identifier_is_all_lower so the function validates its inputs
before processing.
- Around line 233-260: Fix the typo in the function comment ("Detemine" →
"Determine") and add input validation at the start of identifier_is_all_lower by
inserting Assert(ident != NULL) and Assert(len >= 0) (matching the style used in
downcase_identifier/upcase_identifier); keep the rest of the function logic
unchanged and reference identifier_is_all_lower to locate the edit.
- Around line 150-160: Remove the unused local variable `i` and the
commented-out loop block in scansup.c (the /* for (i = 0; ...) */ section) to
eliminate dead code and compiler warnings; if that logic must be preserved for
future work instead, replace the block with a short TODO comment describing the
intended behavior and remove the unused `i` declaration so no unused-variable
remains.

🧹 Nitpick comments (1)

src/backend/parser/scansup.c (1)
236-240: Minor formatting inconsistencies.

Lines 236, 265: Extra leading space before bool return type

Lines 240, 269: Pointer declaration style const char* s should be const char *s per PostgreSQL coding conventions
Proposed fix for identifier_is_all_lower
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
-	const char* s;
+	const char *s;

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d72ef88 and c897903.

📒 Files selected for processing (1)

src/backend/parser/scansup.c

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/backend/parser/scansup.c

🔇 Additional comments (2)

src/backend/parser/scansup.c (2)

53-54: Good addition of defensive assertions.

These assertions protect against null pointer dereferences and negative length values. This is a welcome improvement.

68-87: Multi-byte handling logic is correct.

The approach of copying the entire input first via memcpy, then selectively modifying only single-byte ASCII characters while skipping multi-byte sequences, correctly preserves non-ASCII characters.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/backend/parser/scansup.c (1)

111-112: Duplicate comment line.

Line 112 is an exact duplicate of line 111. Remove the duplicate.

Suggested fix

 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and

🤖 Fix all issues with AI agents

In @src/backend/parser/scansup.c:
- Around line 265-289: In identifier_is_all_upper, cast characters to unsigned
char before calling isalpha() and islower() (e.g., use (unsigned char)*s) to
avoid undefined behavior; add assertions (Assert(ident != NULL && len >= 0)) at
function start; fix the typo in any comment or message from "Detemine" to
"Determine"; normalize formatting to match project style (no leading space
before function return type, use consistent pointer style like const char
*ident) and keep multibyte-handling logic as-is while ensuring spacing/brace
style matches identifier_is_all_lower.
- Around line 236-260: The function identifier_is_all_lower has undefined
behavior because isalpha() and isupper() are called with plain char; cast the
character to (unsigned char) before passing to isalpha/isupper (as done in
downcase_identifier), add Assert(ident != NULL); Assert(len >= 0); at the start
for consistency, change the signature spacing to "bool
identifier_is_all_lower(const char *ident, int len)" and use PostgreSQL pointer
style "const char *s", and fix the comment typo "Detemine" → "Determine"; keep
the existing multi-byte handling via pg_mblen() and advance s and i as currently
implemented.

🧹 Nitpick comments (2)

src/backend/parser/scansup.c (2)

19-19: Redundant include.

postgres.h already provides the bool type via c.h. This include is unnecessary in backend code.

150-160: Remove or implement commented-out code.

This commented block introduces noise. If the '/' separator logic is needed, uncomment and test it; otherwise, remove it to keep the codebase clean.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c897903 and 252ca57.

📒 Files selected for processing (1)

src/backend/parser/scansup.c

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/backend/parser/scansup.c

🔇 Additional comments (2)

src/backend/parser/scansup.c (2)

53-54: LGTM!

Good defensive assertions to validate preconditions.

68-80: LGTM - Multi-byte handling is correct.

The approach of copying first via memcpy, then selectively transforming single-byte ASCII characters while skipping multi-byte sequences, is sound.

Minor style nit: PostgreSQL convention places a tab between type and variable name (int<tab>mblen).

coderabbitai · 2026-01-13T08:19:03Z

+ bool
+ identifier_is_all_lower(const char *ident, int len)
+{
+	int i;
+	const char* s;
+
+	s = ident;
+
+	for (i = 0; i < len; i++)
+	{
+		int mblen = pg_mblen(s);
+
+		if (mblen > 1)
+		{
+			s += mblen;
+			i += (mblen - 1);
+			continue;
+		}
+
+		if (isalpha(*s) && isupper(*s))
+			return false;
+		s++;
+	}
+	return true;
+}


⚠️ Potential issue | 🟠 Major

Undefined behavior: pass unsigned char to ctype functions.

isalpha() and isupper() have undefined behavior when passed a negative value (which occurs for high-bit-set chars when char is signed). Cast to unsigned char as done in downcase_identifier.

Also:

Line 234: Typo "Detemine" → "Determine"

Line 236: Remove leading space before bool

Line 240: Use PostgreSQL pointer style const char *s

Add Assert(ident != NULL); Assert(len >= 0); for consistency

Suggested fix

/* - * Detemine whether the letters in the string are all lowercase letters + * Determine whether the letters in the string are all lowercase letters */ - bool - identifier_is_all_lower(const char *ident, int len) +bool +identifier_is_all_lower(const char *ident, int len) { - int i; - const char* s; - - s = ident; + int i; + const char *s = ident; + + Assert(ident != NULL); + Assert(len >= 0); for (i = 0; i < len; i++) { + unsigned char ch; int mblen = pg_mblen(s); if (mblen > 1) { s += mblen; i += (mblen - 1); continue; } - if (isalpha(*s) && isupper(*s)) + ch = (unsigned char) *s; + if (isalpha(ch) && isupper(ch)) return false; s++; } return true; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

bool

identifier_is_all_lower(const char *ident, int len)

{

int i;

const char* s;

s = ident;

for (i = 0; i < len; i++)

{

int mblen = pg_mblen(s);

if (mblen > 1)

{

s += mblen;

i += (mblen - 1);

continue;

}

if (isalpha(*s) && isupper(*s))

return false;

s++;

}

return true;

}

/*

* Determine whether the letters in the string are all lowercase letters

*/

bool

identifier_is_all_lower(const char *ident, int len)

{

int i;

const char *s = ident;

Assert(ident != NULL);

Assert(len >= 0);

for (i = 0; i < len; i++)

{

unsigned char ch;

int mblen = pg_mblen(s);

if (mblen > 1)

{

s += mblen;

i += (mblen - 1);

continue;

}

ch = (unsigned char) *s;

if (isalpha(ch) && isupper(ch))

return false;

s++;

}

return true;

}

🤖 Prompt for AI Agents

In @src/backend/parser/scansup.c around lines 236 - 260, The function identifier_is_all_lower has undefined behavior because isalpha() and isupper() are called with plain char; cast the character to (unsigned char) before passing to isalpha/isupper (as done in downcase_identifier), add Assert(ident != NULL); Assert(len >= 0); at the start for consistency, change the signature spacing to "bool identifier_is_all_lower(const char *ident, int len)" and use PostgreSQL pointer style "const char *s", and fix the comment typo "Detemine" → "Determine"; keep the existing multi-byte handling via pg_mblen() and advance s and i as currently implemented.

coderabbitai · 2026-01-13T08:19:03Z

+ bool
+ identifier_is_all_upper(const char *ident, int len)
+{
+	int i;
+	const char* s;
+
+	s = ident;
+
+	for (i = 0; i < len; i++)
+	{
+		int mblen = pg_mblen(s);
+
+		if (mblen > 1)
+		{
+			s += mblen;
+			i += (mblen - 1);
+			continue;
+		}
+
+		if (isalpha(*s) && islower(*s))
+			return false;
+		s++;
+	}
+	return true;
+}


⚠️ Potential issue | 🟠 Major

Same issues as identifier_is_all_lower.

Apply identical fixes:

Cast to unsigned char before calling isalpha()/islower()

Fix typo "Detemine" → "Determine"

Fix formatting (leading space, pointer style)

Add assertions

Suggested fix

/* - * Detemine whether the letters in the string are all uppercase letters + * Determine whether the letters in the string are all uppercase letters */ - bool - identifier_is_all_upper(const char *ident, int len) +bool +identifier_is_all_upper(const char *ident, int len) { - int i; - const char* s; - - s = ident; + int i; + const char *s = ident; + + Assert(ident != NULL); + Assert(len >= 0); for (i = 0; i < len; i++) { + unsigned char ch; int mblen = pg_mblen(s); if (mblen > 1) { s += mblen; i += (mblen - 1); continue; } - if (isalpha(*s) && islower(*s)) + ch = (unsigned char) *s; + if (isalpha(ch) && islower(ch)) return false; s++; } return true; }

🤖 Prompt for AI Agents

In @src/backend/parser/scansup.c around lines 265 - 289, In identifier_is_all_upper, cast characters to unsigned char before calling isalpha() and islower() (e.g., use (unsigned char)*s) to avoid undefined behavior; add assertions (Assert(ident != NULL && len >= 0)) at function start; fix the typo in any comment or message from "Detemine" to "Determine"; normalize formatting to match project style (no leading space before function return type, use consistent pointer style like const char *ident) and keep multibyte-handling logic as-is while ensuring spacing/brace style matches identifier_is_all_lower.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/backend/parser/scansup.c (1)
111-116: Duplicate comment line and incorrect terminology.

Lines 111-112 contain a duplicated sentence: "locale-aware translation. However, there are some locales where this"

Line 116 says "ASCII-only downcasing" but this is the upcase_identifier function - it should say "upcasing".
Suggested fix
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I').  Our current compromise is to use toupper() for characters with
 	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * character, and use an ASCII-only upcasing for 7-bit characters.
 	 */

🤖 Fix all issues with AI agents

In @src/backend/parser/scansup.c:
- Around line 222-249: Fix the typo in the function comment ("Detemine" →
"Determine") and remove the stray leading space before the function return type;
then add the same defensive assertions used in downcase_identifier and
upcase_identifier to validate inputs (e.g., Assert(ident != NULL) and Assert(len
>= 0)) at the start of identifier_is_all_lower so it rejects NULL ident and
negative len values consistently with those other functions.
- Around line 251-278: Fix the typo in the function comment ("Detemine" ->
"Determine") and remove the stray leading space before the return type so the
function declaration reads "bool identifier_is_all_upper(...)" consistently;
then add the same input validation used in identifier_is_all_lower by asserting
the pointer and length at the top of identifier_is_all_upper (e.g., Assert(ident
!= NULL); Assert(len >= 0);) before using ident and len. Maintain the existing
multibyte handling and loop logic otherwise.

🧹 Nitpick comments (1)

src/backend/parser/scansup.c (1)
19-19: Unnecessary include: <stdbool.h> is already provided by postgres.h.

The postgres.h header (included at line 16) already provides bool, true, and false definitions. This include is redundant and can be removed.
Suggested fix
 #include "postgres.h"
 
 #include <ctype.h>
-#include <stdbool.h>
 
 #include "mb/pg_wchar.h"

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 252ca57 and d99b973.

📒 Files selected for processing (2)

src/backend/parser/scansup.c
src/bin/initdb/initdb.c

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

📚 Learning: 2025-10-24T08:19:29.453Z

Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

src/bin/initdb/initdb.c
src/backend/parser/scansup.c

🔇 Additional comments (4)

src/backend/parser/scansup.c (2)

68-87: LGTM - Multi-byte handling in downcase_identifier looks correct.

The approach of copying the entire input first with memcpy, then selectively transforming only single-byte ASCII characters while skipping multi-byte sequences using pg_mblen, is a sound strategy for preserving non-ASCII bytes.

118-137: LGTM - Multi-byte handling in upcase_identifier mirrors downcase_identifier correctly.

The implementation correctly follows the same pattern as downcase_identifier for handling multi-byte characters.

src/bin/initdb/initdb.c (2)

3698-3711: LGTM - Initialization reordering enables encoding-aware username transformation.

The reordering of set_info_version(), setup_data_file_paths(), setup_locale_encoding(), and setup_text_search() before the Oracle username handling is necessary because setup_locale_encoding() sets the encodingid variable, which is now required by is_all_upper() and down_character() for proper multi-byte character handling.

85-85: Verify the ora_string_utils.h header provides the required functions.

The code uses is_all_upper() and down_character() functions from this header. Ensure these functions exist with the expected signatures.

coderabbitai · 2026-01-13T08:57:22Z

+/*
+ *  Detemine whether the letters in the string are all lowercase letters
+ */
+ bool
+ identifier_is_all_lower(const char *ident, int len)
+{
+	int i;
+	const char* s;
+
+	s = ident;
+
+	for (i = 0; i < len; i++)
+	{
+		int mblen = pg_mblen(s);
+
+		if (mblen > 1)
+		{
+			s += mblen;
+			i += (mblen - 1);
+			continue;
+		}
+
+		if (isalpha(*s) && isupper(*s))
+			return false;
+		s++;
+	}
+	return true;
+}


⚠️ Potential issue | 🟡 Minor

Typo in comment and missing input validation.

Line 223: "Detemine" should be "Determine".

Lines 225-226: Formatting issue - there's a leading space before bool on the function definition.

Unlike downcase_identifier and upcase_identifier (which have Assert checks at lines 53-54 and 102-103), this function lacks validation for NULL ident or negative len. For consistency and safety, consider adding similar assertions.

Suggested fix

/* - * Detemine whether the letters in the string are all lowercase letters + * Determine whether the letters in the string are all lowercase letters */ - bool - identifier_is_all_lower(const char *ident, int len) +bool +identifier_is_all_lower(const char *ident, int len) { int i; const char* s; + Assert(ident != NULL); + Assert(len >= 0); s = ident;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/*

* Detemine whether the letters in the string are all lowercase letters

*/

bool

identifier_is_all_lower(const char *ident, int len)

{

int i;

const char* s;

s = ident;

for (i = 0; i < len; i++)

{

int mblen = pg_mblen(s);

if (mblen > 1)

{

s += mblen;

i += (mblen - 1);

continue;

}

if (isalpha(*s) && isupper(*s))

return false;

s++;

}

return true;

}

/*

* Determine whether the letters in the string are all lowercase letters

*/

bool

identifier_is_all_lower(const char *ident, int len)

{

int i;

const char* s;

Assert(ident != NULL);

Assert(len >= 0);

s = ident;

for (i = 0; i < len; i++)

{

int mblen = pg_mblen(s);

if (mblen > 1)

{

s += mblen;

i += (mblen - 1);

continue;

}

if (isalpha(*s) && isupper(*s))

return false;

s++;

}

return true;

}

🤖 Prompt for AI Agents

In @src/backend/parser/scansup.c around lines 222 - 249, Fix the typo in the function comment ("Detemine" → "Determine") and remove the stray leading space before the function return type; then add the same defensive assertions used in downcase_identifier and upcase_identifier to validate inputs (e.g., Assert(ident != NULL) and Assert(len >= 0)) at the start of identifier_is_all_lower so it rejects NULL ident and negative len values consistently with those other functions.

coderabbitai · 2026-01-13T08:57:23Z

+/*
+ *  Detemine whether the letters in the string are all uppercase letters
+ */
+ bool
+ identifier_is_all_upper(const char *ident, int len)
+{
+	int i;
+	const char* s;
+
+	s = ident;
+
+	for (i = 0; i < len; i++)
+	{
+		int mblen = pg_mblen(s);
+
+		if (mblen > 1)
+		{
+			s += mblen;
+			i += (mblen - 1);
+			continue;
+		}
+
+		if (isalpha(*s) && islower(*s))
+			return false;
+		s++;
+	}
+	return true;
+}


⚠️ Potential issue | 🟡 Minor

Typo in comment and missing input validation (same issues as identifier_is_all_lower).

Line 252: "Detemine" should be "Determine".

Lines 254-255: Formatting issue with leading space before bool.

Missing Assert checks for NULL pointer and non-negative length.

Suggested fix

/* - * Detemine whether the letters in the string are all uppercase letters + * Determine whether the letters in the string are all uppercase letters */ - bool - identifier_is_all_upper(const char * ident, int len) +bool +identifier_is_all_upper(const char *ident, int len) { int i; const char* s; + Assert(ident != NULL); + Assert(len >= 0); s = ident;

🤖 Prompt for AI Agents

In @src/backend/parser/scansup.c around lines 251 - 278, Fix the typo in the function comment ("Detemine" -> "Determine") and remove the stray leading space before the return type so the function declaration reads "bool identifier_is_all_upper(...)" consistently; then add the same input validation used in identifier_is_all_lower by asserting the pointer and length at the top of identifier_is_all_upper (e.g., Assert(ident != NULL); Assert(len >= 0);) before using ident and len. Maintain the existing multibyte handling and loop logic otherwise.

NotHimmel added 4 commits January 7, 2026 15:06

Fix identifier case handling for multi-byte characters.

7fbadc4

Update ora_string_utils.c

942babb

Update initdb.c

f377a76

initdb

8c54e80

coderabbitai bot mentioned this pull request Jan 7, 2026

📝 Add docstrings to case-sensitive-indentify #1149

Closed

coderabbitai bot reviewed Jan 7, 2026

View reviewed changes

Update scansup.c

ce1614c

coderabbitai bot reviewed Jan 8, 2026

View reviewed changes

Update initdb.c

c35e499

coderabbitai bot reviewed Jan 8, 2026

View reviewed changes

Comment thread src/bin/initdb/initdb.c Outdated

Update scansup.c

1ba7c89

coderabbitai bot reviewed Jan 8, 2026

View reviewed changes

fix case-sensitive for path username

396c2dc

coderabbitai bot reviewed Jan 8, 2026

View reviewed changes

NotHimmel added 2 commits January 13, 2026 15:21

update initdb.c

d72ef88

update scansup.c

c897903

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

Comment thread src/backend/parser/scansup.c Outdated

update scanup.c

252ca57

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

update initdb.c

d99b973

coderabbitai bot reviewed Jan 13, 2026

View reviewed changes

NotHimmel closed this Jan 16, 2026

Conversation

NotHimmel commented Jan 7, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 7, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

NotHimmel commented Jan 7, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 7, 2026 •

edited

Loading

coderabbitai bot commented Jan 7, 2026 •

edited

Loading