Skip to content

Fix identifier case handling for multi-byte characters.#1148

Closed
NotHimmel wants to merge 12 commits intoIvorySQL:masterfrom
NotHimmel:case-sensitive-indentify
Closed

Fix identifier case handling for multi-byte characters.#1148
NotHimmel wants to merge 12 commits intoIvorySQL:masterfrom
NotHimmel:case-sensitive-indentify

Conversation

@NotHimmel
Copy link
Copy Markdown
Collaborator

@NotHimmel NotHimmel commented Jan 7, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed multi‑byte character handling in identifier case conversions to prevent corruption of non‑ASCII names.
    • Improved Oracle‑compatibility startup normalization so uppercase identifiers and usernames are detected and normalized more reliably.
  • Refactor

    • Consolidated encoding‑aware string case detection and transformation utilities for consistent behavior across components.
  • Chores

    • Removed a legacy identifier transformation component from the default build.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Replaced legacy port-based identifier case utilities with Oracle-aware, multi-byte-safe helpers; updated scanner, lexer, initdb, and backend startup to use new APIs; removed old port implementation and adjusted build and headers.

Changes

Cohort / File(s) Summary
Port removal & build
src/port/identifier_alpha_transfor.c, src/port/Makefile, src/port/meson.build
Removed legacy identifier_alpha_transfor.c and its build inclusion.
Oracle frontend utilities
src/oracle_fe_utils/ora_string_utils.c, src/include/oracle_fe_utils/ora_string_utils.h
Added encoding-aware functions down_character, upper_character, is_all_lower, is_all_upper that preserve multi-byte sequences via pg_encoding_mblen.
Parser scanner support
src/backend/parser/scansup.c, src/include/parser/scansup.h
Made downcase_identifier/upcase_identifier multi-byte-safe; added identifier_is_all_lower and identifier_is_all_upper; added <stdbool.h> and multi-byte traversal.
Oracle scanner tokenization
src/backend/oracle_parser/ora_scan.l
Switched call from is_all_upper(...) to identifier_is_all_upper(...) for delimited-identifier handling.
TCOP startup changes
src/backend/tcop/backend_startup.c
Replaced is_all_upper with identifier_is_all_upper and down_character uses with downcase_identifier(..., false, false); added parser/scansup.h include.
initdb & Oracle username handling
src/bin/initdb/initdb.c
Included oracle_fe_utils/ora_string_utils.h; adapted is_all_upper/down_character usage to include encoding parameter; moved some initialization calls.
Public header cleanup
src/include/port.h, src/include/fe_utils/string_utils.h
Removed old case-handling declarations from port.h; minor whitespace cleanup in string_utils.h.

Sequence Diagram(s)

sequenceDiagram
    participant Lexer as Oracle Lexer (ora_scan.l)
    participant Scanner as scansup helpers
    participant OraUtils as oracle_fe_utils

    Lexer->>Scanner: provide identifier token (ident,len)
    Scanner->>Scanner: call identifier_is_all_upper(ident,len)
    alt identifier is all upper
        Scanner->>OraUtils: call down_character(ident,len,encoding)
        OraUtils-->>Scanner: return lowercased, multi-byte-safe string
        Scanner-->>Lexer: return transformed identifier
    else not all upper
        Scanner-->>Lexer: return original identifier
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Suggested reviewers

  • jiaoshuntian
  • OreoYang

Poem

🐰 I hopped through bytes both wide and small,
I guard each multi-byte so none should fall.
Helpers moved where Oracle eyes can spy,
Case-safe and careful — hop, transform, and try!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the pull request: fixing identifier case handling for multi-byte characters, which is reflected throughout the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 84.62% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 7, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #1149

coderabbitai bot added a commit that referenced this pull request Jan 7, 2026
Docstrings generation was requested by @NotHimmel.

* #1148 (comment)

The following files were modified:

* `src/backend/parser/scansup.c`
* `src/backend/tcop/backend_startup.c`
* `src/bin/initdb/initdb.c`
* `src/oracle_fe_utils/ora_string_utils.c`
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Line 213: Fix the typo in the function comment blocks where "Detemine" is
misspelled: change "Detemine whether the letters in the string are all lowercase
letters" to "Determine whether the letters in the string are all lowercase
letters" in both comment blocks around the functions in
src/backend/parser/scansup.c (the comment at line ~213 and the similar comment
at line ~243).

In @src/backend/tcop/backend_startup.c:
- Line 47: Remove the duplicate include directive for "parser/scansup.h": find
the second occurrence of #include "parser/scansup.h" (the one duplicated at line
47) and delete it so only the original include remains; ensure no other
references rely on a second inclusion and run a quick build to confirm
preprocessing is unaffected.

In @src/bin/initdb/initdb.c:
- Around line 3706-3714: The block that lowercases Oracle usernames has a
use-after-free: remove the save-and-free pattern around username since
down_character modifies the buffer in-place; specifically, in the conditional
that checks database_mode == DB_ORACLE, username != NULL, caseswitchmode !=
NORMAL and is_all_upper(...), call down_character(username, strlen(username),
encodingid) without assigning its return to a new pointer or freeing username
(do not call free on username); this matches down_character's in-place behavior
as implemented in ora_string_utils.c and prevents making username a dangling
pointer.

In @src/oracle_fe_utils/ora_string_utils.c:
- Around line 205-237: The function upper_character modifies its input buffer
in-place but its signature declares src as const char*, causing a
const-correctness violation; change the parameter type from const char *src to
char *src (matching the fix in down_character) and update any callers if needed
to pass a mutable buffer, ensuring pg_encoding_mblen and in-place toupper usage
remain valid.
- Around line 174-203: The function down_character currently takes a const char
*src but casts away const and mutates the buffer; change the signature to char
*down_character(char *src, int len, int encoding) (remove const) and eliminate
the (char *) cast inside the function so it operates on the provided mutable
pointer, and update the matching prototype in ora_string_utils.h; ensure any
callers pass mutable buffers (or make copies) so no read-only memory is
modified.
🧹 Nitpick comments (5)
src/backend/parser/scansup.c (3)

19-19: Remove redundant include.

The <stdbool.h> header is unnecessary since postgres.h already provides the bool type and related definitions.

♻️ Proposed fix
 #include <ctype.h>
-#include <stdbool.h>
 

215-239: Review the multi-byte and case-checking logic.

The function skips multi-byte sequences and only checks single-byte alphabetic characters for lowercase. Consider the following:

  1. Line 219: Inconsistent spacing before the function name declaration (extra space after bool).
  2. Logic: The function returns true for strings with multi-byte characters as long as all single-byte alphabetic characters are lowercase. Confirm this matches the intended semantics for Oracle identifier case handling.
♻️ Style fix for spacing
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)

244-268: Review the multi-byte and case-checking logic.

Similar observations as identifier_is_all_lower:

  1. Line 245: Inconsistent spacing before the function name declaration.
  2. Logic: The function returns true for strings with multi-byte characters as long as all single-byte alphabetic characters are uppercase.
♻️ Style fix for spacing
- bool
- identifier_is_all_upper(const char *ident, int len)
+bool
+identifier_is_all_upper(const char *ident, int len)
src/oracle_fe_utils/ora_string_utils.c (2)

242-267: Document the behavior for strings without alphabetic characters.

The function returns true for strings containing no alphabetic characters (e.g., "123", "___"). While this might be intentional (vacuous truth: all zero alphabetic characters are lowercase), it should be documented to avoid confusion.

Consider adding a comment:

/*
 * Determine whether all alphabetic letters in the string are lowercase.
 * Returns true if there are no alphabetic characters.
 * Multi-byte characters are skipped (treated as non-alphabetic).
 */

272-297: Document the behavior for strings without alphabetic characters.

Same issue as is_all_lower: the function returns true for strings containing no alphabetic characters. Document this behavior with a clear comment:

/*
 * Determine whether all alphabetic letters in the string are uppercase.
 * Returns true if there are no alphabetic characters.
 * Multi-byte characters are skipped (treated as non-alphabetic).
 */
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5acc6bb and 8c54e80.

📒 Files selected for processing (12)
  • src/backend/oracle_parser/ora_scan.l
  • src/backend/parser/scansup.c
  • src/backend/tcop/backend_startup.c
  • src/bin/initdb/initdb.c
  • src/include/fe_utils/string_utils.h
  • src/include/oracle_fe_utils/ora_string_utils.h
  • src/include/parser/scansup.h
  • src/include/port.h
  • src/oracle_fe_utils/ora_string_utils.c
  • src/port/Makefile
  • src/port/identifier_alpha_transfor.c
  • src/port/meson.build
💤 Files with no reviewable changes (5)
  • src/include/port.h
  • src/port/identifier_alpha_transfor.c
  • src/port/meson.build
  • src/include/fe_utils/string_utils.h
  • src/port/Makefile
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/include/parser/scansup.h
  • src/backend/tcop/backend_startup.c
  • src/include/oracle_fe_utils/ora_string_utils.h
  • src/backend/parser/scansup.c
  • src/backend/oracle_parser/ora_scan.l
  • src/oracle_fe_utils/ora_string_utils.c
  • src/bin/initdb/initdb.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: meson_build (ubuntu-latest)
  • GitHub Check: contrib_regression
  • GitHub Check: pg_regression (ubuntu-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: oracle_pg_regression (ubuntu-latest)
  • GitHub Check: oracle_regression (ubuntu-latest)
🔇 Additional comments (12)
src/backend/oracle_parser/ora_scan.l (2)

50-50: LGTM: Minor formatting cleanup.

Removed trailing space after the include directive.


949-970: Unable to verify the review comment due to unavailability of the referenced file and functions in the current repository state. The file src/backend/oracle_parser/ora_scan.l and functions identifier_is_all_upper and is_all_upper could not be found.

src/include/parser/scansup.h (1)

32-33: LGTM: New public API for identifier case validation.

These functions extend the scanner support interface with multi-byte aware identifier case checks, aligning with the PR objective to fix multi-byte character handling.

src/backend/parser/scansup.c (2)

111-120: Verify multi-byte handling preserves character integrity.

Similar to downcase_identifier, ensure that strncpy followed by selective character modification doesn't corrupt multi-byte sequences.


60-75: Multi-byte handling is correctly implemented with proper safeguards in place.

The code properly detects multi-byte sequences using pg_mblen() and skips them during case transformation. The approach is sound: strncpy() copies the entire input unchanged, then the loop safely modifies only single-byte characters while preserving multi-byte boundaries. Encoding awareness is handled via pg_database_encoding_max_length(), and tolower()/toupper() are only applied when both the encoding is single-byte and the character has the high bit set. This same pattern is consistently applied across similar functions in the file (upcase_identifier, and other case transformation routines).

src/backend/tcop/backend_startup.c (2)

838-840: Consistent function migration looks correct.

The same pattern of replacing is_all_upper with identifier_is_all_upper and down_character with downcase_identifier(..., false, false) is applied consistently in the INTERCHANGE case handling blocks for both database_name and user_name.

Also applies to: 860-862, 891-893


806-810: Remove reference to non-existent down_character function.

The function signatures are correct and the code usage is appropriate. However, the review comment incorrectly references down_character as the previous function name. This function does not exist in the codebase. The actual function declarations in src/include/parser/scansup.h confirm:

  • downcase_identifier(const char *ident, int len, bool warn, bool truncate) — correctly used with warn=false, truncate=false
  • identifier_is_all_upper(const char *ident, int len) — correctly called with two arguments

The parameters false, false are appropriate since database names should not be truncated or trigger warnings. Remove the mention of down_character and the claim about "changes from" it, as this function does not exist and the review comment's historical premise is inaccurate.

src/include/oracle_fe_utils/ora_string_utils.h (1)

45-48: LGTM: New encoding-aware string utility functions.

These Oracle-specific utilities now accept an encoding parameter to handle multi-byte characters safely, aligning with the PR objective. The functions replace the previous port-level implementations.

src/bin/initdb/initdb.c (3)

85-85: LGTM: New include added for Oracle string utilities.

The include is necessary for the new is_all_upper and down_character functions used in the username transformation logic.


1703-1703: LGTM: Bootstrap command updated with case switch mode.

The command correctly passes the caseswitchmode parameter to the bootstrap backend using the -C flag with proper integer formatting.


3698-3704: The early initialization order is correct.

All four functions are safe to call at lines 3698-3704. They depend exclusively on variables set during command-line parsing (e.g., lc_ctype, encoding, database_mode), not on any logic between setup_bin_paths() and their current position. Moving setup_locale_encoding() before the Oracle username transformation is necessary and correct—encodingid must be available before is_all_upper() and down_character() use it at lines 3709 and 3712.

src/oracle_fe_utils/ora_string_utils.c (1)

29-29: LGTM: Include added for multi-byte character handling.

The mb/pg_wchar.h include is necessary for pg_encoding_mblen() used in the new case-handling functions.

#include "utils/ps_status.h"
#include "utils/timeout.h"
#include "utils/varlena.h"
#include "parser/scansup.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove duplicate include directive.

Line 47 duplicates the include from line 30. This causes redundant preprocessing and should be removed.

🔧 Proposed fix
 #include "utils/varlena.h"
-#include "parser/scansup.h"
 
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include "parser/scansup.h"
#include "utils/varlena.h"
🤖 Prompt for AI Agents
In @src/backend/tcop/backend_startup.c at line 47, Remove the duplicate include
directive for "parser/scansup.h": find the second occurrence of #include
"parser/scansup.h" (the one duplicated at line 47) and delete it so only the
original include remains; ensure no other references rely on a second inclusion
and run a quick build to confirm preprocessing is unaffected.

Comment thread src/bin/initdb/initdb.c
Comment on lines +174 to +203
char *
down_character(const char *src, int len, int encoding)
{
char *s;
char *res;
int i;
Assert(src != NULL);
Assert(len >= 0);

s = (char *) src;
res = s;

/* transform */
for (i = 0; i < len ; i++)
{
int mblen = pg_encoding_mblen(encoding, s);

if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}

*s = tolower(*s);
s++;
}

return res;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Const-correctness violation and misleading signature.

The function signature declares const char *src but immediately casts it to char * and modifies the input in-place. This is undefined behavior if the caller passes a string literal or read-only memory.

Issues:

  1. Line 183 casts away const: s = (char *) src;
  2. Line 198 modifies the "const" input: *s = tolower(*s);

This violates const-correctness and can lead to crashes or security issues if the input is in read-only memory.

🐛 Proposed fix
 /*
  * transform upper to lower
 */
 char *
-down_character(const char *src, int len, int encoding)
+down_character(char *src, int len, int encoding)
 {
 	char	*s;
 	char	*res;
 	int		i;
 	Assert(src != NULL);
 	Assert(len >= 0);
 
-	s = (char *) src;
+	s = src;
 	res = s;

Remove const from the parameter since the function modifies the input in-place. Also update the corresponding header declaration in src/include/oracle_fe_utils/ora_string_utils.h.

🤖 Prompt for AI Agents
In @src/oracle_fe_utils/ora_string_utils.c around lines 174 - 203, The function
down_character currently takes a const char *src but casts away const and
mutates the buffer; change the signature to char *down_character(char *src, int
len, int encoding) (remove const) and eliminate the (char *) cast inside the
function so it operates on the provided mutable pointer, and update the matching
prototype in ora_string_utils.h; ensure any callers pass mutable buffers (or
make copies) so no read-only memory is modified.

Comment on lines +205 to +237
/*
* transform lower to upper
*/
char *
upper_character(const char *src, int len, int encoding)
{
char *s;
char *res;
int i;
Assert(src != NULL);
Assert(len >= 0);

s = (char *) src;
res = s;

/* transform */
for (i = 0; i < len ; i++)
{
int mblen = pg_encoding_mblen(encoding, s);

if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}

*s = toupper(*s);
s++;
}

return res;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Same const-correctness violation as down_character.

This function has the identical const-correctness issue as down_character (lines 174-203). The parameter should be char *src instead of const char *src since the function modifies the input in-place.

🐛 Proposed fix
 /*
  * transform lower to upper
 */
 char *
-upper_character(const char *src, int len, int encoding)
+upper_character(char *src, int len, int encoding)
 {
 	char	*s;
 	char	*res;
 	int		i;
 	Assert(src != NULL);
 	Assert(len >= 0);
 
-	s = (char *) src;
+	s = src;
 	res = s;
🤖 Prompt for AI Agents
In @src/oracle_fe_utils/ora_string_utils.c around lines 205 - 237, The function
upper_character modifies its input buffer in-place but its signature declares
src as const char*, causing a const-correctness violation; change the parameter
type from const char *src to char *src (matching the fix in down_character) and
update any callers if needed to pass a mutable buffer, ensuring
pg_encoding_mblen and in-place toupper usage remain valid.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)

111-112: Duplicate comment line.

Lines 111 and 112 are identical—appears to be a copy-paste error.

📝 Suggested fix
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
🧹 Nitpick comments (3)
src/backend/parser/scansup.c (3)

19-19: Redundant include.

postgres.h already provides boolean types via its internal headers. The <stdbool.h> include is unnecessary in PostgreSQL/IvorySQL code.

♻️ Suggested fix
 #include "postgres.h"
 
 #include <ctype.h>
-#include <stdbool.h>
 
 #include "mb/pg_wchar.h"

225-249: Add input validation and fix formatting.

This function lacks the Assert checks present in downcase_identifier and upcase_identifier. Also, there's a leading space before bool on line 225.

♻️ Suggested fix
-/*
- *  Detemine whether the letters in the string are all lowercase letters
- */
- bool
- identifier_is_all_lower(const char *ident, int len)
+/*
+ *  Determine whether the letters in the string are all lowercase letters
+ */
+bool
+identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
 	const char* s;
+
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;

254-278: Add input validation and fix formatting.

Same issues as identifier_is_all_lower: missing Assert checks and leading space before bool.

♻️ Suggested fix
-/*
- *  Detemine whether the letters in the string are all uppercase letters
- */
- bool
- identifier_is_all_upper(const char *ident, int len)
+/*
+ *  Determine whether the letters in the string are all uppercase letters
+ */
+bool
+identifier_is_all_upper(const char *ident, int len)
 {
 	int i;
 	const char* s;
+
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c54e80 and ce1614c.

📒 Files selected for processing (1)
  • src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/backend/parser/scansup.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pg_regression (ubuntu-latest)
  • GitHub Check: oracle_pg_regression (ubuntu-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: meson_build (ubuntu-latest)
  • GitHub Check: oracle_regression (ubuntu-latest)
  • GitHub Check: contrib_regression
🔇 Additional comments (4)
src/backend/parser/scansup.c (4)

53-54: Good defensive assertions.

Adding precondition checks for NULL pointer and non-negative length is a sound practice for API robustness.


68-80: Multi-byte handling looks correct.

Pre-copying with memcpy and then skipping multi-byte sequences ensures non-ASCII characters are preserved while ASCII characters undergo case transformation. The index arithmetic correctly advances past the full multi-byte sequence.


102-103: Consistent assertions with downcase_identifier.


118-130: Multi-byte handling consistent with downcase_identifier.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/bin/initdb/initdb.c:
- Around line 3707-3715: The code currently assigns lowerusername = username
then calls down_character(username, ...) which operates in-place and returns the
same pointer, and then free(lowerusername) causes a use-after-free on username;
remove the free(lowerusername) call (or alternatively change down_character to
allocate a new buffer and document that behavior) so that username continues to
reference valid memory; update the block around down_character/username (and the
temporary lowerusername variable) to either omit the temporary and free or to
ensure ownership semantics are clear (prefer simply removing free(lowerusername)
since down_character is in-place).
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce1614c and c35e499.

📒 Files selected for processing (1)
  • src/bin/initdb/initdb.c
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: meson_build (ubuntu-latest)
  • GitHub Check: pg_regression (ubuntu-latest)
  • GitHub Check: oracle_regression (ubuntu-latest)
  • GitHub Check: oracle_pg_regression (ubuntu-latest)
  • GitHub Check: contrib_regression
🔇 Additional comments (3)
src/bin/initdb/initdb.c (3)

85-85: LGTM!

The new header provides the multi-byte encoding-aware string utilities (down_character, is_all_upper) needed for the identifier case handling changes.


1703-1704: LGTM!

The bootstrap command now correctly passes the identifier case switch mode to ensure consistent case handling during database initialization.


3698-3704: Correct initialization ordering for encoding-aware identifier handling.

Moving these initialization calls before the Oracle username transformation ensures that encodingid is properly set by setup_locale_encoding() before it's used by is_all_upper() and down_character(). This is essential for the multi-byte character support.

Comment thread src/bin/initdb/initdb.c Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Line 234: Fix the spelling mistake in the comment for the function
identifier_is_all_lower by changing "Detemine" to "Determine"; make the
identical correction in the comment for identifier_is_all_upper as well so both
function comments read "Determine whether the letters in the string are all
lowercase/uppercase letters" respectively, leaving code and punctuation
unchanged.
🧹 Nitpick comments (4)
src/backend/parser/scansup.c (4)

19-19: Remove redundant include.

The postgres.h header (line 16) already provides the bool type and related definitions. Including <stdbool.h> is unnecessary and should be removed per PostgreSQL coding conventions.

♻️ Proposed fix
-#include <stdbool.h>

236-260: Style inconsistencies with PostgreSQL conventions.

Both new functions have minor style issues:

  • Extra space before function name (lines 236, 265)
  • Pointer declaration style should be const char *s not const char* s (lines 240, 269)
♻️ Proposed fix for identifier_is_all_lower
-	int i;
-	const char* s;
+	int			i;
+	const char *s;

Apply similar formatting fixes to identifier_is_all_upper.

Also applies to: 265-289


236-260: Add input validation assertions.

Unlike downcase_identifier and upcase_identifier (lines 53-54, 102-103), these new helper functions lack input validation. Consider adding assertions to guard against NULL pointers and negative lengths for consistency.

♻️ Proposed fix
  bool
  identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
 	const char *s;
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;

Apply the same assertions to identifier_is_all_upper.

Also applies to: 265-289


236-260: Consider caching the encoding check for efficiency.

Unlike downcase_identifier and upcase_identifier which cache pg_database_encoding_max_length() == 1 before the loop (lines 57, 106), these new functions call pg_mblen on every iteration regardless of encoding. For single-byte encodings, this adds unnecessary overhead.

⚡ Proposed optimization
  bool
  identifier_is_all_lower(const char *ident, int len)
 {
 	int			i;
 	const char *s;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
 	s = ident;
 
 	for (i = 0; i < len; i++)
 	{
-		int mblen = pg_mblen(s);
-
-		if (mblen > 1)
+		if (!enc_is_single_byte)
 		{
-			s += mblen;
-			i += (mblen - 1);
-			continue;
+			int mblen = pg_mblen(s);
+
+			if (mblen > 1)
+			{
+				s += mblen;
+				i += (mblen - 1);
+				continue;
+			}
 		}
 
 		if (isalpha(*s) && isupper(*s))

Apply similar optimization to identifier_is_all_upper.

Also applies to: 265-289

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c35e499 and 1ba7c89.

📒 Files selected for processing (1)
  • src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Learnt from: rophy
Repo: IvorySQL/IvorySQL PR: 998
File: contrib/ivorysql_ora/sql/ora_dbms_output.sql:211-212
Timestamp: 2025-12-26T07:43:29.003Z
Learning: In contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c (lines 447-449), buffer_size values below 2000 are silently clamped to DBMS_OUTPUT_MIN_BUFFER_SIZE (2000), matching Oracle behavior. This is not an error condition.
Learnt from: rophy
Repo: IvorySQL/IvorySQL PR: 998
File: contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c:82-146
Timestamp: 2025-12-03T06:50:27.755Z
Learning: In contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c, the behavior where ENABLE() clears the existing buffer (rather than preserving it like Oracle does) is an intentional known difference tracked in issue #26 at https://github.com/rophy/IvorySQL/issues/26 (medium impact, open).
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/backend/parser/scansup.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: meson_build (ubuntu-latest)
  • GitHub Check: contrib_regression
  • GitHub Check: oracle_pg_regression (ubuntu-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: oracle_regression (ubuntu-latest)
  • GitHub Check: pg_regression (ubuntu-latest)
🔇 Additional comments (3)
src/backend/parser/scansup.c (3)

53-54: LGTM: Good defensive programming.

The input validation assertions appropriately guard against NULL pointers and negative lengths, improving code robustness.

Also applies to: 102-103


68-80: LGTM: Multi-byte character handling is correct.

The approach of copying the input first with memcpy and then selectively modifying only ASCII characters (while skipping multi-byte sequences via pg_mblen) correctly preserves multi-byte identifiers.

Also applies to: 118-130


152-160: LGTM: Sensible optimization.

The early-exit logic for identifiers containing '/' or '.' avoids unnecessary case transformations for path-like identifiers, improving efficiency.

}

/*
* Detemine whether the letters in the string are all lowercase letters
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo in comments.

"Detemine" should be "Determine" in both function comments.

📝 Proposed fix
 /*
- *  Detemine whether the letters in the string are all lowercase letters
+ *  Determine whether the letters in the string are all lowercase letters
  */

Apply the same fix to the comment at line 263 for identifier_is_all_upper.

Also applies to: 263-263

🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c at line 234, Fix the spelling mistake in the
comment for the function identifier_is_all_lower by changing "Detemine" to
"Determine"; make the identical correction in the comment for
identifier_is_all_upper as well so both function comments read "Determine
whether the letters in the string are all lowercase/uppercase letters"
respectively, leaving code and punctuation unchanged.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/bin/initdb/initdb.c (1)

3706-3715: Document the slash check for special username formats.

The encoding-aware username transformation looks correct. However, the new condition !strchr(username, '/') at line 3710 lacks documentation. This check likely prevents transformation of special username formats (e.g., Kerberos principals like user/instance@REALM), but without a comment, its purpose is unclear and could lead to confusion.

📝 Add explanatory comment
 	/* Oracle compatibility: transform uppercase usernames to lowercase. */
 	if (database_mode == DB_ORACLE && username != NULL
 		&& caseswitchmode != NORMAL
 		&& is_all_upper(username, strlen(username), encodingid)
+		/* Skip transformation for special username formats like Kerberos principals */
 		&& !strchr(username, '/'))
 	{

Otherwise, the encoding-aware function calls with encodingid are the correct fix for multi-byte character handling.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ba7c89 and 396c2dc.

📒 Files selected for processing (2)
  • src/backend/parser/scansup.c
  • src/bin/initdb/initdb.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/bin/initdb/initdb.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: oracle_regression (ubuntu-latest)
  • GitHub Check: oracle_pg_regression (ubuntu-latest)
  • GitHub Check: meson_build (ubuntu-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: contrib_regression
  • GitHub Check: pg_regression (ubuntu-latest)
🔇 Additional comments (3)
src/bin/initdb/initdb.c (3)

85-85: LGTM!

The new include is necessary for the encoding-aware string utilities used in the Oracle username transformation.


1703-1703: LGTM!

The formatting adjustment to the bootstrap command assembly is fine.


3698-3704: LGTM!

Moving the initialization calls earlier in main() ensures encodingid is properly set before the Oracle username transformation logic. This is necessary for the encoding-aware string utilities to function correctly.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)

111-112: Duplicated comment line.

Line 112 is a duplicate of line 111. Remove the duplicate.

Proposed fix
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Around line 262-289: Fix the typo in the comment for identifier_is_all_upper
("Detemine" → "Determine") and add defensive assertions at the start of the
function: Assert(ident != NULL) and Assert(len >= 0); keep these assertions
consistent with identifier_is_all_lower so the function validates its inputs
before processing.
- Around line 233-260: Fix the typo in the function comment ("Detemine" →
"Determine") and add input validation at the start of identifier_is_all_lower by
inserting Assert(ident != NULL) and Assert(len >= 0) (matching the style used in
downcase_identifier/upcase_identifier); keep the rest of the function logic
unchanged and reference identifier_is_all_lower to locate the edit.
- Around line 150-160: Remove the unused local variable `i` and the
commented-out loop block in scansup.c (the /* for (i = 0; ...) */ section) to
eliminate dead code and compiler warnings; if that logic must be preserved for
future work instead, replace the block with a short TODO comment describing the
intended behavior and remove the unused `i` declaration so no unused-variable
remains.
🧹 Nitpick comments (1)
src/backend/parser/scansup.c (1)

236-240: Minor formatting inconsistencies.

  1. Lines 236, 265: Extra leading space before bool return type
  2. Lines 240, 269: Pointer declaration style const char* s should be const char *s per PostgreSQL coding conventions
Proposed fix for identifier_is_all_lower
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
-	const char* s;
+	const char *s;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d72ef88 and c897903.

📒 Files selected for processing (1)
  • src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/backend/parser/scansup.c
🔇 Additional comments (2)
src/backend/parser/scansup.c (2)

53-54: Good addition of defensive assertions.

These assertions protect against null pointer dereferences and negative length values. This is a welcome improvement.


68-87: Multi-byte handling logic is correct.

The approach of copying the entire input first via memcpy, then selectively modifying only single-byte ASCII characters while skipping multi-byte sequences, correctly preserves non-ASCII characters.

Comment thread src/backend/parser/scansup.c Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)

111-112: Duplicate comment line.

Line 112 is an exact duplicate of line 111. Remove the duplicate.

Suggested fix
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Around line 265-289: In identifier_is_all_upper, cast characters to unsigned
char before calling isalpha() and islower() (e.g., use (unsigned char)*s) to
avoid undefined behavior; add assertions (Assert(ident != NULL && len >= 0)) at
function start; fix the typo in any comment or message from "Detemine" to
"Determine"; normalize formatting to match project style (no leading space
before function return type, use consistent pointer style like const char
*ident) and keep multibyte-handling logic as-is while ensuring spacing/brace
style matches identifier_is_all_lower.
- Around line 236-260: The function identifier_is_all_lower has undefined
behavior because isalpha() and isupper() are called with plain char; cast the
character to (unsigned char) before passing to isalpha/isupper (as done in
downcase_identifier), add Assert(ident != NULL); Assert(len >= 0); at the start
for consistency, change the signature spacing to "bool
identifier_is_all_lower(const char *ident, int len)" and use PostgreSQL pointer
style "const char *s", and fix the comment typo "Detemine" → "Determine"; keep
the existing multi-byte handling via pg_mblen() and advance s and i as currently
implemented.
🧹 Nitpick comments (2)
src/backend/parser/scansup.c (2)

19-19: Redundant include.

postgres.h already provides the bool type via c.h. This include is unnecessary in backend code.


150-160: Remove or implement commented-out code.

This commented block introduces noise. If the '/' separator logic is needed, uncomment and test it; otherwise, remove it to keep the codebase clean.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c897903 and 252ca57.

📒 Files selected for processing (1)
  • src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/backend/parser/scansup.c
🔇 Additional comments (2)
src/backend/parser/scansup.c (2)

53-54: LGTM!

Good defensive assertions to validate preconditions.


68-80: LGTM - Multi-byte handling is correct.

The approach of copying first via memcpy, then selectively transforming single-byte ASCII characters while skipping multi-byte sequences, is sound.

Minor style nit: PostgreSQL convention places a tab between type and variable name (int<tab>mblen).

Comment on lines +236 to +260
bool
identifier_is_all_lower(const char *ident, int len)
{
int i;
const char* s;

s = ident;

for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);

if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}

if (isalpha(*s) && isupper(*s))
return false;
s++;
}
return true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Undefined behavior: pass unsigned char to ctype functions.

isalpha() and isupper() have undefined behavior when passed a negative value (which occurs for high-bit-set chars when char is signed). Cast to unsigned char as done in downcase_identifier.

Also:

  • Line 234: Typo "Detemine" → "Determine"
  • Line 236: Remove leading space before bool
  • Line 240: Use PostgreSQL pointer style const char *s
  • Add Assert(ident != NULL); Assert(len >= 0); for consistency
Suggested fix
 /*
- *  Detemine whether the letters in the string are all lowercase letters
+ *  Determine whether the letters in the string are all lowercase letters
  */
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
 {
-	int i;
-	const char* s;
-
-	s = ident;
+	int			i;
+	const char *s = ident;
+
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	for (i = 0; i < len; i++)
 	{
+		unsigned char ch;
 		int mblen = pg_mblen(s);
 
 		if (mblen > 1)
 		{
 			s += mblen;
 			i += (mblen - 1);
 			continue;
 		}
 
-		if (isalpha(*s) && isupper(*s))
+		ch = (unsigned char) *s;
+		if (isalpha(ch) && isupper(ch))
 			return false;
 		s++;
 	}
 	return true;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bool
identifier_is_all_lower(const char *ident, int len)
{
int i;
const char* s;
s = ident;
for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);
if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}
if (isalpha(*s) && isupper(*s))
return false;
s++;
}
return true;
}
/*
* Determine whether the letters in the string are all lowercase letters
*/
bool
identifier_is_all_lower(const char *ident, int len)
{
int i;
const char *s = ident;
Assert(ident != NULL);
Assert(len >= 0);
for (i = 0; i < len; i++)
{
unsigned char ch;
int mblen = pg_mblen(s);
if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}
ch = (unsigned char) *s;
if (isalpha(ch) && isupper(ch))
return false;
s++;
}
return true;
}
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 236 - 260, The function
identifier_is_all_lower has undefined behavior because isalpha() and isupper()
are called with plain char; cast the character to (unsigned char) before passing
to isalpha/isupper (as done in downcase_identifier), add Assert(ident != NULL);
Assert(len >= 0); at the start for consistency, change the signature spacing to
"bool identifier_is_all_lower(const char *ident, int len)" and use PostgreSQL
pointer style "const char *s", and fix the comment typo "Detemine" →
"Determine"; keep the existing multi-byte handling via pg_mblen() and advance s
and i as currently implemented.

Comment on lines +265 to +289
bool
identifier_is_all_upper(const char *ident, int len)
{
int i;
const char* s;

s = ident;

for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);

if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}

if (isalpha(*s) && islower(*s))
return false;
s++;
}
return true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same issues as identifier_is_all_lower.

Apply identical fixes:

  • Cast to unsigned char before calling isalpha()/islower()
  • Fix typo "Detemine" → "Determine"
  • Fix formatting (leading space, pointer style)
  • Add assertions
Suggested fix
 /*
- *  Detemine whether the letters in the string are all uppercase letters
+ *  Determine whether the letters in the string are all uppercase letters
  */
- bool
- identifier_is_all_upper(const char *ident, int len)
+bool
+identifier_is_all_upper(const char *ident, int len)
 {
-	int i;
-	const char* s;
-
-	s = ident;
+	int			i;
+	const char *s = ident;
+
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	for (i = 0; i < len; i++)
 	{
+		unsigned char ch;
 		int mblen = pg_mblen(s);
 
 		if (mblen > 1)
 		{
 			s += mblen;
 			i += (mblen - 1);
 			continue;
 		}
 
-		if (isalpha(*s) && islower(*s))
+		ch = (unsigned char) *s;
+		if (isalpha(ch) && islower(ch))
 			return false;
 		s++;
 	}
 	return true;
 }
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 265 - 289, In
identifier_is_all_upper, cast characters to unsigned char before calling
isalpha() and islower() (e.g., use (unsigned char)*s) to avoid undefined
behavior; add assertions (Assert(ident != NULL && len >= 0)) at function start;
fix the typo in any comment or message from "Detemine" to "Determine"; normalize
formatting to match project style (no leading space before function return type,
use consistent pointer style like const char *ident) and keep multibyte-handling
logic as-is while ensuring spacing/brace style matches identifier_is_all_lower.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)

111-116: Duplicate comment line and incorrect terminology.

  1. Lines 111-112 contain a duplicated sentence: "locale-aware translation. However, there are some locales where this"
  2. Line 116 says "ASCII-only downcasing" but this is the upcase_identifier function - it should say "upcasing".
Suggested fix
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 	 * have the infrastructure for.  Instead we use toupper() to provide a
-	 * locale-aware translation.  However, there are some locales where this
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I').  Our current compromise is to use toupper() for characters with
 	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * character, and use an ASCII-only upcasing for 7-bit characters.
 	 */
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Around line 222-249: Fix the typo in the function comment ("Detemine" →
"Determine") and remove the stray leading space before the function return type;
then add the same defensive assertions used in downcase_identifier and
upcase_identifier to validate inputs (e.g., Assert(ident != NULL) and Assert(len
>= 0)) at the start of identifier_is_all_lower so it rejects NULL ident and
negative len values consistently with those other functions.
- Around line 251-278: Fix the typo in the function comment ("Detemine" ->
"Determine") and remove the stray leading space before the return type so the
function declaration reads "bool identifier_is_all_upper(...)" consistently;
then add the same input validation used in identifier_is_all_lower by asserting
the pointer and length at the top of identifier_is_all_upper (e.g., Assert(ident
!= NULL); Assert(len >= 0);) before using ident and len. Maintain the existing
multibyte handling and loop logic otherwise.
🧹 Nitpick comments (1)
src/backend/parser/scansup.c (1)

19-19: Unnecessary include: <stdbool.h> is already provided by postgres.h.

The postgres.h header (included at line 16) already provides bool, true, and false definitions. This include is redundant and can be removed.

Suggested fix
 #include "postgres.h"
 
 #include <ctype.h>
-#include <stdbool.h>
 
 #include "mb/pg_wchar.h"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 252ca57 and d99b973.

📒 Files selected for processing (2)
  • src/backend/parser/scansup.c
  • src/bin/initdb/initdb.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);

Applied to files:

  • src/bin/initdb/initdb.c
  • src/backend/parser/scansup.c
🔇 Additional comments (4)
src/backend/parser/scansup.c (2)

68-87: LGTM - Multi-byte handling in downcase_identifier looks correct.

The approach of copying the entire input first with memcpy, then selectively transforming only single-byte ASCII characters while skipping multi-byte sequences using pg_mblen, is a sound strategy for preserving non-ASCII bytes.


118-137: LGTM - Multi-byte handling in upcase_identifier mirrors downcase_identifier correctly.

The implementation correctly follows the same pattern as downcase_identifier for handling multi-byte characters.

src/bin/initdb/initdb.c (2)

3698-3711: LGTM - Initialization reordering enables encoding-aware username transformation.

The reordering of set_info_version(), setup_data_file_paths(), setup_locale_encoding(), and setup_text_search() before the Oracle username handling is necessary because setup_locale_encoding() sets the encodingid variable, which is now required by is_all_upper() and down_character() for proper multi-byte character handling.


85-85: Verify the ora_string_utils.h header provides the required functions.

The code uses is_all_upper() and down_character() functions from this header. Ensure these functions exist with the expected signatures.

Comment on lines +222 to +249
/*
* Detemine whether the letters in the string are all lowercase letters
*/
bool
identifier_is_all_lower(const char *ident, int len)
{
int i;
const char* s;

s = ident;

for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);

if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}

if (isalpha(*s) && isupper(*s))
return false;
s++;
}
return true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in comment and missing input validation.

  1. Line 223: "Detemine" should be "Determine".
  2. Lines 225-226: Formatting issue - there's a leading space before bool on the function definition.
  3. Unlike downcase_identifier and upcase_identifier (which have Assert checks at lines 53-54 and 102-103), this function lacks validation for NULL ident or negative len. For consistency and safety, consider adding similar assertions.
Suggested fix
 /*
- *  Detemine whether the letters in the string are all lowercase letters
+ *  Determine whether the letters in the string are all lowercase letters
  */
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
 {
 	int i;
 	const char* s;
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/*
* Detemine whether the letters in the string are all lowercase letters
*/
bool
identifier_is_all_lower(const char *ident, int len)
{
int i;
const char* s;
s = ident;
for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);
if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}
if (isalpha(*s) && isupper(*s))
return false;
s++;
}
return true;
}
/*
* Determine whether the letters in the string are all lowercase letters
*/
bool
identifier_is_all_lower(const char *ident, int len)
{
int i;
const char* s;
Assert(ident != NULL);
Assert(len >= 0);
s = ident;
for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);
if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}
if (isalpha(*s) && isupper(*s))
return false;
s++;
}
return true;
}
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 222 - 249, Fix the typo in the
function comment ("Detemine" → "Determine") and remove the stray leading space
before the function return type; then add the same defensive assertions used in
downcase_identifier and upcase_identifier to validate inputs (e.g., Assert(ident
!= NULL) and Assert(len >= 0)) at the start of identifier_is_all_lower so it
rejects NULL ident and negative len values consistently with those other
functions.

Comment on lines +251 to +278
/*
* Detemine whether the letters in the string are all uppercase letters
*/
bool
identifier_is_all_upper(const char *ident, int len)
{
int i;
const char* s;

s = ident;

for (i = 0; i < len; i++)
{
int mblen = pg_mblen(s);

if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}

if (isalpha(*s) && islower(*s))
return false;
s++;
}
return true;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in comment and missing input validation (same issues as identifier_is_all_lower).

  1. Line 252: "Detemine" should be "Determine".
  2. Lines 254-255: Formatting issue with leading space before bool.
  3. Missing Assert checks for NULL pointer and non-negative length.
Suggested fix
 /*
- *  Detemine whether the letters in the string are all uppercase letters
+ *  Determine whether the letters in the string are all uppercase letters
  */
- bool
- identifier_is_all_upper(const char * ident, int len)
+bool
+identifier_is_all_upper(const char *ident, int len)
 {
 	int i;
 	const char* s;
+	Assert(ident != NULL);
+	Assert(len >= 0);
 
 	s = ident;
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 251 - 278, Fix the typo in the
function comment ("Detemine" -> "Determine") and remove the stray leading space
before the return type so the function declaration reads "bool
identifier_is_all_upper(...)" consistently; then add the same input validation
used in identifier_is_all_lower by asserting the pointer and length at the top
of identifier_is_all_upper (e.g., Assert(ident != NULL); Assert(len >= 0);)
before using ident and len. Maintain the existing multibyte handling and loop
logic otherwise.

@NotHimmel NotHimmel closed this Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant