Fix identifier case handling for multi-byte characters.#1148
Fix identifier case handling for multi-byte characters.#1148NotHimmel wants to merge 12 commits intoIvorySQL:masterfrom
Conversation
📝 WalkthroughWalkthroughReplaced legacy port-based identifier case utilities with Oracle-aware, multi-byte-safe helpers; updated scanner, lexer, initdb, and backend startup to use new APIs; removed old port implementation and adjusted build and headers. Changes
Sequence Diagram(s)sequenceDiagram
participant Lexer as Oracle Lexer (ora_scan.l)
participant Scanner as scansup helpers
participant OraUtils as oracle_fe_utils
Lexer->>Scanner: provide identifier token (ident,len)
Scanner->>Scanner: call identifier_is_all_upper(ident,len)
alt identifier is all upper
Scanner->>OraUtils: call down_character(ident,len,encoding)
OraUtils-->>Scanner: return lowercased, multi-byte-safe string
Scanner-->>Lexer: return transformed identifier
else not all upper
Scanner-->>Lexer: return original identifier
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @NotHimmel. * #1148 (comment) The following files were modified: * `src/backend/parser/scansup.c` * `src/backend/tcop/backend_startup.c` * `src/bin/initdb/initdb.c` * `src/oracle_fe_utils/ora_string_utils.c`
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Line 213: Fix the typo in the function comment blocks where "Detemine" is
misspelled: change "Detemine whether the letters in the string are all lowercase
letters" to "Determine whether the letters in the string are all lowercase
letters" in both comment blocks around the functions in
src/backend/parser/scansup.c (the comment at line ~213 and the similar comment
at line ~243).
In @src/backend/tcop/backend_startup.c:
- Line 47: Remove the duplicate include directive for "parser/scansup.h": find
the second occurrence of #include "parser/scansup.h" (the one duplicated at line
47) and delete it so only the original include remains; ensure no other
references rely on a second inclusion and run a quick build to confirm
preprocessing is unaffected.
In @src/bin/initdb/initdb.c:
- Around line 3706-3714: The block that lowercases Oracle usernames has a
use-after-free: remove the save-and-free pattern around username since
down_character modifies the buffer in-place; specifically, in the conditional
that checks database_mode == DB_ORACLE, username != NULL, caseswitchmode !=
NORMAL and is_all_upper(...), call down_character(username, strlen(username),
encodingid) without assigning its return to a new pointer or freeing username
(do not call free on username); this matches down_character's in-place behavior
as implemented in ora_string_utils.c and prevents making username a dangling
pointer.
In @src/oracle_fe_utils/ora_string_utils.c:
- Around line 205-237: The function upper_character modifies its input buffer
in-place but its signature declares src as const char*, causing a
const-correctness violation; change the parameter type from const char *src to
char *src (matching the fix in down_character) and update any callers if needed
to pass a mutable buffer, ensuring pg_encoding_mblen and in-place toupper usage
remain valid.
- Around line 174-203: The function down_character currently takes a const char
*src but casts away const and mutates the buffer; change the signature to char
*down_character(char *src, int len, int encoding) (remove const) and eliminate
the (char *) cast inside the function so it operates on the provided mutable
pointer, and update the matching prototype in ora_string_utils.h; ensure any
callers pass mutable buffers (or make copies) so no read-only memory is
modified.
🧹 Nitpick comments (5)
src/backend/parser/scansup.c (3)
19-19: Remove redundant include.The
<stdbool.h>header is unnecessary sincepostgres.halready provides thebooltype and related definitions.♻️ Proposed fix
#include <ctype.h> -#include <stdbool.h>
215-239: Review the multi-byte and case-checking logic.The function skips multi-byte sequences and only checks single-byte alphabetic characters for lowercase. Consider the following:
- Line 219: Inconsistent spacing before the function name declaration (extra space after
bool).- Logic: The function returns
truefor strings with multi-byte characters as long as all single-byte alphabetic characters are lowercase. Confirm this matches the intended semantics for Oracle identifier case handling.♻️ Style fix for spacing
- bool - identifier_is_all_lower(const char *ident, int len) +bool +identifier_is_all_lower(const char *ident, int len)
244-268: Review the multi-byte and case-checking logic.Similar observations as
identifier_is_all_lower:
- Line 245: Inconsistent spacing before the function name declaration.
- Logic: The function returns
truefor strings with multi-byte characters as long as all single-byte alphabetic characters are uppercase.♻️ Style fix for spacing
- bool - identifier_is_all_upper(const char *ident, int len) +bool +identifier_is_all_upper(const char *ident, int len)src/oracle_fe_utils/ora_string_utils.c (2)
242-267: Document the behavior for strings without alphabetic characters.The function returns
truefor strings containing no alphabetic characters (e.g., "123", "___"). While this might be intentional (vacuous truth: all zero alphabetic characters are lowercase), it should be documented to avoid confusion.Consider adding a comment:
/* * Determine whether all alphabetic letters in the string are lowercase. * Returns true if there are no alphabetic characters. * Multi-byte characters are skipped (treated as non-alphabetic). */
272-297: Document the behavior for strings without alphabetic characters.Same issue as
is_all_lower: the function returnstruefor strings containing no alphabetic characters. Document this behavior with a clear comment:/* * Determine whether all alphabetic letters in the string are uppercase. * Returns true if there are no alphabetic characters. * Multi-byte characters are skipped (treated as non-alphabetic). */
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
src/backend/oracle_parser/ora_scan.lsrc/backend/parser/scansup.csrc/backend/tcop/backend_startup.csrc/bin/initdb/initdb.csrc/include/fe_utils/string_utils.hsrc/include/oracle_fe_utils/ora_string_utils.hsrc/include/parser/scansup.hsrc/include/port.hsrc/oracle_fe_utils/ora_string_utils.csrc/port/Makefilesrc/port/identifier_alpha_transfor.csrc/port/meson.build
💤 Files with no reviewable changes (5)
- src/include/port.h
- src/port/identifier_alpha_transfor.c
- src/port/meson.build
- src/include/fe_utils/string_utils.h
- src/port/Makefile
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/include/parser/scansup.hsrc/backend/tcop/backend_startup.csrc/include/oracle_fe_utils/ora_string_utils.hsrc/backend/parser/scansup.csrc/backend/oracle_parser/ora_scan.lsrc/oracle_fe_utils/ora_string_utils.csrc/bin/initdb/initdb.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: meson_build (ubuntu-latest)
- GitHub Check: contrib_regression
- GitHub Check: pg_regression (ubuntu-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: oracle_pg_regression (ubuntu-latest)
- GitHub Check: oracle_regression (ubuntu-latest)
🔇 Additional comments (12)
src/backend/oracle_parser/ora_scan.l (2)
50-50: LGTM: Minor formatting cleanup.Removed trailing space after the include directive.
949-970: Unable to verify the review comment due to unavailability of the referenced file and functions in the current repository state. The filesrc/backend/oracle_parser/ora_scan.land functionsidentifier_is_all_upperandis_all_uppercould not be found.src/include/parser/scansup.h (1)
32-33: LGTM: New public API for identifier case validation.These functions extend the scanner support interface with multi-byte aware identifier case checks, aligning with the PR objective to fix multi-byte character handling.
src/backend/parser/scansup.c (2)
111-120: Verify multi-byte handling preserves character integrity.Similar to
downcase_identifier, ensure thatstrncpyfollowed by selective character modification doesn't corrupt multi-byte sequences.
60-75: Multi-byte handling is correctly implemented with proper safeguards in place.The code properly detects multi-byte sequences using
pg_mblen()and skips them during case transformation. The approach is sound:strncpy()copies the entire input unchanged, then the loop safely modifies only single-byte characters while preserving multi-byte boundaries. Encoding awareness is handled viapg_database_encoding_max_length(), andtolower()/toupper()are only applied when both the encoding is single-byte and the character has the high bit set. This same pattern is consistently applied across similar functions in the file (upcase_identifier, and other case transformation routines).src/backend/tcop/backend_startup.c (2)
838-840: Consistent function migration looks correct.The same pattern of replacing
is_all_upperwithidentifier_is_all_upperanddown_characterwithdowncase_identifier(..., false, false)is applied consistently in the INTERCHANGE case handling blocks for both database_name and user_name.Also applies to: 860-862, 891-893
806-810: Remove reference to non-existentdown_characterfunction.The function signatures are correct and the code usage is appropriate. However, the review comment incorrectly references
down_characteras the previous function name. This function does not exist in the codebase. The actual function declarations insrc/include/parser/scansup.hconfirm:
downcase_identifier(const char *ident, int len, bool warn, bool truncate)— correctly used withwarn=false, truncate=falseidentifier_is_all_upper(const char *ident, int len)— correctly called with two argumentsThe parameters
false, falseare appropriate since database names should not be truncated or trigger warnings. Remove the mention ofdown_characterand the claim about "changes from" it, as this function does not exist and the review comment's historical premise is inaccurate.src/include/oracle_fe_utils/ora_string_utils.h (1)
45-48: LGTM: New encoding-aware string utility functions.These Oracle-specific utilities now accept an encoding parameter to handle multi-byte characters safely, aligning with the PR objective. The functions replace the previous port-level implementations.
src/bin/initdb/initdb.c (3)
85-85: LGTM: New include added for Oracle string utilities.The include is necessary for the new
is_all_upperanddown_characterfunctions used in the username transformation logic.
1703-1703: LGTM: Bootstrap command updated with case switch mode.The command correctly passes the
caseswitchmodeparameter to the bootstrap backend using the-Cflag with proper integer formatting.
3698-3704: The early initialization order is correct.All four functions are safe to call at lines 3698-3704. They depend exclusively on variables set during command-line parsing (e.g.,
lc_ctype,encoding,database_mode), not on any logic betweensetup_bin_paths()and their current position. Movingsetup_locale_encoding()before the Oracle username transformation is necessary and correct—encodingidmust be available beforeis_all_upper()anddown_character()use it at lines 3709 and 3712.src/oracle_fe_utils/ora_string_utils.c (1)
29-29: LGTM: Include added for multi-byte character handling.The
mb/pg_wchar.hinclude is necessary forpg_encoding_mblen()used in the new case-handling functions.
| #include "utils/ps_status.h" | ||
| #include "utils/timeout.h" | ||
| #include "utils/varlena.h" | ||
| #include "parser/scansup.h" |
There was a problem hiding this comment.
Remove duplicate include directive.
Line 47 duplicates the include from line 30. This causes redundant preprocessing and should be removed.
🔧 Proposed fix
#include "utils/varlena.h"
-#include "parser/scansup.h"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #include "parser/scansup.h" | |
| #include "utils/varlena.h" | |
🤖 Prompt for AI Agents
In @src/backend/tcop/backend_startup.c at line 47, Remove the duplicate include
directive for "parser/scansup.h": find the second occurrence of #include
"parser/scansup.h" (the one duplicated at line 47) and delete it so only the
original include remains; ensure no other references rely on a second inclusion
and run a quick build to confirm preprocessing is unaffected.
| char * | ||
| down_character(const char *src, int len, int encoding) | ||
| { | ||
| char *s; | ||
| char *res; | ||
| int i; | ||
| Assert(src != NULL); | ||
| Assert(len >= 0); | ||
|
|
||
| s = (char *) src; | ||
| res = s; | ||
|
|
||
| /* transform */ | ||
| for (i = 0; i < len ; i++) | ||
| { | ||
| int mblen = pg_encoding_mblen(encoding, s); | ||
|
|
||
| if (mblen > 1) | ||
| { | ||
| s += mblen; | ||
| i += (mblen - 1); | ||
| continue; | ||
| } | ||
|
|
||
| *s = tolower(*s); | ||
| s++; | ||
| } | ||
|
|
||
| return res; | ||
| } |
There was a problem hiding this comment.
CRITICAL: Const-correctness violation and misleading signature.
The function signature declares const char *src but immediately casts it to char * and modifies the input in-place. This is undefined behavior if the caller passes a string literal or read-only memory.
Issues:
- Line 183 casts away const:
s = (char *) src; - Line 198 modifies the "const" input:
*s = tolower(*s);
This violates const-correctness and can lead to crashes or security issues if the input is in read-only memory.
🐛 Proposed fix
/*
* transform upper to lower
*/
char *
-down_character(const char *src, int len, int encoding)
+down_character(char *src, int len, int encoding)
{
char *s;
char *res;
int i;
Assert(src != NULL);
Assert(len >= 0);
- s = (char *) src;
+ s = src;
res = s;Remove const from the parameter since the function modifies the input in-place. Also update the corresponding header declaration in src/include/oracle_fe_utils/ora_string_utils.h.
🤖 Prompt for AI Agents
In @src/oracle_fe_utils/ora_string_utils.c around lines 174 - 203, The function
down_character currently takes a const char *src but casts away const and
mutates the buffer; change the signature to char *down_character(char *src, int
len, int encoding) (remove const) and eliminate the (char *) cast inside the
function so it operates on the provided mutable pointer, and update the matching
prototype in ora_string_utils.h; ensure any callers pass mutable buffers (or
make copies) so no read-only memory is modified.
| /* | ||
| * transform lower to upper | ||
| */ | ||
| char * | ||
| upper_character(const char *src, int len, int encoding) | ||
| { | ||
| char *s; | ||
| char *res; | ||
| int i; | ||
| Assert(src != NULL); | ||
| Assert(len >= 0); | ||
|
|
||
| s = (char *) src; | ||
| res = s; | ||
|
|
||
| /* transform */ | ||
| for (i = 0; i < len ; i++) | ||
| { | ||
| int mblen = pg_encoding_mblen(encoding, s); | ||
|
|
||
| if (mblen > 1) | ||
| { | ||
| s += mblen; | ||
| i += (mblen - 1); | ||
| continue; | ||
| } | ||
|
|
||
| *s = toupper(*s); | ||
| s++; | ||
| } | ||
|
|
||
| return res; | ||
| } |
There was a problem hiding this comment.
CRITICAL: Same const-correctness violation as down_character.
This function has the identical const-correctness issue as down_character (lines 174-203). The parameter should be char *src instead of const char *src since the function modifies the input in-place.
🐛 Proposed fix
/*
* transform lower to upper
*/
char *
-upper_character(const char *src, int len, int encoding)
+upper_character(char *src, int len, int encoding)
{
char *s;
char *res;
int i;
Assert(src != NULL);
Assert(len >= 0);
- s = (char *) src;
+ s = src;
res = s;🤖 Prompt for AI Agents
In @src/oracle_fe_utils/ora_string_utils.c around lines 205 - 237, The function
upper_character modifies its input buffer in-place but its signature declares
src as const char*, causing a const-correctness violation; change the parameter
type from const char *src to char *src (matching the fix in down_character) and
update any callers if needed to pass a mutable buffer, ensuring
pg_encoding_mblen and in-place toupper usage remain valid.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)
111-112: Duplicate comment line.Lines 111 and 112 are identical—appears to be a copy-paste error.
📝 Suggested fix
/* * SQL99 specifies Unicode-aware case normalization, which we don't yet * have the infrastructure for. Instead we use toupper() to provide a - * locale-aware translation. However, there are some locales where this * locale-aware translation. However, there are some locales where this * is not right either (eg, Turkish may do strange things with 'i' and
🧹 Nitpick comments (3)
src/backend/parser/scansup.c (3)
19-19: Redundant include.
postgres.halready provides boolean types via its internal headers. The<stdbool.h>include is unnecessary in PostgreSQL/IvorySQL code.♻️ Suggested fix
#include "postgres.h" #include <ctype.h> -#include <stdbool.h> #include "mb/pg_wchar.h"
225-249: Add input validation and fix formatting.This function lacks the
Assertchecks present indowncase_identifierandupcase_identifier. Also, there's a leading space beforeboolon line 225.♻️ Suggested fix
-/* - * Detemine whether the letters in the string are all lowercase letters - */ - bool - identifier_is_all_lower(const char *ident, int len) +/* + * Determine whether the letters in the string are all lowercase letters + */ +bool +identifier_is_all_lower(const char *ident, int len) { int i; const char* s; + + Assert(ident != NULL); + Assert(len >= 0); s = ident;
254-278: Add input validation and fix formatting.Same issues as
identifier_is_all_lower: missingAssertchecks and leading space beforebool.♻️ Suggested fix
-/* - * Detemine whether the letters in the string are all uppercase letters - */ - bool - identifier_is_all_upper(const char *ident, int len) +/* + * Determine whether the letters in the string are all uppercase letters + */ +bool +identifier_is_all_upper(const char *ident, int len) { int i; const char* s; + + Assert(ident != NULL); + Assert(len >= 0); s = ident;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/backend/parser/scansup.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: pg_regression (ubuntu-latest)
- GitHub Check: oracle_pg_regression (ubuntu-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: meson_build (ubuntu-latest)
- GitHub Check: oracle_regression (ubuntu-latest)
- GitHub Check: contrib_regression
🔇 Additional comments (4)
src/backend/parser/scansup.c (4)
53-54: Good defensive assertions.Adding precondition checks for NULL pointer and non-negative length is a sound practice for API robustness.
68-80: Multi-byte handling looks correct.Pre-copying with
memcpyand then skipping multi-byte sequences ensures non-ASCII characters are preserved while ASCII characters undergo case transformation. The index arithmetic correctly advances past the full multi-byte sequence.
102-103: Consistent assertions withdowncase_identifier.
118-130: Multi-byte handling consistent withdowncase_identifier.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/bin/initdb/initdb.c:
- Around line 3707-3715: The code currently assigns lowerusername = username
then calls down_character(username, ...) which operates in-place and returns the
same pointer, and then free(lowerusername) causes a use-after-free on username;
remove the free(lowerusername) call (or alternatively change down_character to
allocate a new buffer and document that behavior) so that username continues to
reference valid memory; update the block around down_character/username (and the
temporary lowerusername variable) to either omit the temporary and free or to
ensure ownership semantics are clear (prefer simply removing free(lowerusername)
since down_character is in-place).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/bin/initdb/initdb.c
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: meson_build (ubuntu-latest)
- GitHub Check: pg_regression (ubuntu-latest)
- GitHub Check: oracle_regression (ubuntu-latest)
- GitHub Check: oracle_pg_regression (ubuntu-latest)
- GitHub Check: contrib_regression
🔇 Additional comments (3)
src/bin/initdb/initdb.c (3)
85-85: LGTM!The new header provides the multi-byte encoding-aware string utilities (
down_character,is_all_upper) needed for the identifier case handling changes.
1703-1704: LGTM!The bootstrap command now correctly passes the identifier case switch mode to ensure consistent case handling during database initialization.
3698-3704: Correct initialization ordering for encoding-aware identifier handling.Moving these initialization calls before the Oracle username transformation ensures that
encodingidis properly set bysetup_locale_encoding()before it's used byis_all_upper()anddown_character(). This is essential for the multi-byte character support.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Line 234: Fix the spelling mistake in the comment for the function
identifier_is_all_lower by changing "Detemine" to "Determine"; make the
identical correction in the comment for identifier_is_all_upper as well so both
function comments read "Determine whether the letters in the string are all
lowercase/uppercase letters" respectively, leaving code and punctuation
unchanged.
🧹 Nitpick comments (4)
src/backend/parser/scansup.c (4)
19-19: Remove redundant include.The
postgres.hheader (line 16) already provides thebooltype and related definitions. Including<stdbool.h>is unnecessary and should be removed per PostgreSQL coding conventions.♻️ Proposed fix
-#include <stdbool.h>
236-260: Style inconsistencies with PostgreSQL conventions.Both new functions have minor style issues:
- Extra space before function name (lines 236, 265)
- Pointer declaration style should be
const char *snotconst char* s(lines 240, 269)♻️ Proposed fix for identifier_is_all_lower
- int i; - const char* s; + int i; + const char *s;Apply similar formatting fixes to
identifier_is_all_upper.Also applies to: 265-289
236-260: Add input validation assertions.Unlike
downcase_identifierandupcase_identifier(lines 53-54, 102-103), these new helper functions lack input validation. Consider adding assertions to guard against NULL pointers and negative lengths for consistency.♻️ Proposed fix
bool identifier_is_all_lower(const char *ident, int len) { int i; const char *s; + Assert(ident != NULL); + Assert(len >= 0); s = ident;Apply the same assertions to
identifier_is_all_upper.Also applies to: 265-289
236-260: Consider caching the encoding check for efficiency.Unlike
downcase_identifierandupcase_identifierwhich cachepg_database_encoding_max_length() == 1before the loop (lines 57, 106), these new functions callpg_mblenon every iteration regardless of encoding. For single-byte encodings, this adds unnecessary overhead.⚡ Proposed optimization
bool identifier_is_all_lower(const char *ident, int len) { int i; const char *s; + bool enc_is_single_byte; + + enc_is_single_byte = pg_database_encoding_max_length() == 1; s = ident; for (i = 0; i < len; i++) { - int mblen = pg_mblen(s); - - if (mblen > 1) + if (!enc_is_single_byte) { - s += mblen; - i += (mblen - 1); - continue; + int mblen = pg_mblen(s); + + if (mblen > 1) + { + s += mblen; + i += (mblen - 1); + continue; + } } if (isalpha(*s) && isupper(*s))Apply similar optimization to
identifier_is_all_upper.Also applies to: 265-289
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Learnt from: rophy
Repo: IvorySQL/IvorySQL PR: 998
File: contrib/ivorysql_ora/sql/ora_dbms_output.sql:211-212
Timestamp: 2025-12-26T07:43:29.003Z
Learning: In contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c (lines 447-449), buffer_size values below 2000 are silently clamped to DBMS_OUTPUT_MIN_BUFFER_SIZE (2000), matching Oracle behavior. This is not an error condition.
Learnt from: rophy
Repo: IvorySQL/IvorySQL PR: 998
File: contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c:82-146
Timestamp: 2025-12-03T06:50:27.755Z
Learning: In contrib/ivorysql_ora/src/builtin_packages/dbms_output/dbms_output.c, the behavior where ENABLE() clears the existing buffer (rather than preserving it like Oracle does) is an intentional known difference tracked in issue #26 at https://github.com/rophy/IvorySQL/issues/26 (medium impact, open).
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/backend/parser/scansup.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: meson_build (ubuntu-latest)
- GitHub Check: contrib_regression
- GitHub Check: oracle_pg_regression (ubuntu-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: oracle_regression (ubuntu-latest)
- GitHub Check: pg_regression (ubuntu-latest)
🔇 Additional comments (3)
src/backend/parser/scansup.c (3)
53-54: LGTM: Good defensive programming.The input validation assertions appropriately guard against NULL pointers and negative lengths, improving code robustness.
Also applies to: 102-103
68-80: LGTM: Multi-byte character handling is correct.The approach of copying the input first with
memcpyand then selectively modifying only ASCII characters (while skipping multi-byte sequences viapg_mblen) correctly preserves multi-byte identifiers.Also applies to: 118-130
152-160: LGTM: Sensible optimization.The early-exit logic for identifiers containing '/' or '.' avoids unnecessary case transformations for path-like identifiers, improving efficiency.
| } | ||
|
|
||
| /* | ||
| * Detemine whether the letters in the string are all lowercase letters |
There was a problem hiding this comment.
Fix typo in comments.
"Detemine" should be "Determine" in both function comments.
📝 Proposed fix
/*
- * Detemine whether the letters in the string are all lowercase letters
+ * Determine whether the letters in the string are all lowercase letters
*/Apply the same fix to the comment at line 263 for identifier_is_all_upper.
Also applies to: 263-263
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c at line 234, Fix the spelling mistake in the
comment for the function identifier_is_all_lower by changing "Detemine" to
"Determine"; make the identical correction in the comment for
identifier_is_all_upper as well so both function comments read "Determine
whether the letters in the string are all lowercase/uppercase letters"
respectively, leaving code and punctuation unchanged.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/bin/initdb/initdb.c (1)
3706-3715: Document the slash check for special username formats.The encoding-aware username transformation looks correct. However, the new condition
!strchr(username, '/')at line 3710 lacks documentation. This check likely prevents transformation of special username formats (e.g., Kerberos principals likeuser/instance@REALM), but without a comment, its purpose is unclear and could lead to confusion.📝 Add explanatory comment
/* Oracle compatibility: transform uppercase usernames to lowercase. */ if (database_mode == DB_ORACLE && username != NULL && caseswitchmode != NORMAL && is_all_upper(username, strlen(username), encodingid) + /* Skip transformation for special username formats like Kerberos principals */ && !strchr(username, '/')) {Otherwise, the encoding-aware function calls with
encodingidare the correct fix for multi-byte character handling.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/backend/parser/scansup.csrc/bin/initdb/initdb.c
🚧 Files skipped from review as they are similar to previous changes (1)
- src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/bin/initdb/initdb.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: oracle_regression (ubuntu-latest)
- GitHub Check: oracle_pg_regression (ubuntu-latest)
- GitHub Check: meson_build (ubuntu-latest)
- GitHub Check: build (ubuntu-latest)
- GitHub Check: contrib_regression
- GitHub Check: pg_regression (ubuntu-latest)
🔇 Additional comments (3)
src/bin/initdb/initdb.c (3)
85-85: LGTM!The new include is necessary for the encoding-aware string utilities used in the Oracle username transformation.
1703-1703: LGTM!The formatting adjustment to the bootstrap command assembly is fine.
3698-3704: LGTM!Moving the initialization calls earlier in
main()ensuresencodingidis properly set before the Oracle username transformation logic. This is necessary for the encoding-aware string utilities to function correctly.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)
111-112: Duplicated comment line.Line 112 is a duplicate of line 111. Remove the duplicate.
Proposed fix
/* * SQL99 specifies Unicode-aware case normalization, which we don't yet * have the infrastructure for. Instead we use toupper() to provide a - * locale-aware translation. However, there are some locales where this * locale-aware translation. However, there are some locales where this * is not right either (eg, Turkish may do strange things with 'i' and
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Around line 262-289: Fix the typo in the comment for identifier_is_all_upper
("Detemine" → "Determine") and add defensive assertions at the start of the
function: Assert(ident != NULL) and Assert(len >= 0); keep these assertions
consistent with identifier_is_all_lower so the function validates its inputs
before processing.
- Around line 233-260: Fix the typo in the function comment ("Detemine" →
"Determine") and add input validation at the start of identifier_is_all_lower by
inserting Assert(ident != NULL) and Assert(len >= 0) (matching the style used in
downcase_identifier/upcase_identifier); keep the rest of the function logic
unchanged and reference identifier_is_all_lower to locate the edit.
- Around line 150-160: Remove the unused local variable `i` and the
commented-out loop block in scansup.c (the /* for (i = 0; ...) */ section) to
eliminate dead code and compiler warnings; if that logic must be preserved for
future work instead, replace the block with a short TODO comment describing the
intended behavior and remove the unused `i` declaration so no unused-variable
remains.
🧹 Nitpick comments (1)
src/backend/parser/scansup.c (1)
236-240: Minor formatting inconsistencies.
- Lines 236, 265: Extra leading space before
boolreturn type- Lines 240, 269: Pointer declaration style
const char* sshould beconst char *sper PostgreSQL coding conventionsProposed fix for identifier_is_all_lower
- bool - identifier_is_all_lower(const char *ident, int len) +bool +identifier_is_all_lower(const char *ident, int len) { int i; - const char* s; + const char *s;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/backend/parser/scansup.c
🔇 Additional comments (2)
src/backend/parser/scansup.c (2)
53-54: Good addition of defensive assertions.These assertions protect against null pointer dereferences and negative length values. This is a welcome improvement.
68-87: Multi-byte handling logic is correct.The approach of copying the entire input first via
memcpy, then selectively modifying only single-byte ASCII characters while skipping multi-byte sequences, correctly preserves non-ASCII characters.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)
111-112: Duplicate comment line.Line 112 is an exact duplicate of line 111. Remove the duplicate.
Suggested fix
/* * SQL99 specifies Unicode-aware case normalization, which we don't yet * have the infrastructure for. Instead we use toupper() to provide a - * locale-aware translation. However, there are some locales where this * locale-aware translation. However, there are some locales where this * is not right either (eg, Turkish may do strange things with 'i' and
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Around line 265-289: In identifier_is_all_upper, cast characters to unsigned
char before calling isalpha() and islower() (e.g., use (unsigned char)*s) to
avoid undefined behavior; add assertions (Assert(ident != NULL && len >= 0)) at
function start; fix the typo in any comment or message from "Detemine" to
"Determine"; normalize formatting to match project style (no leading space
before function return type, use consistent pointer style like const char
*ident) and keep multibyte-handling logic as-is while ensuring spacing/brace
style matches identifier_is_all_lower.
- Around line 236-260: The function identifier_is_all_lower has undefined
behavior because isalpha() and isupper() are called with plain char; cast the
character to (unsigned char) before passing to isalpha/isupper (as done in
downcase_identifier), add Assert(ident != NULL); Assert(len >= 0); at the start
for consistency, change the signature spacing to "bool
identifier_is_all_lower(const char *ident, int len)" and use PostgreSQL pointer
style "const char *s", and fix the comment typo "Detemine" → "Determine"; keep
the existing multi-byte handling via pg_mblen() and advance s and i as currently
implemented.
🧹 Nitpick comments (2)
src/backend/parser/scansup.c (2)
19-19: Redundant include.
postgres.halready provides thebooltype viac.h. This include is unnecessary in backend code.
150-160: Remove or implement commented-out code.This commented block introduces noise. If the
'/'separator logic is needed, uncomment and test it; otherwise, remove it to keep the codebase clean.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/backend/parser/scansup.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/backend/parser/scansup.c
🔇 Additional comments (2)
src/backend/parser/scansup.c (2)
53-54: LGTM!Good defensive assertions to validate preconditions.
68-80: LGTM - Multi-byte handling is correct.The approach of copying first via
memcpy, then selectively transforming single-byte ASCII characters while skipping multi-byte sequences, is sound.Minor style nit: PostgreSQL convention places a tab between type and variable name (
int<tab>mblen).
| bool | ||
| identifier_is_all_lower(const char *ident, int len) | ||
| { | ||
| int i; | ||
| const char* s; | ||
|
|
||
| s = ident; | ||
|
|
||
| for (i = 0; i < len; i++) | ||
| { | ||
| int mblen = pg_mblen(s); | ||
|
|
||
| if (mblen > 1) | ||
| { | ||
| s += mblen; | ||
| i += (mblen - 1); | ||
| continue; | ||
| } | ||
|
|
||
| if (isalpha(*s) && isupper(*s)) | ||
| return false; | ||
| s++; | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Undefined behavior: pass unsigned char to ctype functions.
isalpha() and isupper() have undefined behavior when passed a negative value (which occurs for high-bit-set chars when char is signed). Cast to unsigned char as done in downcase_identifier.
Also:
- Line 234: Typo "Detemine" → "Determine"
- Line 236: Remove leading space before
bool - Line 240: Use PostgreSQL pointer style
const char *s - Add
Assert(ident != NULL); Assert(len >= 0);for consistency
Suggested fix
/*
- * Detemine whether the letters in the string are all lowercase letters
+ * Determine whether the letters in the string are all lowercase letters
*/
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
{
- int i;
- const char* s;
-
- s = ident;
+ int i;
+ const char *s = ident;
+
+ Assert(ident != NULL);
+ Assert(len >= 0);
for (i = 0; i < len; i++)
{
+ unsigned char ch;
int mblen = pg_mblen(s);
if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}
- if (isalpha(*s) && isupper(*s))
+ ch = (unsigned char) *s;
+ if (isalpha(ch) && isupper(ch))
return false;
s++;
}
return true;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| bool | |
| identifier_is_all_lower(const char *ident, int len) | |
| { | |
| int i; | |
| const char* s; | |
| s = ident; | |
| for (i = 0; i < len; i++) | |
| { | |
| int mblen = pg_mblen(s); | |
| if (mblen > 1) | |
| { | |
| s += mblen; | |
| i += (mblen - 1); | |
| continue; | |
| } | |
| if (isalpha(*s) && isupper(*s)) | |
| return false; | |
| s++; | |
| } | |
| return true; | |
| } | |
| /* | |
| * Determine whether the letters in the string are all lowercase letters | |
| */ | |
| bool | |
| identifier_is_all_lower(const char *ident, int len) | |
| { | |
| int i; | |
| const char *s = ident; | |
| Assert(ident != NULL); | |
| Assert(len >= 0); | |
| for (i = 0; i < len; i++) | |
| { | |
| unsigned char ch; | |
| int mblen = pg_mblen(s); | |
| if (mblen > 1) | |
| { | |
| s += mblen; | |
| i += (mblen - 1); | |
| continue; | |
| } | |
| ch = (unsigned char) *s; | |
| if (isalpha(ch) && isupper(ch)) | |
| return false; | |
| s++; | |
| } | |
| return true; | |
| } |
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 236 - 260, The function
identifier_is_all_lower has undefined behavior because isalpha() and isupper()
are called with plain char; cast the character to (unsigned char) before passing
to isalpha/isupper (as done in downcase_identifier), add Assert(ident != NULL);
Assert(len >= 0); at the start for consistency, change the signature spacing to
"bool identifier_is_all_lower(const char *ident, int len)" and use PostgreSQL
pointer style "const char *s", and fix the comment typo "Detemine" →
"Determine"; keep the existing multi-byte handling via pg_mblen() and advance s
and i as currently implemented.
| bool | ||
| identifier_is_all_upper(const char *ident, int len) | ||
| { | ||
| int i; | ||
| const char* s; | ||
|
|
||
| s = ident; | ||
|
|
||
| for (i = 0; i < len; i++) | ||
| { | ||
| int mblen = pg_mblen(s); | ||
|
|
||
| if (mblen > 1) | ||
| { | ||
| s += mblen; | ||
| i += (mblen - 1); | ||
| continue; | ||
| } | ||
|
|
||
| if (isalpha(*s) && islower(*s)) | ||
| return false; | ||
| s++; | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Same issues as identifier_is_all_lower.
Apply identical fixes:
- Cast to
unsigned charbefore callingisalpha()/islower() - Fix typo "Detemine" → "Determine"
- Fix formatting (leading space, pointer style)
- Add assertions
Suggested fix
/*
- * Detemine whether the letters in the string are all uppercase letters
+ * Determine whether the letters in the string are all uppercase letters
*/
- bool
- identifier_is_all_upper(const char *ident, int len)
+bool
+identifier_is_all_upper(const char *ident, int len)
{
- int i;
- const char* s;
-
- s = ident;
+ int i;
+ const char *s = ident;
+
+ Assert(ident != NULL);
+ Assert(len >= 0);
for (i = 0; i < len; i++)
{
+ unsigned char ch;
int mblen = pg_mblen(s);
if (mblen > 1)
{
s += mblen;
i += (mblen - 1);
continue;
}
- if (isalpha(*s) && islower(*s))
+ ch = (unsigned char) *s;
+ if (isalpha(ch) && islower(ch))
return false;
s++;
}
return true;
}🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 265 - 289, In
identifier_is_all_upper, cast characters to unsigned char before calling
isalpha() and islower() (e.g., use (unsigned char)*s) to avoid undefined
behavior; add assertions (Assert(ident != NULL && len >= 0)) at function start;
fix the typo in any comment or message from "Detemine" to "Determine"; normalize
formatting to match project style (no leading space before function return type,
use consistent pointer style like const char *ident) and keep multibyte-handling
logic as-is while ensuring spacing/brace style matches identifier_is_all_lower.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/parser/scansup.c (1)
111-116: Duplicate comment line and incorrect terminology.
- Lines 111-112 contain a duplicated sentence: "locale-aware translation. However, there are some locales where this"
- Line 116 says "ASCII-only downcasing" but this is the
upcase_identifierfunction - it should say "upcasing".Suggested fix
/* * SQL99 specifies Unicode-aware case normalization, which we don't yet * have the infrastructure for. Instead we use toupper() to provide a - * locale-aware translation. However, there are some locales where this * locale-aware translation. However, there are some locales where this * is not right either (eg, Turkish may do strange things with 'i' and * 'I'). Our current compromise is to use toupper() for characters with * the high bit set, as long as they aren't part of a multi-byte - * character, and use an ASCII-only downcasing for 7-bit characters. + * character, and use an ASCII-only upcasing for 7-bit characters. */
🤖 Fix all issues with AI agents
In @src/backend/parser/scansup.c:
- Around line 222-249: Fix the typo in the function comment ("Detemine" →
"Determine") and remove the stray leading space before the function return type;
then add the same defensive assertions used in downcase_identifier and
upcase_identifier to validate inputs (e.g., Assert(ident != NULL) and Assert(len
>= 0)) at the start of identifier_is_all_lower so it rejects NULL ident and
negative len values consistently with those other functions.
- Around line 251-278: Fix the typo in the function comment ("Detemine" ->
"Determine") and remove the stray leading space before the return type so the
function declaration reads "bool identifier_is_all_upper(...)" consistently;
then add the same input validation used in identifier_is_all_lower by asserting
the pointer and length at the top of identifier_is_all_upper (e.g., Assert(ident
!= NULL); Assert(len >= 0);) before using ident and len. Maintain the existing
multibyte handling and loop logic otherwise.
🧹 Nitpick comments (1)
src/backend/parser/scansup.c (1)
19-19: Unnecessary include:<stdbool.h>is already provided bypostgres.h.The
postgres.hheader (included at line 16) already providesbool,true, andfalsedefinitions. This include is redundant and can be removed.Suggested fix
#include "postgres.h" #include <ctype.h> -#include <stdbool.h> #include "mb/pg_wchar.h"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/backend/parser/scansup.csrc/bin/initdb/initdb.c
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
📚 Learning: 2025-10-24T08:19:29.453Z
Learnt from: OreoYang
Repo: IvorySQL/IvorySQL PR: 931
File: src/backend/oracle_parser/ora_gram.y:0-0
Timestamp: 2025-10-24T08:19:29.453Z
Learning: In IvorySQL’s parser (src/backend/oracle_parser/ora_gram.y and related headers), downcase_identifier has the 4-argument signature:
extern char *downcase_identifier(const char *ident, int len, bool warn, bool truncate);
Applied to files:
src/bin/initdb/initdb.csrc/backend/parser/scansup.c
🔇 Additional comments (4)
src/backend/parser/scansup.c (2)
68-87: LGTM - Multi-byte handling in downcase_identifier looks correct.The approach of copying the entire input first with
memcpy, then selectively transforming only single-byte ASCII characters while skipping multi-byte sequences usingpg_mblen, is a sound strategy for preserving non-ASCII bytes.
118-137: LGTM - Multi-byte handling in upcase_identifier mirrors downcase_identifier correctly.The implementation correctly follows the same pattern as
downcase_identifierfor handling multi-byte characters.src/bin/initdb/initdb.c (2)
3698-3711: LGTM - Initialization reordering enables encoding-aware username transformation.The reordering of
set_info_version(),setup_data_file_paths(),setup_locale_encoding(), andsetup_text_search()before the Oracle username handling is necessary becausesetup_locale_encoding()sets theencodingidvariable, which is now required byis_all_upper()anddown_character()for proper multi-byte character handling.
85-85: Verify theora_string_utils.hheader provides the required functions.The code uses
is_all_upper()anddown_character()functions from this header. Ensure these functions exist with the expected signatures.
| /* | ||
| * Detemine whether the letters in the string are all lowercase letters | ||
| */ | ||
| bool | ||
| identifier_is_all_lower(const char *ident, int len) | ||
| { | ||
| int i; | ||
| const char* s; | ||
|
|
||
| s = ident; | ||
|
|
||
| for (i = 0; i < len; i++) | ||
| { | ||
| int mblen = pg_mblen(s); | ||
|
|
||
| if (mblen > 1) | ||
| { | ||
| s += mblen; | ||
| i += (mblen - 1); | ||
| continue; | ||
| } | ||
|
|
||
| if (isalpha(*s) && isupper(*s)) | ||
| return false; | ||
| s++; | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Typo in comment and missing input validation.
- Line 223: "Detemine" should be "Determine".
- Lines 225-226: Formatting issue - there's a leading space before
boolon the function definition. - Unlike
downcase_identifierandupcase_identifier(which have Assert checks at lines 53-54 and 102-103), this function lacks validation for NULLidentor negativelen. For consistency and safety, consider adding similar assertions.
Suggested fix
/*
- * Detemine whether the letters in the string are all lowercase letters
+ * Determine whether the letters in the string are all lowercase letters
*/
- bool
- identifier_is_all_lower(const char *ident, int len)
+bool
+identifier_is_all_lower(const char *ident, int len)
{
int i;
const char* s;
+ Assert(ident != NULL);
+ Assert(len >= 0);
s = ident;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* | |
| * Detemine whether the letters in the string are all lowercase letters | |
| */ | |
| bool | |
| identifier_is_all_lower(const char *ident, int len) | |
| { | |
| int i; | |
| const char* s; | |
| s = ident; | |
| for (i = 0; i < len; i++) | |
| { | |
| int mblen = pg_mblen(s); | |
| if (mblen > 1) | |
| { | |
| s += mblen; | |
| i += (mblen - 1); | |
| continue; | |
| } | |
| if (isalpha(*s) && isupper(*s)) | |
| return false; | |
| s++; | |
| } | |
| return true; | |
| } | |
| /* | |
| * Determine whether the letters in the string are all lowercase letters | |
| */ | |
| bool | |
| identifier_is_all_lower(const char *ident, int len) | |
| { | |
| int i; | |
| const char* s; | |
| Assert(ident != NULL); | |
| Assert(len >= 0); | |
| s = ident; | |
| for (i = 0; i < len; i++) | |
| { | |
| int mblen = pg_mblen(s); | |
| if (mblen > 1) | |
| { | |
| s += mblen; | |
| i += (mblen - 1); | |
| continue; | |
| } | |
| if (isalpha(*s) && isupper(*s)) | |
| return false; | |
| s++; | |
| } | |
| return true; | |
| } |
🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 222 - 249, Fix the typo in the
function comment ("Detemine" → "Determine") and remove the stray leading space
before the function return type; then add the same defensive assertions used in
downcase_identifier and upcase_identifier to validate inputs (e.g., Assert(ident
!= NULL) and Assert(len >= 0)) at the start of identifier_is_all_lower so it
rejects NULL ident and negative len values consistently with those other
functions.
| /* | ||
| * Detemine whether the letters in the string are all uppercase letters | ||
| */ | ||
| bool | ||
| identifier_is_all_upper(const char *ident, int len) | ||
| { | ||
| int i; | ||
| const char* s; | ||
|
|
||
| s = ident; | ||
|
|
||
| for (i = 0; i < len; i++) | ||
| { | ||
| int mblen = pg_mblen(s); | ||
|
|
||
| if (mblen > 1) | ||
| { | ||
| s += mblen; | ||
| i += (mblen - 1); | ||
| continue; | ||
| } | ||
|
|
||
| if (isalpha(*s) && islower(*s)) | ||
| return false; | ||
| s++; | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Typo in comment and missing input validation (same issues as identifier_is_all_lower).
- Line 252: "Detemine" should be "Determine".
- Lines 254-255: Formatting issue with leading space before
bool. - Missing Assert checks for NULL pointer and non-negative length.
Suggested fix
/*
- * Detemine whether the letters in the string are all uppercase letters
+ * Determine whether the letters in the string are all uppercase letters
*/
- bool
- identifier_is_all_upper(const char * ident, int len)
+bool
+identifier_is_all_upper(const char *ident, int len)
{
int i;
const char* s;
+ Assert(ident != NULL);
+ Assert(len >= 0);
s = ident;🤖 Prompt for AI Agents
In @src/backend/parser/scansup.c around lines 251 - 278, Fix the typo in the
function comment ("Detemine" -> "Determine") and remove the stray leading space
before the return type so the function declaration reads "bool
identifier_is_all_upper(...)" consistently; then add the same input validation
used in identifier_is_all_lower by asserting the pointer and length at the top
of identifier_is_all_upper (e.g., Assert(ident != NULL); Assert(len >= 0);)
before using ident and len. Maintain the existing multibyte handling and loop
logic otherwise.
Summary by CodeRabbit
Bug Fixes
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.