DiffLine.rawContent() returns string instead of Buffer, causing non-UTF-8 encoding corruption #2038
Open
Description
Thanks to the nodegit maintainers for this excellent library!
This issue was debugged with the assistance of Cursor and Opus 4.5.
Current Behavior
DiffLine.rawContent() returns a JavaScript string, but the underlying libgit2 field git_diff_line.content is a raw byte pointer (const char *) that is not NUL-terminated and may hold content in a non-UTF-8 encoding (e.g., GBK, GB18030).
The current implementation in lib/diff_line.js:
var _rawContent = DiffLine.prototype.content; // Save original native method
DiffLine.prototype.content = function() {
// ...
this._cache.content = Buffer.from(this.rawContent())
.slice(0, this.contentLen())
.toString("utf8");
return this._cache.content;
};
DiffLine.prototype.rawContent = function() {
return _rawContent.call(this); // Calls native binding
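};
The problem is that _rawContent (the native binding) already converts const char * to a JavaScript string, presumably using v8::String::NewFromUtf8() or similar, which assumes UTF-8 encoding. If so, any bytes that are not valid UTF-8 are lost before JavaScript sees them, and wrapping the resulting string in Buffer.from() cannot bring them back.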
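The loss can be reproduced in plain Node.js without nodegit. The sketch below only simulates the suspected behavior: `toString("utf8")` stands in for whatever conversion the native binding performs (an assumption, since the binding code is not shown here), and the sample bytes are the GBK encoding of "中文" followed by a newline.

```javascript
// GBK-encoded bytes for "中文" plus a trailing newline (5 bytes).
const originalBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4, 0x0a]);

// Stand-in for the native binding: decode the raw bytes as if they were UTF-8.
// Invalid sequences are replaced with U+FFFD (the replacement character).
const decodedAsUtf8 = originalBytes.toString("utf8");

// What lib/diff_line.js then does: wrap the string in a Buffer again.
const roundTripped = Buffer.from(decodedAsUtf8);

console.log(originalBytes);  // the bytes libgit2 handed over
console.log(roundTripped);   // different bytes: the GBK content is gone
console.log(roundTripped.equals(originalBytes));
```

Once the replacement characters are in the string, no later step on the JavaScript side can recover the original bytes, which is why the conversion has to be avoided in the binding itself.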
Expected Behavior
rawContent() should return a Buffer containing the original bytes, allowing users to detect and decode the encoding themselves:
DiffLine.prototype.rawContent = function() {
// Return Buffer instead of string
return _rawContent.call(this); // Should return Buffer
};
DiffLine.prototype.content = function() {
// ... existing implementation
return this.rawContent()
.slice(0, this.contentLen())
.toString("utf8");
};
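With a Buffer in hand, a caller could pick the encoding per line. The following is a minimal sketch, not part of nodegit: `decodeLineBytes` is a hypothetical helper, the choice of GB18030 as the fallback is an assumption for this example, and decoding it through `TextDecoder` requires a Node.js build with full ICU data (the default for official releases).

```javascript
const { TextDecoder } = require("util");

// Hypothetical helper: try strict UTF-8 first, then fall back to another
// encoding chosen by the caller (GB18030 here, as an assumed default).
function decodeLineBytes(bytes, fallbackEncoding = "gb18030") {
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently inserting U+FFFD replacement characters.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { encoding: "utf-8", text };
  } catch (err) {
    const text = new TextDecoder(fallbackEncoding).decode(bytes);
    return { encoding: fallbackEncoding, text };
  }
}

// Usage with stand-in bytes; with the proposed change these would come from
// line.rawContent().slice(0, line.contentLen()).
console.log(decodeLineBytes(Buffer.from("plain ascii\n")));
console.log(decodeLineBytes(Buffer.from([0xd6, 0xd0, 0xce, 0xc4])));
```

Trying strict UTF-8 first keeps the common case unchanged and only consults the fallback when the bytes are provably not UTF-8. A real application might use a detection library instead of a fixed fallback.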