Skip to content

Return dataptr() again to avoid getting an invalid address for empty vectors#1462

Open
Enchufa2 wants to merge 4 commits intomasterfrom
fix/1461
Open

Return dataptr() again to avoid getting an invalid address for empty vectors#1462
Enchufa2 wants to merge 4 commits intomasterfrom
fix/1461

Conversation

@Enchufa2
Copy link
Copy Markdown
Member

@Enchufa2 Enchufa2 commented Mar 26, 2026

Closes #1461. This happens because this returns 0x1 for empty vectors. The question is whether we should do the same check in here. I think we should.

Checklist

  • Code compiles correctly
  • R CMD check still passes all tests
  • Preferably, new tests were added which fail without the change
  • Document the changes by file in ChangeLog

@Enchufa2 Enchufa2 marked this pull request as ready for review March 26, 2026 16:20
@eddelbuettel
Copy link
Copy Markdown
Member

That looks great. I could turn on the big bad rev.dep machine but I am not even sure it's needed.

Copy link
Copy Markdown
Member

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good -- test on entry is the way to go. There might as always be other code paths but this is at a minimum a start if not even a solid 'squash' of that bug.

@kevinushey
Copy link
Copy Markdown
Contributor

Are we sure std::copy() is going to accept a null pointer here? If we don't already have it, a unit test would be nice -- presumably ASAN / UBSAN builds of Rcpp with those tests would catch this issue.

I'm also fairly certain the actual issue here comes from this bit of cruel code.

https://github.com/wch/r-source/blob/6b5f72eebd9211766bd159e017a167385c93772e/src/include/Rinlinedfuns.h#L135-L142

@kevinushey
Copy link
Copy Markdown
Contributor

kevinushey commented Mar 26, 2026

Perhaps we can return dataptr(x) instead of NULL? With the appropriate casting.

Or, alternatively -- instead of using __ACCESSOR__ here, just use DATAPTR_RO, and cast to the appropriate type. But we're sort of cheating since DATAPTR_RO is supposed to only give read access to memory, and these might be writable...

I just hate the idea of this zero-length check becoming necessary. :|

@Enchufa2
Copy link
Copy Markdown
Member Author

Enchufa2 commented Mar 27, 2026

dataptr has the same problem, and I think we should add the same workaround, because all these accessors end up calling that macro I linked above, which doesn't check the length and therefore return an invalid pointer when the vector is empty.

std::copy works fine with a null pointer, I tested it with the reprex provided by the update, but of course I can add a test.

Note also that the vector cache sets start to zero upon initialization, but this is messed up when the update method is called. So this follows the initial idea of having a null pointer as the start of there is no start.

@kevinushey
Copy link
Copy Markdown
Contributor

kevinushey commented Mar 27, 2026

dataptr has the same problem, and I think we should add the same workaround, because all these accessors end up calling that macro I linked above, which doesn't check the length and therefore return an invalid pointer when the vector is empty.

Are you sure? dataptr goes through R's DATAPTR_RO, which shouldn't have the same 0x1 path for empty vectors. Please correct me if I'm wrong.

Rcpp/src/barrier.cpp

Lines 77 to 81 in 7ff6ba2

// [[Rcpp::register]]
void* dataptr(SEXP x) {
// DATAPTR_RO was introduced with R 3.5.0
return const_cast<void*>(DATAPTR_RO(x));
}

https://github.com/wch/r-source/blob/6b5f72eebd9211766bd159e017a167385c93772e/src/include/Rinlinedfuns.h#L148-L154

https://github.com/wch/r-source/blob/6b5f72eebd9211766bd159e017a167385c93772e/src/include/Defn.h#L421

@Enchufa2
Copy link
Copy Markdown
Member Author

The last one is STDVEC_DATAPTR, which is what I linked above, and has this +1 unconditionally. Or am I missing something?

@Enchufa2
Copy link
Copy Markdown
Member Author

Enchufa2 commented Mar 27, 2026

Oh, I get what you mean, the other macro will return a valid address +1. Is that a valid memory address anyway? In the sense, it may be unallocated.

@Enchufa2
Copy link
Copy Markdown
Member Author

In fact, DATAPTR was avoided in #1310 in response to #1308. :-\

@kevinushey
Copy link
Copy Markdown
Contributor

kevinushey commented Mar 27, 2026

Oh, I get what you mean, the other macro will return a valid address +1. Is that a valid memory address anyway? In the sense, it may be unallocated.

Good question. I think the C++ standard guarantees that pointers to the "end" of an allocation are still valid, as long as you don't de-reference those pointers. I don't know if that extends to memory allocated from C, though...

Honestly though, the right solution might just be for us to request R Core back out this change upstream.

@Enchufa2 Enchufa2 changed the title Return null start for empty vectors Return dataptr() again to avoid getting an invalid address for empty vectors Mar 27, 2026
@Enchufa2
Copy link
Copy Markdown
Member Author

Ok, let's go back to dataptr() then without workarounds. The original test case #1461 works without UB. And I've run the entire test suite and there are no issues. (In fact, I've found another UB, but unrelated; another fight for another day).

@kevinushey
Copy link
Copy Markdown
Contributor

Thanks @Enchufa2; sorry for being so nit-picky here!

@Enchufa2
Copy link
Copy Markdown
Member Author

That's fine, that's the purpose of a review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UBSAN issue can arise when std::copy(ing) an Rcpp::NumericVector of length 0

3 participants