Support different TimeUnits and timezones when reading Timestamps from INT96#7285
Support different TimeUnits and timezones when reading Timestamps from INT96#7285alamb merged 9 commits intoapache:mainfrom
Conversation
alamb
left a comment
There was a problem hiding this comment.
Thanks @mbutrovich and @a10y -- this is looking pretty good to me
I think we need to fix the API to be a non breaking change if we want to get this into the next non-breaking release (in the next few days)
Otherwise all i think this PR needs is Tests for timezones and it should be good to go
| #[inline] | ||
| pub fn convert_int96(_descr: &ColumnDescPtr, value: Int96) -> Self { | ||
| Field::TimestampMillis(value.to_i64()) | ||
| Field::TimestampMillis(value.to_millis()) |
There was a problem hiding this comment.
I think this should be to_nanos() to preserve the old behavior
But then again it doesn't make sense to erturn a nanosecond timestamp for a value with millisecond precision 🤔
There was a problem hiding this comment.
I think this should be to_nanos() to preserve the old behavior
The old behavior is actually to convert it to millis.
Current behavior for convert_i96 has it call to_i64 which converts to millis, so I tried to keep the behavior the same.
|
BTW the msrv test failure is not related: |
alamb
left a comment
There was a problem hiding this comment.
Thank you @mbutrovich -- this looks good to me.
Seems like there are some CI failures to address but then it should be good to merge
|
Thanks again @mbutrovich |
Which issue does this PR close?
Brings behavior similar to arrow-cpp: https://github.com/apache/arrow/blob/6b66c842eefec520d391203d205cd91d1ca0dd65/cpp/src/parquet/arrow/schema_internal.cc#L206 except this approach relies on Parquet metadata, or passing
Schemainto Parquet reader rather than a specific flag.Rationale for this change
See issue.
What changes are included in this PR?
IntoBufferforVec<T>whereTis a Parquet type now takes anArrowTypearg so we know what to materialize INT96 into.ArrowTypemetadata in Parquet can now specify the resolution. Alternatively, asupplied_schematoArrowReaderOptionscan achieve the same effect, which is how DataFusion will pass schema hints for INT96 (similar toStringViewoptimizations).ArrowTypehints.Are there any user-facing changes?
#7250 (comment)
I don't believe so.