fix: Update pyarrow to the latest v14.0.1 regarding CVE-2023-47248. #3835
shuchu wants to merge 14 commits into feast-dev:master from
Conversation
Signed-off-by: Shuchu Han <[email protected]>
A little bit worried about the unit test coverage. Please be aware that I unpinned the pyarrow version. py3.8-requirements.txt and py3.8-ci-requirements.txt were updated manually (due to the Dask version issue for Python 3.8).
```diff
@@ -1,5 +1,5 @@
 #
-# This file is autogenerated by pip-compile with Python 3.10
+# This file is autogenerated by pip-compile with Python 3.9
```
You are right, I need to create a Python 3.10 venv and run the command from the Makefile. Let me fix this.
Fixed, let's see the test results.
Seems the integration tests failed...
21 failures; the most frequent error is about a wrong Timestamp format: google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Invalid timestamp microseconds value 1700011424237000000 of logical type NONE; in column 'created'
Let me dig into it and find the root cause.
1. Google's BigQuery API only accepts "ms" resolution for timestamps, while pyarrow.parquet.write_table() keeps the exact original resolution, which is "ns" by default.
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html
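The resolution mismatch can be seen with plain arithmetic on the rejected value from the error above. This is just an illustrative sketch (interpreting the raw integer at the two resolutions is my assumption about what happened):

```python
from datetime import datetime, timezone

# The raw value BigQuery rejected in the 'created' column.
raw = 1700011424237000000

# Read as nanoseconds since the epoch (pyarrow/pandas default resolution),
# it is an ordinary recent timestamp:
print(datetime.fromtimestamp(raw / 1_000_000_000, tz=timezone.utc))

# Read as microseconds (the resolution BigQuery expects from the Parquet
# file), the same integer lands tens of thousands of years in the future,
# outside any valid timestamp range:
try:
    datetime.fromtimestamp(raw / 1_000_000, tz=timezone.utc)
except (OverflowError, OSError, ValueError) as exc:
    print("invalid as microseconds:", exc)
```

So a nanosecond value written as-is and then read at a coarser resolution is guaranteed to be out of range, which matches the BadRequest above.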
…le write to temporary parquet file. Signed-off-by: Shuchu Han <[email protected]>
…ng pyarrow v10.0.1 Signed-off-by: Shuchu Han <[email protected]>
I met a very interesting problem. I only updated the pyarrow version and the Snowflake API, yet the integration test results show that the timestamp range is wrong while running the Redshift SQL query. It happens while running "get_historical_features()", and the timestamp range was inferred from the "entity_df":
Please do not merge this PR. @sudohainguyen
No worries, looking forward to seeing this work.
Finally, I found the fix. It's about the setting of the "coerce_timestamps" parameter of "pyarrow.parquet.write_table". Let me close this PR and create a clean new one.
Great @shuchu!!
What this PR does / why we need it:
Update pyarrow to the latest version, v14.0.1, which has the fix for CVE-2023-47248.
Fixes #3832