Feature: implement snowflake export function #163
Conversation
Codecov Report

```
@@           Coverage Diff            @@
##           develop     #163   +/-  ##
===========================================
+ Coverage    92.91%   92.94%   +0.02%
===========================================
  Files          291      291
  Lines         4473     4506      +33
  Branches       604      606       +2
===========================================
+ Hits          4156     4188      +32
- Misses         203      204       +1
  Partials       114      114
```
kokokuo left a comment:

Besides some suggestions, the rest LGTM 👍
```ts
this.logger.debug(`Acquiring connection from ${profileName}`);
const client = await pool.acquire();
this.logger.debug(`Acquired connection from ${profileName}`);
const { connection, pool } = await this.getConnection(profileName);
```
```ts
// use protected for testing purpose
protected getCopyToStageSQL(sql: string, stageFilePath: string) {
  return `COPY INTO ${stageFilePath} FROM (${sql}) FILE_FORMAT = (TYPE = 'parquet') HEADER=true INCLUDE_QUERY_ID=true MAX_FILE_SIZE=5368709120;`;
}
```
Suggest adding a comment to explain why `MAX_FILE_SIZE` is set to 5368709120 rather than another value, and pasting the document link to help other members know :D
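A sketch of what the suggested comment could look like. Note the value 5368709120 bytes is exactly 5 GiB, which (to my understanding) is the maximum Snowflake accepts for `MAX_FILE_SIZE` in `COPY INTO <location>`; the helper name here mirrors the one in the diff, but this is an illustrative rewrite, not the PR's actual code:

```typescript
// Assumed rationale: 5368709120 bytes = 5 GiB, the upper bound Snowflake
// allows for MAX_FILE_SIZE in COPY INTO <location>. Using the maximum
// keeps the export split into as few parquet files as possible.
// See the Snowflake COPY INTO <location> documentation for the limit.
const MAX_FILE_SIZE = 5 * 1024 * 1024 * 1024; // 5368709120

function getCopyToStageSQL(sql: string, stageFilePath: string): string {
  return `COPY INTO ${stageFilePath} FROM (${sql}) FILE_FORMAT = (TYPE = 'parquet') HEADER=true INCLUDE_QUERY_ID=true MAX_FILE_SIZE=${MAX_FILE_SIZE};`;
}
```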
```ts
const { profileName, sql } = exportOptions;

const { connection, pool } = await this.getConnection(profileName);
directory = /^\./.exec(directory) ? directory : `./${directory}`;
```
Currently, our `directory` may be a relative path based on the project, or an absolute temp-directory path from the root if the user does not set `folderPath` in the cache settings of `vulcan.yaml`:

Relative path:
> `tmp/schem1/pg/cache_department`

Absolute path:
> `/var/folders/51/049hk_157kgc8ht18q61c1lm0000gn/T/vulcan/cache/schem1/pg/cache_department`

If the Snowflake GET path needs an absolute path, then the current check that prefixes the directory with `.` may cause an issue and would need to be removed.

We could use `path.resolve` to normalize both cases directly (it seems you have written it in the following code).

Btw, suggest adding a comment to explain why the absolute path is needed by the Snowflake GET method, and pasting the document link to help other members know.
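A small sketch of the `path.resolve` normalization suggested above, using the two example paths from this comment:

```typescript
import * as path from 'path';

// path.resolve turns a project-relative path into an absolute one
// (resolved against process.cwd()) and returns an absolute path
// unchanged, so both cache-directory cases end up absolute with no
// manual "./" prefixing.
const fromRelative = path.resolve('tmp/schem1/pg/cache_department');
const fromAbsolute = path.resolve(
  '/var/folders/51/049hk_157kgc8ht18q61c1lm0000gn/T/vulcan/cache/schem1/pg/cache_department'
);
```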
```ts
const copyStatement = await this.getStatementFromSQL(
  connection,
  builtSQL,
  []
);
```
Maybe we could refactor the `getStatementFromSQL` method so the binding arguments default to an empty array; then we don't always need to pass `[]` when copying to the stage, getting files, and removing the stage.
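A minimal sketch of the suggested default-parameter refactor. The `Connection` and `Statement` types below are placeholders standing in for the project's actual interfaces, not the real snowflake-sdk types:

```typescript
// Placeholder types, not the project's real interfaces.
type Connection = {
  execute: (opts: { sqlText: string; binds: unknown[] }) => unknown;
};
type Statement = unknown;

// With binds defaulting to [], call sites that have no bind
// parameters can simply omit the third argument.
async function getStatementFromSQL(
  connection: Connection,
  sql: string,
  binds: unknown[] = []
): Promise<Statement> {
  return connection.execute({ sqlText: sql, binds });
}
```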
```ts
public override async export(exportOptions: ExportOptions): Promise<void> {
  let { directory } = exportOptions;
  const { profileName, sql } = exportOptions;

```
Suggest removing the newline.
```ts
// Assert
const files = fs.readdirSync(directory);
expect(files.length).toBe(1);
expect(/parquet$/.exec(path.basename(files[0])));
```
Well, `expect(/parquet$/.exec(path.basename(files[0])))` may not actually assert the result.
The value in `expect(value)` is the actual value; we should call a matcher method to do the assertion.
Please use an expect matcher to assert the result, e.g.:

```ts
expect(files[0]).toMatch(/\.parquet$/);
```

```ts
const files = fs.readdirSync(directory);
expect(files.length).toBe(4);
// check each file in the files exists
files.forEach((file) => {
```
The same issue and suggestion as above.
Suggest enhancing the test cases. According to the export method's logic, we may still need two more test cases: first, checking that a non-existent directory throws an error; second, checking the cache-error case.
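A hedged sketch of the first suggested case, written against a hypothetical `exportToDirectory` stand-in (the real data-source wiring is omitted, and the function name is illustrative):

```typescript
import * as fs from 'fs';

// Hypothetical stand-in for the export method: it rejects when the
// target directory does not exist, mirroring the suggested test case.
async function exportToDirectory(directory: string): Promise<void> {
  if (!fs.existsSync(directory)) {
    throw new Error(`Directory "${directory}" does not exist`);
  }
  // ... COPY INTO stage, GET files, REMOVE stage ...
}

// Case 1: a non-existent directory should reject with an error.
async function missingDirectoryThrows(): Promise<boolean> {
  try {
    await exportToDirectory('/no/such/dir');
    return false; // should not reach here
  } catch {
    return true;
  }
}
```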
```ts
const stage = '@~/vulcan_cache_stage';
const stageFilePath = `${stage}/`;
const builtSQL = this.getCopyToStageSQL(sql, stageFilePath);
// copy data to a named stage
```
Suggest enriching the comment to explain why we need to copy to the stage, to help members understand that moving data to the stage is necessary.
```ts
await this.waitStatement(getStatement);
this.logger.debug(`Exported parquet files to ${directory}`);

// remove parquet file from stage
```
Suggest enriching the comment to describe that we remove the staged parquet files because we keep them in the stage only temporarily for downloading, so they need to be removed 😄
588711c to ae0b2a0 (compare)
ae0b2a0 to dc72b97 (compare)
kokokuo left a comment:

Thanks for fixing and updating the commits, LGTM 👍 👍

Description
This pull request implements the snowflake export function, which exports the selected data to a local directory in parquet format.
Export flow:
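The export flow discussed in the review above (copy the query result into a user stage, GET the parquet files down to the local directory, then REMOVE them from the stage) can be sketched as three SQL builders; the helper names below are illustrative, not the PR's actual code:

```typescript
// Illustrative SQL builders for the three-step export flow.
const stage = '@~/vulcan_cache_stage';

// 1. Copy the query result into the user stage as parquet files.
const copySQL = (sql: string) =>
  `COPY INTO ${stage}/ FROM (${sql}) FILE_FORMAT = (TYPE = 'parquet') HEADER=true INCLUDE_QUERY_ID=true;`;

// 2. Download the staged files to the local directory.
const getSQL = (directory: string) => `GET ${stage} file://${directory};`;

// 3. Clean up the temporary parquet files kept in the stage.
const removeSQL = () => `REMOVE ${stage};`;
```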
Issue ticket number
issue #156
Additional Context
Result
Use parquet-cli to view the downloaded parquet file
