Improve performance of put/get rows using pandas.DataFrame by dangtrungtin · Pull Request #19 · griddb/python_client

dangtrungtin · 2020-09-09T10:17:15Z

Add function: void Container.put_rows(pandas.DataFrame input). In the Python layer, I convert the input from a pandas.DataFrame to a numpy.array, because NumPy provides a C API. In the C++ layer, I use the NumPy API to put the data into GridDB. Compared with using Container.multi_puts(input : list[list]) to put large data of LONG type, the run time is reduced by about 11% and memory usage by 20%. With STRING type, the run time is reduced by about 10% and memory usage by about 8%.
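For illustration, a minimal sketch of the Python-side conversion described above, assuming it amounts to something like `DataFrame.to_numpy()`. The sample rows are hypothetical, and the actual client code in this pull request may differ.

```python
import pandas as pd

# Hypothetical sample rows; dtype=object keeps the values exactly as given.
rows = [[1, "sensor-a", 20.5], [2, "sensor-b", None]]
frame = pd.DataFrame(rows, dtype=object)

# A 2-D NumPy array is a single buffer that a C/C++ extension can walk
# through the NumPy C API, instead of unpacking one Python list per row.
array = frame.to_numpy()

print(array.shape)  # (rows, fields)
print(array.dtype)  # object, because the columns hold mixed types
```

The array here has one entry per row field, so the C++ layer can index it by row and column directly.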
Add function: pandas.DataFrame RowSet.fetch_rows(). In the C++ layer, it uses an iterable object (RowList.h/cpp) to wrap the output data. In the Python layer, I convert the data from the iterable object to a pandas.DataFrame. Compared with using RowSet.next() to query large data of LONG type, the run time is reduced by 17% and memory usage is the same. When querying large data of STRING type, the run time is reduced by 11%.
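A rough sketch of the iterable-to-DataFrame step described above. `FakeRowList` and `rows_to_frame` are hypothetical names used only so the example runs without a GridDB server; the interface of the real RowList object is not shown in this thread.

```python
import pandas as pd


class FakeRowList:
    """Hypothetical stand-in for the C++-backed iterable of result rows."""

    def __init__(self, rows, columns):
        self._rows = rows
        self.columns = columns

    def __iter__(self):
        # Yield one row (a list of field values) at a time.
        return iter(self._rows)


def rows_to_frame(row_list):
    """Drain the iterable once and build a DataFrame from its rows."""
    return pd.DataFrame(list(row_list), columns=row_list.columns)


frame = rows_to_frame(FakeRowList([[1, "a"], [2, "b"]], ["id", "name"]))
print(frame)
```

Building the frame from one pass over the iterable avoids a separate Python-level call per row, which is the kind of overhead the comparison with RowSet.next() points at.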
Reduce the number of function calls needed to check for NULL fields: when getting data, for each row field the Python client currently calls gsGetRowFieldNull() and then gsGetRowFieldAsXXX(). I changed it to call gsGetRowFieldAsXXX() first, and to call gsGetRowFieldNull() only if the returned data is empty or null, to check whether the field is really NULL.
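The effect of this reordering can be sketched with a small pure-Python simulation. `FakeRow` and both reader functions are hypothetical; they only mimic the call pattern of gsGetRowFieldNull() and gsGetRowFieldAsXXX(), and they assume that a NULL field reads back as an empty value (0 here).

```python
class FakeRow:
    """Stand-in for a row; counts how often each accessor is called."""

    def __init__(self, values):
        self._values = values
        self.calls = {"is_null": 0, "get": 0}

    def is_null(self, column):
        # Plays the role of gsGetRowFieldNull().
        self.calls["is_null"] += 1
        return self._values[column] is None

    def get_long(self, column):
        # Plays the role of gsGetRowFieldAsXXX(); NULL reads back as 0.
        self.calls["get"] += 1
        value = self._values[column]
        return 0 if value is None else value


def read_check_first(row, width):
    """Old order: always ask for NULL, then fetch the value."""
    return [None if row.is_null(c) else row.get_long(c) for c in range(width)]


def read_value_first(row, width):
    """New order: fetch the value, ask for NULL only when it looks empty."""
    out = []
    for c in range(width):
        value = row.get_long(c)
        if value == 0 and row.is_null(c):
            value = None
        out.append(value)
    return out


values = [5, 0, None, 7]
old_row, new_row = FakeRow(values), FakeRow(values)
assert read_check_first(old_row, 4) == read_value_first(new_row, 4)
print(old_row.calls)  # {'is_null': 4, 'get': 3}
print(new_row.calls)  # {'is_null': 2, 'get': 4}
```

The NULL check now runs only for fields whose value looks empty, so the saving grows with the share of non-empty fields.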
A note on Container.put_rows(): to create a DataFrame, we use "frame = pandas.DataFrame(data)", where "data" is a list. However, when the list contains None values, pandas automatically changes them, for example from None to NaN. To prevent this, the Python code should use "frame = pandas.DataFrame(data, dtype=object)".
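A short example of the behavior this note describes, using hypothetical sample rows. It only exercises pandas itself and does not touch GridDB.

```python
import math

import pandas as pd

# Hypothetical rows: the second row has no value in the last two columns.
data = [[1, "a", 10], [2, None, None]]

inferred = pd.DataFrame(data)                 # pandas infers a dtype per column
as_object = pd.DataFrame(data, dtype=object)  # values are stored unchanged

# In the numeric column the missing value has become a float NaN.
print(inferred[2].dtype, math.isnan(inferred.iloc[1, 2]))

# With dtype=object the original None is still there.
print(as_object[2].dtype, as_object.iloc[1, 2] is None)
```

Keeping None matters here because the client has to tell a NULL field apart from a floating-point NaN when it writes the row.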

- Add function: void Container.put_rows(input: pandas.DataFrame)
- Add function: pandas.DataFrame RowSet.fetch_rows()
- Reduce call function to check NULL field

knonomura · 2020-09-10T01:48:13Z

Oh, that's great!
I'll try to use it.

I have a question.
How much data did you use?

Thanks.

dangtrungtin · 2020-09-10T10:19:26Z

With the LONG type, I put 1000 rows x 10000 fields. With the STRING type, I used 1000 rows x 7552 fields.
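How the time and memory were measured is not described in this thread. As one possible approach, here is a sketch that builds a LONG-style data set of a given shape and measures a call with the standard library. `make_rows` and `measure` are hypothetical helpers, and `tracemalloc` only sees memory allocated through Python, not native allocations made in the C++ layer.

```python
import time
import tracemalloc


def make_rows(n_rows, n_fields):
    """Build n_rows x n_fields integer values as a list of lists."""
    return [[r * n_fields + c for c in range(n_fields)] for r in range(n_rows)]


def measure(func, *args):
    """Return (result, elapsed seconds, peak Python-level bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


# Scaled down from the 1000 x 10000 case so the sketch runs quickly.
rows, elapsed, peak = measure(make_rows, 100, 50)
print(len(rows), len(rows[0]), elapsed, peak)
```

To compare the two put paths, the function passed to `measure` would be the put call itself, run against the same data for each variant.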

knonomura · 2020-09-11T11:45:33Z

Thank you for your information.
I understand.

knonomura · 2020-09-15T03:11:10Z

I have a request.
Could you please add samples for the new functions?

- PutRowsWithDataFrame.py : sample for put rows.
- FetchRowsWithDataFrame.py : sample for fetch rows.

dangtrungtin · 2020-09-16T04:43:46Z

I added 2 samples:

PutRowsWithDataFrame.py : sample for put rows.
FetchRowsWithDataFrame.py : sample for fetch rows.

knonomura · 2020-09-16T08:41:25Z

Thank you for your samples.
I'll check them.

knonomura · 2020-09-21T04:54:56Z

I think this pull request is very useful.
So I'll merge it.
Thank you.

Improve performance put/get rows using pandas.DataFrame

37297ad

- Add function: void Container.put_rows(input: pandas.DataFrame)
- Add function: pandas.DataFrame RowSet.fetch_rows()
- Reduce call function to check NULL field

Add sample for put/fetch rows with DataFrame.

fee8818

- PutRowsWithDataFrame.py : sample for put rows.
- FetchRowsWithDataFrame.py : sample for fetch rows.

knonomura merged commit eda9482 into griddb:master Sep 21, 2020

knonomura merged 2 commits into griddb:master from dangtrungtin:master
