Performance enhancements for bulk writes. #770
Conversation
- Added `getMany` and `repsertMany` for batched `get` and `repsert`.
- Updated `insertEntityMany` to replace slow looped usage with batched execution.
- Upstreamed `updatePersistValue`, `mkUpdateText`, and `commaSeparated` from `Database.Persist.MySQL`.
- De-duplicated `updatePersistValue` from various `Database.Persist.Sql.Orphan.*` modules.
- Updated `SqlBackend` to allow configuring a backend-specific `putMany`.
- Added a default `putMany` so `PersistUnique` doesn't demand that all backends implement it.
- Updated all SqlBackends to implement `connPutManySql`.
- Added a basic test to verify insertion works.
Uniqueness diffing should be done on the unique keys rather than on the whole record.
- Removed the need for a Functor instance in `get`.
- Removed the need for a conditional import of `<>`.
- Removed a redundant `Map` import for NoSQL runs.
- Stopped publicly exposing `defaultPutMany`, as we only require it for the `SqlBackend` instance definition.
- Updated `putMany` implementations to account for duplicates within the record set; also added a test to verify the "last" value takes precedence.
- Extended `mkUpdateText'` to allow injection/override of the reference column; without it, `upsert` in Postgres fails with an ambiguous column reference error.
…for older preludes.
Is there a gauge/criterion benchmark for before and after the patch?

Since the performance enhancements come from preventing additional network requests, would a criterion benchmark show that much?
```haskell
-- | The _essence_ of a unique record.
-- Useful for comparing records in Haskell land for uniqueness equality.
recordEssence :: PersistEntity record => record -> [PersistValue]
recordEssence r = concat $ map persistUniqueToValues $ persistUniqueKeys r
```
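To illustrate what a `recordEssence`-style key buys you, here is a plain-Haskell sketch of de-duplicating a batch by its unique-key values so that the last occurrence wins, as the `putMany` duplicate handling discussed in this PR requires. The `Record` type and `essence` function are hypothetical stand-ins, not part of persistent's API; the real code works over `PersistEntity` and `[PersistValue]`.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a persistent entity with one unique key field.
data Record = Record { recKey :: String, recPayload :: Int }
  deriving (Show, Eq)

-- Analogue of recordEssence: the values of the record's unique key(s).
essence :: Record -> [String]
essence r = [recKey r]

-- De-duplicate a batch by essence, keeping the LAST occurrence, mirroring
-- the "last value takes precedence" behaviour the PR adds a test for.
-- Map.fromList retains the last value for a duplicated key.
dedupeByEssence :: [Record] -> [Record]
dedupeByEssence = Map.elems . Map.fromList . map (\r -> (essence r, r))

main :: IO ()
main = print (dedupeByEssence [Record "a" 1, Record "b" 2, Record "a" 3])
-- keeps Record "a" 3 (last "a") and Record "b" 2
```

Note the output order follows `Map`'s ascending key order, not the input order; a real implementation that must preserve batch order would need an extra indexing step.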
Is there a better name for this? This seems like one of those function names where you'd always have to lookup the implementation to see what it does
For sure, I can think of a few
- `persistUniqueKeyValues`
- `persistValuesFromAllUniqueKeys`
- `persistValuesAllUniqueKeys`
Also, this function is entirely for internal use, it is not exposed via Database.Persist.Class.
Test failure appears to be random.
@psibi Benchmark is now available at: https://github.com/naushadh/persistent-examples/tree/etl/stream-write

```
benchmarking MyRecord/10
- time  232.4 μs  (201.2 μs .. 281.1 μs)
+ time  69.22 μs  (62.76 μs .. 80.15 μs)
benchmarking MyRecord/100
- time  4.187 ms  (3.453 ms .. 6.545 ms)
+ time  518.7 μs  (435.0 μs .. 654.4 μs)
benchmarking MyRecord/1000
- time  47.51 ms  (35.15 ms .. NaN s)
+ time  12.98 ms  (11.22 ms .. 20.11 ms)
benchmarking MyUniqueRecord/10
- time  311.8 μs  (274.1 μs .. 372.5 μs)
+ time  42.35 μs  (38.29 μs .. 49.32 μs)
benchmarking MyUniqueRecord/100
- time  6.699 ms  (5.152 ms .. NaN s)
+ time  210.2 μs  (184.5 μs .. 251.0 μs)
benchmarking MyUniqueRecord/1000
- time  66.15 ms  (49.38 ms .. NaN s)
+ time  3.981 ms  (3.169 ms .. 6.536 ms)
```

(`-` = before the patch, `+` = after.)
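For a rough sense of scale, the mean times reported above (the first figure on each line, converted to μs) imply these speedups; the arithmetic below is my own summary, not part of the benchmark output.

```haskell
-- Mean times in μs copied from the benchmark report: (name, before, after).
results :: [(String, Double, Double)]
results =
  [ ("MyRecord/10",          232.4,    69.22)
  , ("MyRecord/100",        4187.0,   518.7)
  , ("MyRecord/1000",      47510.0, 12980.0)
  , ("MyUniqueRecord/10",    311.8,    42.35)
  , ("MyUniqueRecord/100",  6699.0,   210.2)
  , ("MyUniqueRecord/1000",66150.0,  3981.0)
  ]

-- Speedup factor, rounded to one decimal place.
speedup :: Double -> Double -> Double
speedup before after = fromIntegral (round (before / after * 10) :: Integer) / 10

main :: IO ()
main = mapM_ report results
  where
    report (name, before, after) =
      putStrLn (name ++ ": ~" ++ show (speedup before after) ++ "x faster")
```

The batched unique-record writes benefit the most (roughly 7x to 32x), since each naive upsert previously cost its own round-trip.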
@naushadh That looks amazing. Thanks!
```haskell
putManySql :: EntityDef -> Int -> Text
putManySql entityDef' numRecords
  | numRecords > 0 = q
  | otherwise = error "putManySql: numRecords MUST be greater than 0!"
```
What do you think about making this a no-op if == 0? On the one hand inserting zero records could be a bug, on the other hand you might have code that dynamically changes the number of records inserted, and having zero be a no-op could be useful for that case. Thoughts?
I agree with the no-op strategy. The users of `connPutManySql` (`putMany` and `defaultPutMany`) already short-circuit with a `return ()` when given 0 records. `putManySql` is never to be used unless there are records to be inserted (it would generate an invalid SQL query otherwise).
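The short-circuit pattern described here can be sketched as a small wrapper: the empty batch never reaches the SQL generator, so the `numRecords > 0` precondition of `putManySql` always holds by construction. `putManySafe` and its `execute` argument are illustrative names, not persistent's actual internals.

```haskell
import Data.IORef

-- Hypothetical bulk-write wrapper: callers short-circuit on an empty
-- batch, so SQL generation never sees numRecords == 0.
putManySafe :: ([record] -> IO ()) -> [record] -> IO ()
putManySafe _       [] = return ()   -- no-op: nothing to insert
putManySafe execute rs = execute rs  -- only build/run SQL when non-empty

main :: IO ()
main = do
  calls <- newIORef (0 :: Int)
  putManySafe (\_ -> modifyIORef calls (+ 1)) ([] :: [Int])   -- skipped
  putManySafe (\_ -> modifyIORef calls (+ 1)) [1, 2, 3 :: Int]
  readIORef calls >>= print  -- prints 1: only the non-empty batch executed
```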
Resolved conflict in `persistent/Database/Persist/Sql/Orphan/PersistStore.hs` by accepting mine, because `parseEntityValues` already exposes the rich error messaging the incoming change was implementing.
Conflicts:
- persistent-postgresql/Database/Persist/Postgresql.hs
- persistent/Database/Persist/Sql/Orphan/PersistStore.hs
Just following up on this. Anything additional needed of me?
@naushadh Can you bump the cabal file for each package you've changed, and also update the corresponding changelog? Otherwise this is ready to merge 👍
See [pull/770](yesodweb#770) for more info.
Released as 2.8.1. Thanks for this @naushadh!
Wicked! At your service yesodweb folks :)
Motivation
Trying to make persistent capable of performing generic and/or efficient writes for various back-ends.
Reads are already efficient, as a read is a single call to the database. But many of the existing APIs for writes work on a single record/entity. With batch/list support we could make persistent better suited to ETL-type jobs, i.e., slap a conduit over a bulk-write function:

`CL.chunk .| CL.mapM someDbBulkWrite .| CL.concat`

Summary of changes
Batching enhancements to reduce db round-trips.
- `getMany` and `repsertMany` for batched `get` and `repsert`.
- `putMany` with a default/slow implementation and native UPSERT for PostgreSQL and MySQL.
- `insertEntityMany` to replace slow looped usage with batched execution.

DRYed up several util functions into `Database.Persist.Sql.Util`:

- Upstreamed `updatePersistValue`, `mkUpdateText`, and `commaSeparated` from `Database.Persist.MySQL`.
- De-duplicated `updatePersistValue` from various `Database.Persist.Sql.Orphan.*` modules.

Added default implementations for all new APIs to be fully backward compatible.
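The chunked ETL pattern from the motivation can be sketched with plain lists and stdlib functions only; `someDbBulkWrite` is a hypothetical stand-in for a batched write such as `putMany`, and `chunksOf` plays the role of the conduit chunking stage.

```haskell
import Data.List (unfoldr)

-- Split a stream into fixed-size chunks (the conduit chunking stage).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))

-- Hypothetical bulk write: one "round-trip" per chunk, not per record.
someDbBulkWrite :: [Int] -> IO [Int]
someDbBulkWrite chunk = do
  putStrLn ("bulk write of " ++ show (length chunk) ++ " records")
  return chunk

-- Plain-list analogue of: CL.chunk .| CL.mapM someDbBulkWrite .| CL.concat
main :: IO ()
main = do
  written <- concat <$> mapM someDbBulkWrite (chunksOf 100 [1 .. 250 :: Int])
  print (length written)  -- 250 records written in 3 round-trips
```

With a batched `putMany`, the number of network round-trips is the number of chunks rather than the number of records, which is where the benchmark gains above come from.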
Failed efforts
- Upstreaming `insertManyOnDuplicateKeyUpdate` from persistent-mysql. The API definition would become large enough to warrant its own type before being added as a field to `SqlBackend`.
- Making LTS-6 happy without making LTS-2 mad re: the `Monoid` import. The import list is nearly identical to `master` with only a few additions, yet there are failures. Hence the use of `{-# OPTIONS_GHC -fno-warn-unused-imports #-}` based on a prior suggestion.