Performance enhancements for bulk writes. #770
Conversation
- Added `getMany` and `repsertMany` for batched `get` and `repsert`.
- Updated `insertEntityMany` to replace slow looped usage with batched execution.
- Upstreamed `updatePersistValue`, `mkUpdateText`, and `commaSeparated` from `Database.Persist.MySQL`.
- De-duplicated `updatePersistValue` from various `Database.Persist.Sql.Orphan.*` modules.
- Updated `SqlBackend` to allow configuring a backend-specific `putMany`.
- Added a default `putMany` so `PersistUnique` doesn't demand that all backends implement it.
- Updated all SqlBackends to implement `connPutManySql`.
- Added a basic test to verify insertion works.
Uniqueness diffing should be done on the unique keys rather than on the whole record.
- Removed the need for a Functor instance in `get`.
- Removed the need for a conditional import of `<>`.
- Removed a redundant `Map` import for NoSQL runs.
- Stopped publicly exposing `defaultPutMany`, as we only require it for the `SqlBackend` instance definition.
- Updated `putMany` implementations to account for duplicates within the record set; also added a test to verify the "last" value takes precedence.
- Extended `mkUpdateText'` to allow injection/override of the reference column; without it, `upsert` in Postgres fails with an ambiguous column reference error.
…for older preludes.
Is there a gauge/criterion benchmark for before and after the patch?

Since the performance enhancements come from preventing additional network requests, would a criterion benchmark show that much?
```haskell
-- | The _essence_ of a unique record.
-- Useful for comparing records in Haskell land for uniqueness equality.
recordEssence :: PersistEntity record => record -> [PersistValue]
recordEssence r = concat $ map persistUniqueToValues $ persistUniqueKeys r
```
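To illustrate what a `recordEssence`-style key buys you, here is a plain-Haskell sketch of de-duplicating a batch by its unique-key values so that the last occurrence wins, as the `putMany` duplicate handling discussed in this PR requires. The `Record` type and `essence` function are hypothetical stand-ins, not part of persistent's API; the real code works over `PersistEntity` and `[PersistValue]`.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a persistent entity with one unique key field.
data Record = Record { recKey :: String, recPayload :: Int }
  deriving (Show, Eq)

-- Analogue of recordEssence: the values of the record's unique key(s).
essence :: Record -> [String]
essence r = [recKey r]

-- De-duplicate a batch by essence, keeping the LAST occurrence, mirroring
-- the "last value takes precedence" behaviour the PR adds a test for.
-- Map.fromList retains the last value for a duplicated key.
dedupeByEssence :: [Record] -> [Record]
dedupeByEssence = Map.elems . Map.fromList . map (\r -> (essence r, r))

main :: IO ()
main = print (dedupeByEssence [Record "a" 1, Record "b" 2, Record "a" 3])
-- keeps Record "a" 3 (last "a") and Record "b" 2
```

Note the output order follows `Map`'s ascending key order, not the input order; a real implementation that must preserve batch order would need an extra indexing step.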
Is there a better name for this? This seems like one of those function names where you'd always have to lookup the implementation to see what it does
For sure, I can think of a few
- `persistUniqueKeyValues`
- `persistValuesFromAllUniqueKeys`
- `persistValuesAllUniqueKeys`
Also, this function is entirely for internal use, it is not exposed via Database.Persist.Class.
Test failure appears to be random.
@psibi Benchmark is now available at: https://github.com/naushadh/persistent-examples/tree/etl/stream-write

```
benchmarking MyRecord/10
- time  232.4 μs  (201.2 μs .. 281.1 μs)
+ time  69.22 μs  (62.76 μs .. 80.15 μs)
benchmarking MyRecord/100
- time  4.187 ms  (3.453 ms .. 6.545 ms)
+ time  518.7 μs  (435.0 μs .. 654.4 μs)
benchmarking MyRecord/1000
- time  47.51 ms  (35.15 ms .. NaN s)
+ time  12.98 ms  (11.22 ms .. 20.11 ms)
benchmarking MyUniqueRecord/10
- time  311.8 μs  (274.1 μs .. 372.5 μs)
+ time  42.35 μs  (38.29 μs .. 49.32 μs)
benchmarking MyUniqueRecord/100
- time  6.699 ms  (5.152 ms .. NaN s)
+ time  210.2 μs  (184.5 μs .. 251.0 μs)
benchmarking MyUniqueRecord/1000
- time  66.15 ms  (49.38 ms .. NaN s)
+ time  3.981 ms  (3.169 ms .. 6.536 ms)
```

(`-` = before the patch, `+` = after.)
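For a rough sense of scale, the mean times reported above (the first figure on each line, converted to μs) imply these speedups; the arithmetic below is my own summary, not part of the benchmark output.

```haskell
-- Mean times in μs copied from the benchmark report: (name, before, after).
results :: [(String, Double, Double)]
results =
  [ ("MyRecord/10",          232.4,    69.22)
  , ("MyRecord/100",        4187.0,   518.7)
  , ("MyRecord/1000",      47510.0, 12980.0)
  , ("MyUniqueRecord/10",    311.8,    42.35)
  , ("MyUniqueRecord/100",  6699.0,   210.2)
  , ("MyUniqueRecord/1000",66150.0,  3981.0)
  ]

-- Speedup factor, rounded to one decimal place.
speedup :: Double -> Double -> Double
speedup before after = fromIntegral (round (before / after * 10) :: Integer) / 10

main :: IO ()
main = mapM_ report results
  where
    report (name, before, after) =
      putStrLn (name ++ ": ~" ++ show (speedup before after) ++ "x faster")
```

The batched unique-record writes benefit the most (roughly 7x to 32x), since each naive upsert previously cost its own round-trip.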
@naushadh That looks amazing. Thanks!
```haskell
putManySql :: EntityDef -> Int -> Text
putManySql entityDef' numRecords
  | numRecords > 0 = q
  | otherwise = error "putManySql: numRecords MUST be greater than 0!"
```
What do you think about making this a no-op if == 0? On the one hand inserting zero records could be a bug, on the other hand you might have code that dynamically changes the number of records inserted, and having zero be a no-op could be useful for that case. Thoughts?
I agree with the no-op strategy. The users of `connPutManySql` (`putMany` and `defaultPutMany`) already short-circuit with a `return ()` when given 0 records. `putManySql` is never to be used unless there are records to be inserted (it would generate an invalid SQL query otherwise).
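The short-circuit pattern described here can be sketched as a small wrapper: the empty batch never reaches the SQL generator, so the `numRecords > 0` precondition of `putManySql` always holds by construction. `putManySafe` and its `execute` argument are illustrative names, not persistent's actual internals.

```haskell
import Data.IORef

-- Hypothetical bulk-write wrapper: callers short-circuit on an empty
-- batch, so SQL generation never sees numRecords == 0.
putManySafe :: ([record] -> IO ()) -> [record] -> IO ()
putManySafe _       [] = return ()   -- no-op: nothing to insert
putManySafe execute rs = execute rs  -- only build/run SQL when non-empty

main :: IO ()
main = do
  calls <- newIORef (0 :: Int)
  putManySafe (\_ -> modifyIORef calls (+ 1)) ([] :: [Int])   -- skipped
  putManySafe (\_ -> modifyIORef calls (+ 1)) [1, 2, 3 :: Int]
  readIORef calls >>= print  -- prints 1: only the non-empty batch executed
```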
Resolved conflict in `persistent/Database/Persist/Sql/Orphan/PersistStore.hs` by accepting mine, because `parseEntityValues` already exposes the rich error messaging the incoming change was implementing.
Conflicts:
- persistent-postgresql/Database/Persist/Postgresql.hs
- persistent/Database/Persist/Sql/Orphan/PersistStore.hs
Just following up on this. Anything additional needed of me?
@naushadh Can you bump the cabal file for each package you've changed, and also update the corresponding changelog? Otherwise this is ready to merge 👍
See [pull/770](yesodweb#770) for more info.
Released as 2.8.1. Thanks for this @naushadh!
Wicked! At your service yesodweb folks :)
Motivation
Trying to make persistent capable of performing generic and/or efficient writes for various back-ends.
Reads are already efficient, as a read is a single call to the database. But many of the existing APIs for writes work on a single record/entity. With batch/list support we could make persistent better suited to ETL-type jobs, i.e., slap a conduit over a bulk-write function:

`CL.chunk .| CL.mapM someDbBulkWrite .| CL.concat`

Summary of changes
Batching enhancements to reduce db round-trips.
- `getMany` and `repsertMany` for batched `get` and `repsert`.
- `putMany` with a default/slow implementation and native UPSERT for PostgreSQL and MySQL.
- `insertEntityMany` to replace slow looped usage with batched execution.

DRYed up several util functions into `Database.Persist.Sql.Util`:

- Upstreamed `updatePersistValue`, `mkUpdateText`, and `commaSeparated` from `Database.Persist.MySQL`.
- De-duplicated `updatePersistValue` from various `Database.Persist.Sql.Orphan.*` modules.

Added default implementations for all new APIs to be fully backward compatible.
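The chunked ETL pattern from the motivation can be sketched with plain lists and stdlib functions only; `someDbBulkWrite` is a hypothetical stand-in for a batched write such as `putMany`, and `chunksOf` plays the role of the conduit chunking stage.

```haskell
import Data.List (unfoldr)

-- Split a stream into fixed-size chunks (the conduit chunking stage).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))

-- Hypothetical bulk write: one "round-trip" per chunk, not per record.
someDbBulkWrite :: [Int] -> IO [Int]
someDbBulkWrite chunk = do
  putStrLn ("bulk write of " ++ show (length chunk) ++ " records")
  return chunk

-- Plain-list analogue of: CL.chunk .| CL.mapM someDbBulkWrite .| CL.concat
main :: IO ()
main = do
  written <- concat <$> mapM someDbBulkWrite (chunksOf 100 [1 .. 250 :: Int])
  print (length written)  -- 250 records written in 3 round-trips
```

With a batched `putMany`, the number of network round-trips is the number of chunks rather than the number of records, which is where the benchmark gains above come from.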
Failed efforts
- Upstreaming `insertManyOnDuplicateKeyUpdate` from persistent-mysql. The API definition would become large enough to warrant its own type before being added as a field to `SqlBackend`.
- Making LTS-6 happy without making LTS-2 mad re: the `Monoid` import. The import list is nearly identical to `master` with only a few additions, yet there are failures. Hence the use of `{-# OPTIONS_GHC -fno-warn-unused-imports #-}` based on a prior suggestion.