Conversation
…nerally, no longer copy map
|
We're approaching a state where it's possible to say something about the performance of this rocksdb-backed file storage. Both time and memory for very specific reads are greatly reduced. After parse/open, retrieval is only a little bit slower. There's probably a lot left to optimise. One major caveat is that as soon as instances are retrieved from the rocksdb model, they are not freed for the lifetime of the file object, because we do not want dangling pointers. This means that memory usage will keep growing, so we haven't really reached infinite scalability yet.

Benchmark: reading DC_Riverside_Bldg-LOD_300.ifc (275M) and retrieving the wall GUIDs, measured in time (s) and memory (bytes), with the script below:

import time, psutil, ifcopenshell
t0, m0 = time.perf_counter(), psutil.Process().memory_info().rss
f = ifcopenshell.open('DC_Riverside_Bldg-LOD_300.rdb')
t1, m1 = time.perf_counter(), psutil.Process().memory_info().rss
[i.GlobalId for i in f.by_type('IfcWall')]
t2, m2 = time.perf_counter(), psutil.Process().memory_info().rss
g = ifcopenshell.open('DC_Riverside_Bldg-LOD_300.ifc')
t3, m3 = time.perf_counter(), psutil.Process().memory_info().rss
[i.GlobalId for i in g.by_type('IfcWall')]
t4, m4 = time.perf_counter(), psutil.Process().memory_info().rss
print('spf', t3-t2, t4-t3, sep="|")
print('rocks', t1-t0, t2-t1, sep="|")
print('spf', m3-m2, m4-m3, sep="|")
print('rocks', m1-m0, m2-m1, sep="|") |
|
How would this look from the Python side? Would it be a drop-in replacement with merely a tweak in the open/write? This is AMAZING! |
I don’t really know what I’m talking about, so pardon me if this is a foolish idea, but could you use a shared_ptr-like object instead of std::shared_ptr that hides whether instance data is in memory or in the database? Instance data could be unloaded from memory after some time and then reloaded when needed. Objects holding instances of the shared_ptr-like object would never know. |
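The idea above can be sketched in a few lines of Python (all names here — KVStore, LazyHandle, unload — are hypothetical illustrations, not IfcOpenShell API): a handle remembers only the key of the database record, so the in-memory payload can be dropped at any time and is transparently reloaded on the next access.

```python
class KVStore:
    """Stands in for the on-disk key-value store (hypothetical)."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class LazyHandle:
    """shared_ptr-like handle that hides whether the payload is in memory or in the DB."""
    def __init__(self, store, key):
        self._store = store
        self._key = key
        self._cached = None  # None means "currently unloaded"

    def get(self):
        # Reload from the database on demand.
        if self._cached is None:
            self._cached = self._store.get(self._key)
        return self._cached

    def unload(self):
        # Drop the in-memory copy; holders of this handle never notice.
        self._cached = None


store = KVStore()
store.put("wall-1", {"GlobalId": "2O2Fr$t4X7Zf8NOew3FNr2"})
h = LazyHandle(store, "wall-1")
guid = h.get()["GlobalId"]   # first access loads from the store
h.unload()                   # payload evicted from memory
guid2 = h.get()["GlobalId"]  # transparently reloaded
```

As the replies below note, the hard part in the real C++ code base is that existing holders keep raw pointers, so eviction is only safe if every holder goes through a handle like this.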
Yes, currently it just detects file type when you open. Writing is only possible through the serializer at this point. Attribute modifications are probably possible, but instance additions require a little bit more work because they need to be created through the file context to get a reference to the database.
I've been playing with similar ideas, among them a hybrid shared_ptr/weak_ptr: instances start out as shared_ptr, and when you add them to a file you get weak references, so that the file can be deleted even while instances from the file are alive outside of it — those instances then simply can no longer be used. So maybe a custom pointer class is the solution. Also for equality: we currently use the pointer address in the C++ code base for testing equality, which essentially means we cannot recycle pointers (we already have identity(), actually, that we could use instead). With a custom pointer class we could also define our own equality operator. It just feels like a last resort to me, to implement your own smart pointer. |
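The equality point can be sketched conceptually in Python (InstanceHandle is a hypothetical name; identity() mirrors the existing C++ method mentioned above): handles compare by the instance's stable identity rather than by object address, so the underlying object can be recycled without breaking equality.

```python
class InstanceHandle:
    """Custom handle that compares by stored identity, not by object address."""
    def __init__(self, identity, payload=None):
        self._identity = identity  # stable id, akin to identity() in the C++ code
        self._payload = payload    # may be dropped and re-created at a new address

    def identity(self):
        return self._identity

    def __eq__(self, other):
        return isinstance(other, InstanceHandle) and self._identity == other._identity

    def __hash__(self):
        return hash(self._identity)


# Two handles created at different times (different addresses) for the same instance:
a = InstanceHandle(42, payload={"Name": "Wall"})
b = InstanceHandle(42)  # re-created later, payload not loaded yet
assert a == b           # equal by identity, unlike pointer-address comparison
assert a is not b       # distinct objects in memory
```

This is what a custom equality operator on a custom pointer class would buy: address stability stops being a correctness requirement and becomes a mere caching optimisation.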
|
🙈 |
One of the primary concerns when dealing with IFC is memory usage. IfcOpenShell is not particularly careful in that regard, but it's likely something that affects the majority of libraries, due to the entity-relationship model of EXPRESS. We see this for example in the buildingSMART validation service (which uses ifcopenshell), where the memory footprint (a) limits running multiple tasks in parallel, (b) imposes a quite restrictive maximum file size on end-users, and (c) still results in a hefty cloud bill.
One of the earliest open source implementations of IFC, the Open Source BIMserver in Java (https://github.com/opensourceBIM/BIMserver), has been using a key-value store since its inception to offload memory to disk and essentially support infinitely large models.
Recently within IfcOpenShell we started a similar effort to use RocksDB as an alternative side-by-side storage model.
Conceptually this appears really simple; where it gets harder is in the iterators and maps, and in realizing this in a backwards-compatible fashion while still benefiting from the lazy on-disk opportunities that the rocksdb key-value store offers.
Therefore I've come up with a couple of new template classes:

- rocksdb_map_adapter<K, V>(rocksdb::DB*, Str prefix) -> Mapping[K -> V]. Takes a rocksdb prefix and exposes a somewhat std::map-compatible interface over it that (de)serializes K and V into strings and writes/reads them from the key-value store.
- map_transformer<Fn>(Mapping basis) -> Mapping[basis::KeyType -> invoke_result<Fn, V>]. Primarily deals with persistence of pointers. Instances are stored by their identity in the KV store, but a cache exists so that instance pointer addresses are stable, i.e. they're created only once for the lifetime of the file, just like in the in-memory situation.
- map_variant<Mappings...>(basis*) -> common_type<Mappings...>. Unifies maps and iterators from the two storage backends into a consistent class that is std::variant-backed, both for the maps and for the iterators.

This allows us to:

- have a std::unordered_map<std::string, Instance*> in in_memory_file_storage;
- have a lazy rocksdb-backed mapping interface map_transformer<std::string, Instance*>(rocksdb_map_adapter("g|")) in rocksdb_file_storage;
- and then, in our File class, have map_variant(in_memory_file_storage::by_guid_t, rocksdb_file_storage::by_guid_t).

So practically speaking, even on the C++ side the code is mostly compatible. The way this is built means that the full IfcOpenShell stack will be able to use it: IfcConvert, Python, Bonsai, your own proprietary tools.
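The adapter idea can be sketched conceptually in Python (a plain dict stands in for RocksDB, and the class name PrefixMapAdapter plus the exact serialization scheme are assumptions for illustration): keys are serialized with a prefix such as "g|" into the flat key space of the store, yielding a dict-like view over one logical mapping of the database.

```python
class PrefixMapAdapter:
    """Dict-like view over a flat key-value store, scoped by a key prefix."""
    def __init__(self, kv, prefix):
        self._kv = kv          # stands in for the rocksdb::DB* handle
        self._prefix = prefix  # e.g. "g|" for the by-guid mapping

    def _k(self, key):
        # Serialize the key into the flat key space of the store.
        return self._prefix + str(key)

    def __setitem__(self, key, value):
        self._kv[self._k(key)] = str(value)  # (de)serialization to strings

    def __getitem__(self, key):
        return self._kv[self._k(key)]

    def __contains__(self, key):
        return self._k(key) in self._kv

    def __iter__(self):
        # Prefix scan, like a rocksdb iterator seeked to the prefix.
        n = len(self._prefix)
        return (k[n:] for k in self._kv if k.startswith(self._prefix))


kv = {}  # one flat key space shared by all mappings
by_guid = PrefixMapAdapter(kv, "g|")
by_id = PrefixMapAdapter(kv, "i|")
by_guid["2O2Fr$t4X7Zf8NOew3FNr2"] = 101
by_id[101] = "IfcWall"
```

The real C++ classes go two steps further: map_transformer caches the deserialized instance pointers so their addresses stay stable, and map_variant hides whether a given map is this adapter or a plain std::unordered_map behind one std::variant-backed interface.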
Roadmap:
I almost have the code base back in a state where it compiles, after completely ripping it open.