Tags: DavidChan0519/iree
[LLVMGPU][NFC] Unify hal.interface conversion for ROCM and CUDA (iree-org#7568)
Implementation of Util::AlignOp with tests and integration into compiler passes (iree-org#7437) Adds a Util.Align op to `UtilOps.td`. Align accepts two arguments, the value to align and the alignment, and returns the newly aligned value. The `--iree-hal-pack-allocations` pass produces `Util::Align` ops instead of arithmetic ops for the alignment. The `--iree-vm-conversion` pass converts the alignment ops directly to VM arithmetic/const ops (bypassing the arithmetic ops altogether). iree-org#5405
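The arithmetic that an align op of this kind typically stands in for can be sketched as follows. This is an illustration only: the commit does not state the rounding direction or any constraint on the alignment, so the sketch assumes round-up semantics and a power-of-two alignment, and `align_up` is a hypothetical helper name, not an IREE API.

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`.

    Assumption (not stated in the commit): `alignment` is a positive
    power of two, which allows the usual mask-based formulation.
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Add (alignment - 1) to step past the next boundary, then clear the low bits.
    return (value + alignment - 1) & ~(alignment - 1)


# Example: packing a 13-byte allocation on an 8-byte boundary.
offset = align_up(13, 8)
```

Expressing this as a single op rather than separate add/and ops keeps the intent visible to later passes, which is the motivation the commit message gives for lowering it directly in the VM conversion.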
Adding -iree-stream-schedule-execution + -concurrency passes. (iree-org#7549) The passes themselves are rather simple and call into a partitioning routine that performs the real work. The intent is that we can have many such routines and specify which one to use based on scoped attributes in the IR (kind of like lowering configs in codegen). Today there's just a reference implementation that does a single level of concurrency. The hope is that someone who actually knows how to write a good partitioning algorithm can contribute something better, but it's at least no worse than what we have today and better than simple ML systems that have no concurrency. Though the passes are similar, they operate at different scopes and will have different partitioning algorithms. I thought about trying to unify them; however, keeping them separate allows us to do things like use a more complex execution partitioning pass while using the same generic concurrency scheduling, including disabling the concurrency scheduling entirely for debugging or for environments where there may be no benefit to such scheduling (single-core execution, etc.). It's easy enough to reason about how they could be unified that I wanted to err on the side of flexibility until we have an owner and at least one or two more algorithms we can use to feel out the shape of things. A benefit of the independent execution and concurrency partitioning is that debugging either is much simpler (and there's pretty good `-debug` output). Since the concurrency scheduling operates only within the scheduled execution regions, there's no need to worry about host/device interactions or the parent op CFG.
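To make the idea of a concurrency partitioning routine concrete, here is a minimal sketch of one simple strategy: group operations into "waves" so that every op in a wave depends only on ops in earlier waves. This is not the reference implementation from the commit; the function name, the input shape (op names plus a dependency map), and the wave-by-depth strategy are all assumptions chosen for illustration.

```python
from collections import defaultdict


def partition_into_waves(ops, deps):
    """Group ops into waves that could, in principle, run concurrently.

    ops:  op names in program order (hypothetical representation).
    deps: dict mapping an op to the ops it depends on; every dependency
          must appear earlier in `ops`.
    """
    wave_of = {}
    for op in ops:
        # An op runs one wave after its latest dependency; ops with no
        # dependencies land in wave 0.
        wave_of[op] = 1 + max((wave_of[d] for d in deps.get(op, ())), default=-1)

    waves = defaultdict(list)
    for op in ops:
        waves[wave_of[op]].append(op)
    return [waves[i] for i in range(len(waves))]


# "a" and "b" are independent, "c" needs both, "d" needs "c".
schedule = partition_into_waves(
    ["a", "b", "c", "d"],
    {"c": ["a", "b"], "d": ["c"]},
)
```

Because a routine like this only looks at a flat list of ops and their dependencies, it matches the commit's observation that scheduling within an already-formed execution region does not need to reason about host/device interactions or the parent CFG.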
[pydm] Implement sufficient support to run a couple of types of fibonacci (iree-org#7301) * Not yet compliant with anything we want but minimally works for default integer sizes (the VM seems to have issues with fp). * Required also building out initial support for tuples and lists in order to get proper support for multiple returns (used for promotion RTL helpers). * Adds while loop. * Various fixes.
NFC: improve naming and doc for tiled and distributed loop info (iree-org#7558) The logic behind rediscovering the loop tiling and distribution information is quite dense. This makes at least the API clearer.
Adding -iree-stream-refine-usage pass. (iree-org#7537) This adds the resource usage analysis pass using DFX to solve for usage across a whole module and a pass that applies that analyzed usage information back into the types.
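The general shape of a whole-module usage analysis can be sketched as a fixed-point propagation: start from the usage observed locally on each value, then keep merging usage between values that refer to the same resource until nothing changes. The sketch below is not IREE's DFX solver; the function name, the alias-pair input, and the flag names are hypothetical and only illustrate the solve-then-apply idea.

```python
def solve_usage(local_usage, aliases):
    """Propagate usage flags across aliasing values until a fixed point.

    local_usage: dict mapping a value name to the set of usage flags
                 observed directly on it (flag names are hypothetical).
    aliases:     list of (a, b) pairs meaning a and b refer to the same
                 resource, e.g. a call operand and the callee argument.
    """
    usage = {value: set(flags) for value, flags in local_usage.items()}
    changed = True
    while changed:
        changed = False
        for a, b in aliases:
            merged = usage.setdefault(a, set()) | usage.setdefault(b, set())
            if usage[a] != merged or usage[b] != merged:
                usage[a] = set(merged)
                usage[b] = set(merged)
                changed = True
    return usage


# Usage seen on the function argument and on an intermediate value ends
# up attached to every value that shares the resource.
result = solve_usage(
    {"%arg0": {"external"}, "%0": {"transfer"}, "%1": set()},
    [("%arg0", "%0"), ("%0", "%1")],
)
```

A second step, corresponding to the pass that "applies that analyzed usage information back into the types", would then rewrite each value's type from the solved set; that step is omitted here because the commit does not describe the type encoding.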