Summary
I propose accelerating PyPose by replacing, or adding an alternative set of implementations for, the functions in pypose/lietensor/operation.py using NVIDIA Warp. These functions are currently implemented in PyTorch and are relatively inefficient since no kernel fusion is performed.
By replacing the operations, or providing an additional set of Warp-backed LieType implementations, we can accelerate PyPose ops by 2-10x on both CPU and CUDA.
Improvements
Significant speedup, especially on complex operators like AdjXa and Log, and lower latency for robotics stacks built with PyPose.
Risks
- NVIDIA Warp will not (in the foreseeable future) support devices other than CPUs and NVIDIA GPUs.
- NVIDIA Warp does not support the bf16 datatype as of today; only fp16, fp32, and fp64 are supported.
- Since we will implement a dedicated kernel for each operator, we may not be able to support tensor inputs with arbitrarily many dimensions. However, we can support up to a reasonable number of batch dimensions by explicitly listing them in the codebase (e.g. up to 4 batch dimensions).
- Launching Warp kernels incurs additional overhead, so on small inputs these operations can be slightly slower.
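To make the batch-dimension limit concrete, here is a minimal sketch (not PyPose or Warp API; the `MAX_BATCH_DIMS` constant is a hypothetical limit matching the proposal) of the shape logic a per-operator kernel wrapper would need: broadcast the batch shapes of the two operands, rejecting inputs beyond the supported number of batch dimensions. Shapes here exclude the trailing Lie-element dimension (e.g. 4 for SO3 quaternions).

```python
MAX_BATCH_DIMS = 4  # hypothetical limit; one kernel variant per batch rank

def broadcast_batch_shape(a, b):
    """NumPy-style broadcast of two batch shapes (tuples of ints)."""
    if len(a) > MAX_BATCH_DIMS or len(b) > MAX_BATCH_DIMS:
        raise NotImplementedError(
            f"kernels only cover up to {MAX_BATCH_DIMS} batch dims")
    # Right-align the shapes and pad the shorter one with 1s.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"incompatible batch shapes {a} and {b}")
        out.append(max(da, db))
    return tuple(out)
```

The explicit cap is what lets each batch rank map to a concrete, pre-compiled kernel signature instead of a fully dynamic one.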
Involved components
LieType implementations.
Preliminary results
I've conducted some preliminary experiments by implementing a warp_SO3_Type that inherits from LieType and replaces the operators (forward and backward) with Warp functions.
While results on relatively simple operators like SO3_Act are mixed, significant speedups show up on SO3_Log and SO3_AdjXa. (All kernels broadcast correctly with up to 4 batch dimensions, and their results are tested against the PyPose reference.)
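For reference, timing comparisons like the ones above can be made with a minimal harness along these lines (a generic sketch, not the exact script used; `fn` stands in for any op under test):

```python
import time

def bench(fn, warmup=10, iters=100):
    """Average wall-clock seconds per call of fn() after a warmup phase."""
    for _ in range(warmup):   # warmup absorbs one-time JIT/kernel compilation
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    # Note: for CUDA ops, fn itself should synchronize the device
    # (e.g. torch.cuda.synchronize()) so kernel time is actually measured.
    return (time.perf_counter() - t0) / iters
```

The warmup phase matters here because Warp compiles kernels on first launch, which would otherwise dominate the measurement.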
I'm currently using the following interface to keep compatibility with existing PyPose code in my project:
import warp as wp
import pypose as pp
from pypose.lietensor.lietensor import LieType
from .ltype import warpSO3_type

wp.init()

_BACKEND_LIST: list[tuple[LieType, LieType | None]] = [
    # (PyPose LieType, Warp LieType)
    (pp.SO3_type  , warpSO3_type),
    (pp.SE3_type  , None),
    (pp.Sim3_type , None),
    (pp.RxSO3_type, None),
]
_PP_TO_WP = {pp_ltype: wp_ltype for pp_ltype, wp_ltype in _BACKEND_LIST}
_WP_TO_PP = {wp_ltype: pp_ltype for pp_ltype, wp_ltype in _BACKEND_LIST}

def to_warp_backend(x: pp.LieTensor) -> pp.LieTensor:
    """Swap the LieTensor backend for accelerated compute."""
    if is_warp_backend(x):
        return x
    wp_ltype = _PP_TO_WP[x.ltype]
    if wp_ltype is None:
        raise NotImplementedError(
            f"Warp backend not implemented for pypose LieType {x.ltype}.")
    return pp.LieTensor(x.tensor(), ltype=wp_ltype)

def to_pypose_backend(x: pp.LieTensor) -> pp.LieTensor:
    """Swap the LieTensor backend for better op coverage."""
    if is_pypose_backend(x):
        return x
    return pp.LieTensor(x.tensor(), ltype=_WP_TO_PP[x.ltype])

def is_warp_backend(x: pp.LieTensor) -> bool:
    return x.ltype in {warpSO3_type}

def is_pypose_backend(x: pp.LieTensor) -> bool:
    return x.ltype in {pp.SE3_type, pp.SO3_type, pp.RxSO3_type, pp.Sim3_type}
Optional: Intended side effects
Optional: Missing test coverage
Additional unit tests to ensure the "warp backend" matches the original PyPose behavior, in both shape and numeric results.
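The alignment tests could follow a pattern like this sketch (the `reference_op`/`warp_op` names are stand-ins; in the real suite they would be the PyTorch-backed op and the Warp-backed op evaluated on identical inputs):

```python
import math

# Stand-ins for the two implementations under test. In the real suite these
# would be e.g. pp.Log on the PyTorch backend vs. the Warp-backed kernel.
def reference_op(xs):
    return [math.sin(x) for x in xs]

def warp_op(xs):
    return [math.sin(x) for x in xs]

def check_alignment(shapes=((1,), (3,), (2, 3), (2, 1, 3)), atol=1e-6):
    """Assert shape and numeric agreement across a grid of batch shapes."""
    for shape in shapes:
        n = math.prod(shape)
        data = [i / 7.0 for i in range(n)]  # deterministic test inputs
        ref, got = reference_op(data), warp_op(data)
        assert len(ref) == len(got), f"shape mismatch for {shape}"
        assert all(abs(r - g) <= atol for r, g in zip(ref, got)), \
            f"numeric mismatch for {shape}"
```

Sweeping batch shapes up to the supported rank, plus the fp16/fp32/fp64 dtypes Warp supports, would cover both failure modes listed above.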