Use 32-bit addressing in take1 when possible#166
Conversation
… to do it in 64 bits Also use the new GpuKernel_setarg option to avoid allocating a buffer for the arguments.
src/gpuarray_array.c
Outdated
| size_t argp; | ||
| GpuKernel k; | ||
| unsigned int j; | ||
| unsigned int _n[2], _o; |
There was a problem hiding this comment.
Where this is used? I don't see it being used.
There was a problem hiding this comment.
Leftover from previous code.
|
I finished my review |
|
I should have fixed all the problems. |
|
Running Not sure how to get the full information. I'm running the benchmarks to check the performance. |
|
This is probably just that the GPU is used. those tests don't handle that On Tue, May 10, 2016 at 4:18 PM, Pascal Lamblin [email protected]
|
|
Yes, I did that, I actually ran |
|
It seem a bug, @abergeron will check. |
|
Strange error when running the benchmark: |
|
Problem should be fixed. |
|
The code seems to run. I still have to relaunch with profiling and compare. |
|
Updated timings with that change below (total time spent in that Op, in seconds).
|
|
should we merge this? I think what @abergeron told about making GpuJoin reuse elemwise to get all the speed benefit would be useful here too. If so, this would make this PR useless. But if we merge now, we will have now some good speed up. So if people absolutly need float16, they won't have too much slow down. I vote to merge now. |
|
I did not review the code and I'm not sure I understand how the other solution would interact with this one. I'm OK with merging if you think it is a good idea. |
|
Btw, the overall performance was similar to the old back-end, due to Gemm being faster. |
|
I reviewed it complety and I think it is good to merge. This could use the GpuElemwise kernel to make a copy for each element, but mark them as being able to run in parallel. But it would be probably better to implement the dimensions collapsing here. There is function that can be reused and this would make only 1 kernel call. |
|
I need to leave, When I can I'll run gpuarray and Theano tests to make sure it work well and I'll merge. |
|
I can run the tests. |
|
I'm running them. On Thu, May 12, 2016 at 1:38 PM, Pascal Lamblin [email protected]
|
|
Me too. The gpuarray tests passed, I'm running the Theano ones now. |
|
The Theano one passed, I'm finishing the gpuarray ones. so merging. On Thu, May 12, 2016 at 4:05 PM, Pascal Lamblin [email protected]
|
This be faster and doesn't change the public interface.