Skip to content

Convert CUDA JIT to use nvrtc instead of nvvm#1836

Merged
umar456 merged 4 commits intoarrayfire:develfrom
pavanky:nvrtc
Jun 20, 2017
Merged

Convert CUDA JIT to use nvrtc instead of nvvm#1836
umar456 merged 4 commits intoarrayfire:develfrom
pavanky:nvrtc

Conversation

@pavanky
Copy link
Copy Markdown
Member

@pavanky pavanky commented Jun 19, 2017

No description provided.

@pavanky pavanky changed the title [WIP]: Convert CUDA JIT to use nvrtc instead of nvvm Convert CUDA JIT to use nvrtc instead of nvvm Jun 19, 2017
Copy link
Copy Markdown
Member

@umar456 umar456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I added a couple of minor comments.

{
Node_ptr node(reinterpret_cast<Node *>(new BufferNode<T>(irname<T>(), afShortName<T>())));
return node;
return Node_ptr(reinterpret_cast<Node *>(new BufferNode<T>(getFullName<T>(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure you can just do this:

return Node_ptr(new BufferNode<T>(getFullName<T>(), shortname<T>(true)));

@@ -3,10 +3,6 @@ CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
FIND_PACKAGE(CUDA 7.0 REQUIRED)

INCLUDE(CLKernelToH)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change the name of this guy at some point.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not doing it now :(

kerStream << std::setw(2) << std::setfill('0') << std::hex << ids.child_ids[0];
kerStream << std::setw(2) << std::setfill('0') << std::hex << ids.child_ids[1];
kerStream << std::setw(2) << std::setfill('0') << std::hex << ids.id << std::dec;
// Make the dec representation of enum part of the Kernel name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't templated so it would be better if they were implemented in the cpp file.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do another pass about reorganizing the JIT nodes at a later point (including trying to use the same code for CUDA and OpenCL JIT). Can we leave this be for now?

std::shared_ptr<T> m_data;
Param<T> m_param;
unsigned m_bytes;
bool m_linear_buffer;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since bool is most likely smaller than once_flag it would be better to place it after the m_set_data_flag variable. It would be good to look at the other variables while you are at it.


bool isBuffer() { return true; }

~BufferNode()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the destructor necessary? You aren't really doing anything in here.

"br i1 %cmp0, label %core, label %end\n";
const char *includeFileStr = jit_cuh;

const char paramTStr[] =""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using C++11 now, raw string literals would be easier to read.

} else {
for (int i = 0; i < 4; i++) args.push_back((void *)(strides + i));
for (int i = 0; i < 4; i++) args.push_back((void *)(dims + i));
args.push_back((void *)&outputs[i]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you know the size of the args vector. Could yo call the reserve function?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know the size of the args vector. It depends on what each type of the node is.

NVRTC_CHECK(nvrtcCreateProgram(&prog, jit_ker.c_str(), ker_name, 0, NULL, NULL));
NVRTC_CHECK(nvrtcCompileProgram(prog, 0, NULL));
NVRTC_CHECK(nvrtcGetPTXSize(prog, ptx_size));
unique_ptr<char []> ptx(new char[*ptx_size]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use vector instead?

kerStream << triple32;
outParamStream << "Param<" << full_nodes[id]->getTypeStr() << "> out" << id << ", \n";
// Generate code to write the output
outWriteStream << "out" << id << ".ptr[idx] = " << "val" << id << ";\n";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outWriteStream << "out" << id << ".ptr[idx] = val" << id << ";\n";

#include <stdexcept>
#include <thread>
#include <vector>
#include <kernel_headers/jit.hpp>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you separate the standard headers from the others?

@umar456
Copy link
Copy Markdown
Member

umar456 commented Jun 20, 2017

Windows builds were failing because of a Windows macro definition. Fix #1839 should resolve this issue. Please rebase and test again.

Copy link
Copy Markdown
Contributor

@FloopCZ FloopCZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a lot of great code.

Most of my review comments are of the OCD type, so feel free to ignore if you don't suffer from this terrible disorder. ;-)

unique_ptr<char []> log(new char[logSize]); \
char *logptr = log.get(); \
nvrtcGetProgramLog(prog, logptr); \
logptr[logSize] = '\x0'; \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this write is one-past-end.

template<typename T, bool binary> const std::string cuMangledName(const char *fn);
template<typename T > const char *afShortName(bool caps = true);
template<typename T > const char *irname();
template<typename T > const char *shortname(bool caps = true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD - extra space after typename T

}
};


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra newline here and L65 and L60
(Nitpicking, but since the code looks so good and you already edit the line, it deserves to be perfect... ;-))

if (is_linear) {
kerStream << idx_str << " = idx;\n";
} else {
std::string info_str = std::string("in") + std::to_string(id);;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double semicolon

@@ -158,21 +123,6 @@ namespace JIT
bytes += m_bytes;
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra return statement

return "__c"#fn"f"; \
} \
}; \
\
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD - newline in here, but not after any of the BinOps above

#define INSTANTIATE(T) \
template const std::string cuMangledName<T, false>(const char *fn); \
template const std::string cuMangledName<T, true>(const char *fn); \
template<typename T > const char *shortname(bool caps) { return caps ? "Q" : "q"; }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD - extra space after typename T (makes it not aligned)

double rhs_x = inv_rhs_abs * rhs.x;
double rhs_y = inv_rhs_abs * rhs.y;
cdouble out = {lhs.x * rhs_x + lhs.y * rhs_y,
lhs.y * rhs_x - lhs.x * rhs_y};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD - prefix with one extra space to be aligned as L106

double rhs_x = inv_rhs_abs * rhs.x;
double rhs_y = inv_rhs_abs * rhs.y;
double2 out = {lhs.x * rhs_x + lhs.y * rhs_y,
lhs.y * rhs_x - lhs.x * rhs_y};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD - prefix with one space to make it aligned as L75

// Common CUDA code
// This part of the code does not change with the kernel.

static const char *kernelVoid = "extern \"C\" __global__ void\n";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD - extra space after equal sign

@pavanky
Copy link
Copy Markdown
Member Author

pavanky commented Jun 20, 2017

@umar456 @FloopCZ Addressed all your feedback.

@umar456 umar456 merged commit b19c02a into arrayfire:devel Jun 20, 2017
@pavanky pavanky deleted the nvrtc branch June 21, 2017 01:42
@pavanky pavanky added this to the v3.5.0 milestone Jun 21, 2017
@pavanky pavanky mentioned this pull request Aug 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants