ARROW-11004: [FlightRPC][Python] Header-based auth in clients#8959
ARROW-11004: [FlightRPC][Python] Header-based auth in clients#8959tifflhl wants to merge 14 commits intoapache:masterfrom
Conversation
|
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format? See also: |
| return "" | ||
|
|
||
|
|
||
| class HeaderAuthServerMiddlewareFactory(ServerMiddlewareFactory): |
There was a problem hiding this comment.
This class should not be part of test_flight.py to be used in production. It should also be changed to have some API for checking username/password/bearer rather than being hard coded.
Unless this PR is really for header-based auth in just clients, in which case the description is not really correct.
There was a problem hiding this comment.
This PR is only for the client implementation of header based auth in Python, same as the C++ implementation done here: #8724. I've created a new subtask in ARROW-10486 that reflects this scope of work: https://issues.apache.org/jira/browse/ARROW-11004.
python/pyarrow/_flight.pyx
Outdated
|
|
||
| def authenticateBasicToken(self, username, password, | ||
| options: FlightCallOptions = None): | ||
| """Authenticate to the server with header token authentication. |
There was a problem hiding this comment.
It's not really header token authentication, it's HTTP basic authentication
There was a problem hiding this comment.
Addressed in new commit.
| CStatus Authenticate(CFlightCallOptions& options, | ||
| unique_ptr[CClientAuthHandler] auth_handler) | ||
|
|
||
| # TODO: Add AuthenticateBasicToken |
There was a problem hiding this comment.
Addressed in new commit.
|
|
||
| def is_valid(self, token): | ||
| """Do nothing.""" | ||
| return "" |
There was a problem hiding this comment.
Better to return None than empty string?
There was a problem hiding this comment.
Returning None gives the error Flight RPC failed with Python exception "TypeError: expected bytes, NoneType found". Based on the method's documentation here, returning an empty string is allowed, I added a comment to clarify the behaviour.
python/pyarrow/tests/test_flight.py
Outdated
| """Validates incoming username and password.""" | ||
|
|
||
| def start_call(self, info, headers): | ||
| auth_header = headers.get('authorization') |
There was a problem hiding this comment.
This is test code, but this should be a case insensitive lookup.
There was a problem hiding this comment.
Addressed in new commit.
| self.token = token | ||
|
|
||
| def sending_headers(self): | ||
| return {'authorization': 'Bearer ' + self.token} |
There was a problem hiding this comment.
We had been using Authorization (title case) in other languages.
There was a problem hiding this comment.
It would seem the C++ grpc impl does not allow capitals in key, when using capital A, the header was rejected due to https://github.com/grpc/grpc/blob/b1df40104c1720bde3b22ded451a037f11b5dc54/src/core/lib/surface/validate_metadata.cc#L78
There was a problem hiding this comment.
Should that be a comment for the headers parameter?
There was a problem hiding this comment.
Headers are supposed to be treated case-insensitively so even though other languages may use Authorization, it will all get folded to the same case
| auth_header = headers.get('authorization') | ||
| values = auth_header.split(' ') | ||
| return [values[1].encode("utf-8")] | ||
| raise flight.FlightUnauthenticatedError( |
There was a problem hiding this comment.
This seems like it's an internal server error.
There was a problem hiding this comment.
Addressed in new commit.
| c_string pw = tobytes(password) | ||
|
|
||
| with nogil: | ||
| result = self.client.get().AuthenticateBasicToken(deref(c_options), |
There was a problem hiding this comment.
Shouldn't there be a FlightCallOption implementation to do this? How do you use HTTP basic auth without using authenticateBasicToken? (eg in Java you can directly call getFlightInfo without calling an authenticate function and supply a FlightCallOption with the basic auth info).
There was a problem hiding this comment.
Based on how the C++ code was implemented , the Python wrapper can only wrap the AuthenticateBasicToken method in the C++ client. Seems like we would need to expand the C++ client implementation if we were to have C++/Python impl match the Java client implementation. #8724 -> C++ impl only exposed authenticateBasicToken.
|
I added a test example implementation called ClientHeaderAuthMiddleware, it intercepts headers from a HTTP header auth enabled server and make it accessible to the code using the FlightClient. A new test also demonstrates how to use the ClientMiddleware to intercept an Authorization header returned from the server and to use it in subsequent calls. |
python/pyarrow/_flight.pyx
Outdated
| self.client.get().Authenticate(deref(c_options), | ||
| move(handler))) | ||
|
|
||
| def authenticateBasicToken(self, username, password, |
There was a problem hiding this comment.
Do you need to add a description for self in the list of parameters below?
There was a problem hiding this comment.
self is never documented - it's implicitly passed by Python (it's equivalent to this in Java et al)
| @@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing, | |||
| reader.poison() | |||
There was a problem hiding this comment.
Is this a typo "poison"?
There was a problem hiding this comment.
This is to ensure a server doesn't use the Python reader beyond the lifetime of the C++ reader it wraps.
|
|
||
| CResult[pair[c_string, c_string]] AuthenticateBasicToken( | ||
| CFlightCallOptions& options, | ||
| const c_string& username, |
There was a problem hiding this comment.
For the uninitiated, what are the coding standards on Class & variable names? I see a mixture of camel case and underscore separated names.
There was a problem hiding this comment.
The mixture here is because this is a set of Cython type definitions for C++ code, which uses the C++ conventions (e.g. CResult), but the overall project is in Python, which uses Python conventions (e.g. c_string)
| self.factory = factory | ||
|
|
||
| def received_headers(self, headers): | ||
| auth_header = case_insensitive_header_lookup(headers, 'Authorization') |
There was a problem hiding this comment.
There seems to be a mix of single and double quotes for strings. Should these be consistent?
There was a problem hiding this comment.
The two are interchangeable.
python/pyarrow/tests/test_flight.py
Outdated
| decoded = base64.b64decode(values[1]) | ||
| pair = decoded.decode("utf-8").split(':') | ||
| if not (pair[0] == 'test' and pair[1] == 'password'): | ||
| raise flight.FlightUnauthenticatedError('Invalid credentials') |
There was a problem hiding this comment.
3 uses of the string 'Invalid credentials'. Should this be made a constant?
python/pyarrow/_flight.pyx
Outdated
| self.client.get().Authenticate(deref(c_options), | ||
| move(handler))) | ||
|
|
||
| def authenticateBasicToken(self, username, password, |
There was a problem hiding this comment.
Let's follow Python naming conventions - this should use snake_case.
python/pyarrow/_flight.pyx
Outdated
| IPC write options. The default options can be controlled | ||
| by environment variables (see pyarrow.ipc). | ||
|
|
||
| headers : vector[pair[c_string, c_string]], optional |
There was a problem hiding this comment.
This type hint should use Python conventions (e.g. List[Tuple[str, str]])
python/pyarrow/_flight.pyx
Outdated
|
|
||
| Returns | ||
| ------- | ||
| pair : pair[string, string] |
There was a problem hiding this comment.
Same for the type hints here - use str and Tuple
|
|
||
| CResult[pair[c_string, c_string]] AuthenticateBasicToken( | ||
| CFlightCallOptions& options, | ||
| const c_string& username, |
There was a problem hiding this comment.
The mixture here is because this is a set of Cython type definitions for C++ code, which uses the C++ conventions (e.g. CResult), but the overall project is in Python, which uses Python conventions (e.g. c_string)
| return "" | ||
|
|
||
|
|
||
| def case_insensitive_header_lookup(headers, lookup_key): |
There was a problem hiding this comment.
This is specifically to extract an authentication header based n the exception it raises, let's make sure the name reflects that.
There was a problem hiding this comment.
I modified the function to not raise an exception when the header is not found. The method is used in a test not related to authentication as well.
| self.token = token | ||
|
|
||
| def sending_headers(self): | ||
| return {'authorization': 'Bearer ' + self.token} |
There was a problem hiding this comment.
Headers are supposed to be treated case-insensitively so even though other languages may use Authorization, it will all get folded to the same case
python/pyarrow/tests/test_flight.py
Outdated
| client = FlightClient(('localhost', server.port)) | ||
| token_pair = client.authenticateBasicToken(b'test', b'password') | ||
| assert token_pair[0] == b'authorization' | ||
| assert token_pair[1] == b'Bearer ' + b'token1234' |
There was a problem hiding this comment.
nit: why concat here? (and above)
| @@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing, | |||
| reader.poison() | |||
| @@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing, | |||
| reader.poison() | |||
There was a problem hiding this comment.
This is to ensure a server doesn't use the Python reader beyond the lifetime of the C++ reader it wraps.
python/pyarrow/_flight.pyx
Outdated
| self.client.get().Authenticate(deref(c_options), | ||
| move(handler))) | ||
|
|
||
| def authenticateBasicToken(self, username, password, |
There was a problem hiding this comment.
self is never documented - it's implicitly passed by Python (it's equivalent to this in Java et al)
| c_string pw = tobytes(password) | ||
|
|
||
| with nogil: | ||
| result = self.client.get().AuthenticateBasicToken(deref(c_options), |
|
@lidavidm yes these changes are sufficient. |
No description provided.