Resolve pyspark / numpy conflicts #992
Conversation
Signed-off-by: Jun Ki Min <[email protected]>
I have concerns about version pinning in Feathr. It's okay to pin versions in the Registry, since that is a standalone app, but Feathr is a library and is normally installed into an environment alongside other Python libraries. Introducing a version pin risks package installation errors or incompatibilities with those other packages. I already ran into issues when numpy was pinned earlier.
Also, it seems this is already fixed in Spark (apache/spark#37817), so perhaps we can just set a minimum version for pyspark instead?
@xiaoyongzhu @blrchen I verified that as long as we don't explicitly call sparkDF.toPandas() on a dataframe that includes boolean-type features, we avoid pyspark's bug. Once pyspark publishes a new release, let's update the pyspark dependency accordingly. Until then, we can stick with the current version. I'll put a comment in our setup.py instead of pinning numpy.
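The approach described above (a comment in setup.py rather than a hard numpy pin) might look roughly like the sketch below. The exact lower-bound version is a hypothetical placeholder for illustration, not a value taken from this PR:

```python
# Hypothetical excerpt of a setup.py dependency list: prefer a pyspark
# lower bound plus an explanatory comment over pinning numpy, so the
# library stays installable alongside other packages in the same env.
install_requires = [
    # NOTE: pyspark's toPandas() still references the removed np.bool
    # alias (fixed upstream in apache/spark#37817). Until a patched
    # pyspark release is available, avoid calling toPandas() on
    # DataFrames with boolean features, or keep numpy < 1.24 in your
    # own environment. Do NOT pin numpy here.
    "pyspark>=3.1.2",  # hypothetical lower bound, for illustration only
]
```

Keeping the constraint as a lower bound (rather than `==`) matches the concern raised earlier: a library-level hard pin is more likely to conflict with a user's other dependencies than a range.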


Description
Pyspark still relies on the old numpy API, referring to np.bool, which has been deprecated. Because of that, calling sparkDF.toPandas() throws
AttributeError: module 'numpy' has no attribute 'bool'. We will have to upgrade the pyspark version when they release a new patch.
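A minimal sketch of why this fails, assuming numpy >= 1.24 (where the alias was removed after being deprecated in 1.20); the helper function name is hypothetical:

```python
import numpy as np

def numpy_has_legacy_bool() -> bool:
    """Return True if this numpy build still exposes the np.bool alias.

    np.bool was deprecated in numpy 1.20 and removed in 1.24. pyspark's
    non-Arrow toPandas() path references np.bool when converting boolean
    columns, so on a numpy without the alias that conversion raises:
        AttributeError: module 'numpy' has no attribute 'bool'
    """
    return hasattr(np, "bool")

# Hypothetical workaround sketch: cast boolean features to a dtype the
# conversion handles before collecting, e.g. in pyspark:
#   sparkDF.withColumn("flag", sparkDF["flag"].cast("int")).toPandas()
```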
How was this PR tested?
Does this PR introduce any user-facing changes?