Hue Discourse - Latest posts https://discourse.gethue.com Latest posts IMPORTANT! This forum has moved to https://github.com/cloudera/hue/discussions https://discourse.gethue.com/t/important-this-forum-has-moved-to-https-github-com-cloudera-hue-discussions/948/2 Wed, 13 Nov 2024 10:49:25 +0000 discourse.gethue.com-post-1483 IMPORTANT! This forum has moved to https://github.com/cloudera/hue/discussions Dear visitors, we are retiring this forum in favour of our github discussions page. All the topics are kept but are now in read-only mode. For new discussions and questions please use https://github.com/cloudera/hue/discussions

https://discourse.gethue.com/t/important-this-forum-has-moved-to-https-github-com-cloudera-hue-discussions/948/1 Wed, 13 Nov 2024 10:48:40 +0000 discourse.gethue.com-post-1482
Unable to Upload to S3 via IAM Role in Hue with has_iam_detection=true – Forbidden Error I am experiencing an issue with Hue’s integration with AWS IAM roles for accessing S3. I have configured the hue.ini file to detect and use the IAM role assigned to the service account in Kubernetes, but I receive a 403 Forbidden error specifically when attempting to upload a file to S3 via the Hue file browser.

Steps to Reproduce:

  1. Configure hue.ini as follows to use IAM role detection:
    [aws]
    [[aws_accounts]]
    [[[default]]]
    has_iam_detection=true

  2. Annotate the Kubernetes service account with the necessary roleArn for AWS IAM role usage.

  3. Confirm the role has s3:PutObject, s3:GetObject, and other necessary S3 permissions. (I verified this by successfully using the role for similar S3 operations in another application.)

  4. Open the Hue file browser, navigate to S3, and attempt to upload a file.

Observed Behavior:

While read and list operations work as expected with has_iam_detection=true, the file upload fails with a 403 Forbidden error:

aws.s3.s3fs.S3FileSystemException: Failed to access path "": User is not authorized to access path "".

When I provide AWS access keys and secrets directly in the hue.ini configuration, the upload works without issue. This suggests a potential difference in how Boto (or another underlying component) is configured to handle IAM role-based access compared to direct access keys.

Additional Details:

Role Verification: The IAM role used has been validated to have the necessary permissions (including s3:PutObject) in another application, and write operations work fine there.
Environment: Hue is running within a Kubernetes cluster, with the service account annotated with the IAM role.
Boto Library: Hue appears to be using Boto 2.x for S3 operations. This could be relevant since Boto 2.x and Boto3 handle IAM role-based access differently.
Hue Version: 4.11

Possible Areas to Investigate:

Boto IAM Role Support: Is there a known limitation with Boto 2.x for IAM role usage in Hue? If so, are there any recommended workarounds?
Configuration Differences: Are there any additional configuration steps needed to ensure full IAM role compatibility for upload operations in Hue?

Expected Behavior:

When using has_iam_detection=true, Hue should detect and utilize the IAM role for all S3 operations, including file uploads, without requiring access keys in the configuration file.

Logs:

Here are the relevant log entries from an upload attempt:

[06/Nov/2024 00:48:02 -0800] decorators ERROR Error running guess_format
...
aws.s3.s3fs.S3FileSystemException: Failed to access path "s3a://myBucket/...": User is not authorized to access path at "s3a://myBucket/...".

Request:

Could the maintainers provide insights or guidance on whether this is a known issue or if there are specific configurations needed to fully support IAM roles with has_iam_detection=true for S3 uploads? If there are any recommended versions of Boto or specific patches for this use case, please advise.
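One check that may help narrow this down is whether the Hue process can see the web-identity settings at all. The sketch below assumes the cluster uses EKS IAM Roles for Service Accounts, where the pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the pod. The function name `irsa_status` is made up for this example and is not part of Hue or Boto. If both variables are present but Hue still gets a 403 on upload, one possible (unconfirmed) explanation is that the S3 client is not using those credentials and is falling back to some other identity with narrower permissions.

```python
import os

# Environment variables injected by the EKS pod identity webhook for
# service accounts annotated with a role ARN (assumption: IRSA is in use).
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def irsa_status(env=None):
    """Report which web-identity settings are visible to this process.

    `irsa_status` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    status = {name: bool(env.get(name)) for name in IRSA_VARS}
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE", "")
    # The token file must also be readable by the user running Hue.
    status["token_file_readable"] = bool(token_file) and os.access(token_file, os.R_OK)
    # Static keys in the environment would normally take precedence.
    status["static_keys_set"] = bool(env.get("AWS_ACCESS_KEY_ID"))
    return status


if __name__ == "__main__":
    for key, value in irsa_status().items():
        print(f"{key}: {value}")
```

Running this inside the Hue container, as the same user that runs Hue, shows what that process actually sees.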

https://discourse.gethue.com/t/unable-to-upload-to-s3-via-iam-role-in-hue-with-has-iam-detection-true-forbidden-error/946/1 Fri, 08 Nov 2024 14:38:25 +0000 discourse.gethue.com-post-1480
Errors making 4.11.0 virtual env Hi, I’m trying to build Hue 4.11.0 from tarball after installing the dependencies. However, the makefile errors at the virtual environment:
[user@system hue-release-4.11.0]# PREFIX=/opt/hue make install
“PYTHON_VER is python3.9.”
“SYS_PYTHON is /usr/bin/python3.9.”
“ENV_PYTHON is /opt/hue/hue-release-4.11.0/build/env/bin/python3.9.”
“SYS_PIP is /usr/bin/pip3.9.”
“ENV_PIP is /opt/hue/hue-release-4.11.0/build/env/bin/python3.9 /opt/hue/hue-release-4.11.0/build/env/bin/pip.”
— Creating virtual environment at /opt/hue/hue-release-4.11.0/build/env
— Virtual environment /opt/hue/hue-release-4.11.0/build/env ready
touch: cannot touch ‘/opt/hue/hue-release-4.11.0/build/env/stamp’: No such file or directory
make: *** [Makefile:138: /opt/hue/hue-release-4.11.0/build/env/stamp] Error 1

The entire directory /opt/hue/hue-release-4.11.0/build doesn’t seem to get created. What mechanism is responsible for creating the virtual environment directories and how can this be resolved? I’ve tried manually creating the directory so the ‘stamp’ gets touched, but this results in errors being thrown in the resulting files.
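Since the log reports the environment as "ready" while `touch` fails on a path inside it, a quick way to see what is actually on disk is a small check like the one below. This is only a diagnostic sketch: `missing_env_parts` is a made-up helper, and the expected layout (`bin/` plus a `python3.9` executable) is an assumption based on the `ENV_PYTHON` path printed above, not on the Hue Makefile itself.

```python
from pathlib import Path


def missing_env_parts(env_dir, python_name="python3.9"):
    """List the expected pieces of a virtual environment that are absent.

    Hypothetical helper; the expected layout is an assumption taken from
    the ENV_PYTHON path shown in the build output.
    """
    env = Path(env_dir)
    expected = [env, env / "bin", env / "bin" / python_name]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    # Path copied from the error message in the post.
    for path in missing_env_parts("/opt/hue/hue-release-4.11.0/build/env"):
        print("missing:", path)
```

If the environment directory itself is listed as missing, the step that reports "ready" did not create anything, which would point at the virtual environment creation command rather than at the `stamp` rule.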

https://discourse.gethue.com/t/errors-making-4-11-0-virtual-env/943/1 Tue, 29 Oct 2024 07:02:16 +0000 discourse.gethue.com-post-1477
Need Advice on Optimizing Query Performance in Hue Hey everyone,

I have been using Hue for a bit now, and I am really enjoying how it simplifies data querying and exploration. However, I have noticed that some of my queries are starting to take longer to execute, especially when working with larger datasets.

I have gone through these resources (Hue Performance Tuning Guide, Mendix Platform Features). They are quite informative but seem too complicated, so I would like to learn from the community.

I am wondering if anyone has tips or best practices for optimizing query performance in Hue? Are there specific configurations or settings I should be looking into? Also, if there are any tricks to better handle large datasets, I would love to hear them!

Appreciate any advice you can share. Thanks in advance!

Best Regards

https://discourse.gethue.com/t/need-advice-on-optimizing-query-performance-in-hue/938/1 Fri, 23 Aug 2024 10:56:33 +0000 discourse.gethue.com-post-1472
Error accessing database Hi,

Some time back I had the same issue, but I am not able to recall the fix. Let me check and get back to you soon; until then, you should try the solution posted by @bjorn.

Thanks

https://discourse.gethue.com/t/error-accessing-database/811/3 Thu, 22 Aug 2024 10:33:17 +0000 discourse.gethue.com-post-1471
Installation steps Hue Hi @sravani

Follow this: Installation steps Hue

Thanks

https://discourse.gethue.com/t/installation-steps-hue/853/4 Thu, 22 Aug 2024 10:31:03 +0000 discourse.gethue.com-post-1470
How to get time_column Hi,

Use this query:

SELECT
HOUR(date_) AS hour,
MINUTE(date_) AS minute,
SECOND(date_) AS second
FROM
your_table;

Try this and let me know if it works.

Thanks

https://discourse.gethue.com/t/how-to-get-time-column/876/3 Thu, 22 Aug 2024 10:27:44 +0000 discourse.gethue.com-post-1469
Using Parameters in url Is it possible to implement an SQL template and supply the parameters directly in the URL, so that it could be bookmarked in the browser or handed over from a “calling” website?
As an alternative, would it be possible to add the complete SQL statement to the URL?

https://discourse.gethue.com/t/using-parameters-in-url/934/1 Tue, 30 Jul 2024 15:15:13 +0000 discourse.gethue.com-post-1465
How to Optimize Hue for Large Scale Data Processing Hi everyone, :wave:

Our team has recently adopted Hue for data visualization and querying within our company’s extensive big data environment. While we’ve found the platform incredibly valuable, we’re encountering performance challenges as our data volume grows.

I’m reaching out to the community for expert advice on optimizing Hue for large-scale data processing. We’re particularly interested in:

  • Configuration best practices: Are there specific Hue settings to enhance performance with massive datasets?
  • Resource allocation: How can we effectively distribute resources (memory, CPU) for optimal Hue operation?
  • Query optimization techniques: What strategies can we employ to efficiently handle large-scale data queries?
  • Complementary tools: Are there any recommended integrations or tools to boost Hue’s performance?

We’re utilizing Hue on a Hadoop ecosystem for complex queries and interactive visualizations. Any insights, documentation, or real-world examples would be immensely helpful.

I also checked this thread: https://discourse.gethue.com/t/hue-pyspark-connector-using-livy-how-to-change-spark-driver-memorlooker but I did not find a solution there. Could anyone suggest a good approach?

Thank you for sharing!

Respected community member :smiling_face_with_three_hearts:

https://discourse.gethue.com/t/how-to-optimize-hue-for-large-scale-data-processing/932/1 Mon, 29 Jul 2024 17:43:18 +0000 discourse.gethue.com-post-1463
SQL AI Assistant Hi,

I think the Cloudera SQL AI Assistant integration with Hue is proprietary. While the assistant itself is not open source, the integration parts relevant to Hue are typically merged into the open-source gethue repository. However, some features or specific implementations related to Cloudera’s SQL AI Assistant may remain proprietary and not be included in the open-source codebase.

Thanks

https://discourse.gethue.com/t/sql-ai-assistant/910/3 Mon, 29 Jul 2024 06:40:16 +0000 discourse.gethue.com-post-1461
How to get time_column Hello @przemek :slightly_smiling_face:

If you want to get the time parts from the date column, you can use the built-in functions: the hour function for the hour, the minute function for the minute, and the second function for the second.
Try it and then let me know the result :hugs:
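
To make the expected result concrete, here is a small Python sketch of what the three functions return for one sample value. It only mirrors the behaviour of the SQL functions; the sample timestamp and the helper name `time_parts` are made up for illustration, and the timestamp format is an assumption about how the column is stored.

```python
from datetime import datetime


def time_parts(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Split a timestamp string into hour, minute and second.

    Illustrative helper only; `fmt` is an assumed storage format.
    """
    ts = datetime.strptime(value, fmt)
    return {"hour": ts.hour, "minute": ts.minute, "second": ts.second}


print(time_parts("2024-07-24 10:50:14"))
# {'hour': 10, 'minute': 50, 'second': 14}
```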

https://discourse.gethue.com/t/how-to-get-time-column/876/2 Wed, 24 Jul 2024 10:50:14 +0000 discourse.gethue.com-post-1457
How to download multiple files from hue? Hello :smiling_face_with_three_hearts:

As far as I know, you should first check the Hue version, then inspect the logs, browser compatibility, and Hue configuration, and finally consider alternative methods. I hope this helps.

Respected community member :innocent:

https://discourse.gethue.com/t/how-to-download-multiple-files-from-hue/896/2 Mon, 22 Jul 2024 11:41:39 +0000 discourse.gethue.com-post-1453
Hue PermissionError / multiprocessing Issue During Initial Startup I figured out the problem! Hue creates pymp-* directories under /tmp, each containing a socket file. The directory and the socket file are created with 700 and 755 permissions respectively. I found that these permissions need to be 750 and 775 for the server to start up successfully.

I will note that I am running Hue as root, because when I try to run it as a dedicated hue user I run into a similar permission error when Hue makes an os.setgid call, which non-root users shouldn’t have permission to do.

I’m looking into where I can tweak the permissions for these /tmp files, but if anyone notices a mistake on my side that would cause this problem, please let me know. Thanks!
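
For anyone wanting to confirm the same permissions on their own system, the sketch below lists the mode bits of every pymp-* directory and its contents. As far as I know, Python's multiprocessing creates these directories with `tempfile.mkdtemp(prefix='pymp-')`, which always uses mode 700, while the socket's mode depends on the process umask. The helper name `pymp_modes` is made up for this example.

```python
import stat
from pathlib import Path


def pymp_modes(tmp_root="/tmp"):
    """Map each pymp-* directory (and the entries inside it) to its mode.

    Illustrative helper only; run it as a user that may enter the directories.
    """
    modes = {}
    for directory in sorted(Path(tmp_root).glob("pymp-*")):
        modes[str(directory)] = oct(stat.S_IMODE(directory.stat().st_mode))
        if directory.is_dir():
            for child in sorted(directory.iterdir()):
                # lstat so a socket or symlink is reported as-is.
                modes[str(child)] = oct(stat.S_IMODE(child.lstat().st_mode))
    return modes


if __name__ == "__main__":
    for path, mode in pymp_modes().items():
        print(mode, path)
```

If the worker processes run under a different user or group than the process that created the manager socket, the 700 directory alone would be enough to produce the `[Errno 13] Permission denied` on connect.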

https://discourse.gethue.com/t/hue-permissionerror-multiprocessing-issue-during-initial-startup/923/2 Thu, 18 Jul 2024 15:26:56 +0000 discourse.gethue.com-post-1452
Hue PermissionError / multiprocessing Issue During Initial Startup Hey folks,

I have been trying to get Hue running on RHEL9 with Python 3.9. I am currently running into the following error when starting the server with build/env/bin/supervisor:

Environment:

Request Method: GET
Request URL: http://localhost:8888/hue/accounts/login

Django Version: 3.2.25
Python Version: 3.9.18
Installed Applications:
[‘django.contrib.auth’,
‘django.contrib.contenttypes’,
‘django.contrib.sessions’,
‘django.contrib.sites’,
‘django.contrib.staticfiles’,
‘django_extensions’,
‘django_babel’,
‘desktop’,
‘axes’,
‘webpack_loader’,
‘django_prometheus’,
‘crequest’,
‘rest_framework’,
‘rest_framework.authtoken’,
‘drf_spectacular’,
‘drf_spectacular_sidecar’,
‘indexer’,
‘metadata’,
‘notebook’,
‘dashboard’,
‘kafka’,
‘about’,
‘beeswax’,
‘filebrowser’,
‘help’,
‘hive’,
‘jobbrowser’,
‘jobsub’,
‘metastore’,
‘oozie’,
‘proxy’,
‘spark’,
‘useradmin’,
‘zookeeper’,
‘corsheaders’]
Installed Middleware:
[‘corsheaders.middleware.CorsMiddleware’,
‘desktop.middleware.MetricsMiddleware’,
‘desktop.middleware.EnsureSafeMethodMiddleware’,
‘desktop.middleware.AuditLoggingMiddleware’,
‘desktop.middleware.MultipleProxyMiddleware’,
‘django.middleware.common.CommonMiddleware’,
‘django.contrib.sessions.middleware.SessionMiddleware’,
‘django.contrib.auth.middleware.AuthenticationMiddleware’,
‘desktop.middleware.ProxyMiddleware’,
‘desktop.middleware.SpnegoMiddleware’,
‘desktop.middleware.HueRemoteUserMiddleware’,
‘django.middleware.locale.LocaleMiddleware’,
‘django_babel.middleware.LocaleMiddleware’,
‘desktop.middleware.AjaxMiddleware’,
‘django.middleware.security.SecurityMiddleware’,
‘django.middleware.clickjacking.XFrameOptionsMiddleware’,
‘desktop.middleware.ContentSecurityPolicyMiddleware’,
‘desktop.middleware.LoginAndPermissionMiddleware’,
‘django.contrib.messages.middleware.MessageMiddleware’,
‘desktop.middleware.NotificationMiddleware’,
‘desktop.middleware.ExceptionMiddleware’,
‘desktop.middleware.ClusterMiddleware’,
‘django.middleware.csrf.CsrfViewMiddleware’,
‘desktop.middleware.CacheControlMiddleware’,
‘django.middleware.http.ConditionalGetMiddleware’,
‘desktop.middleware.MimeTypeJSFileFixStreamingMiddleware’,
‘crequest.middleware.CrequestMiddleware’,
‘desktop.middleware.EnsureSafeRedirectURLMiddleware’,
‘useradmin.middleware.LastActivityMiddleware’,
‘axes.middleware.AxesMiddleware’]

Traceback (most recent call last):
File “/usr/lib64/python3.9/multiprocessing/managers.py”, line 802, in _callmethod
conn = self._tls.connection

During handling of the above exception (‘ForkAwareLocal’ object has no attribute ‘connection’), another exception occurred:
File “/usr/gmdp/current/hue/build/env/lib/python3.9/site-packages/django/core/handlers/exception.py”, line 47, in inner
response = get_response(request)
File “/usr/gmdp/current/hue/build/env/lib/python3.9/site-packages/django/utils/deprecation.py”, line 116, in __call__
response = self.process_request(request)
File “/usr/gmdp/current/hue/desktop/core/src/desktop/middleware.py”, line 888, in process_request
global_registry().update_metrics_shared_data()
File “/usr/gmdp/current/hue/desktop/core/src/desktop/lib/metrics/registry.py”, line 128, in update_metrics_shared_data
self._metrics_dict[os.getpid()] = metrics
File “<string>”, line 2, in __setitem__

File “/usr/lib64/python3.9/multiprocessing/managers.py”, line 806, in _callmethod
self._connect()
File “/usr/lib64/python3.9/multiprocessing/managers.py”, line 793, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File “/usr/lib64/python3.9/multiprocessing/connection.py”, line 506, in Client
c = SocketClient(address)
File “/usr/lib64/python3.9/multiprocessing/connection.py”, line 634, in SocketClient
s.connect(address)

Exception Type: PermissionError at /hue/accounts/login
Exception Value: [Errno 13] Permission denied

I’m using the /usr/gmdp/current/hue installation location because I’m trying to adapt an old mpack I found so the service can be managed by Ambari; I plan on installing under /usr/local/hue in the future. I’ve been stuck on this one for a couple of weeks, so I thought I’d post here. Any ideas?

https://discourse.gethue.com/t/hue-permissionerror-multiprocessing-issue-during-initial-startup/923/1 Wed, 17 Jul 2024 18:48:26 +0000 discourse.gethue.com-post-1450
RHEL8 Compatibility? env:
RHEL-8.8
python-2.7.18

Error Log

[root@cve1 hue]# /usr/lib/hue/build/env/bin/pip2.7 install mysqlclient
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Requirement already satisfied: mysqlclient in /usr/lib64/python2.7/site-packages (1.4.6)
[root@cve1 hue]# cd /usr/lib/hue/build/env
[root@cve1 env]# bin/hue syncdb
/usr/lib/hue/build/env/lib/python2.7/site-packages/requests_kerberos-0.12.0-py2.7.egg/requests_kerberos/kerberos_.py:11: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.
  from cryptography import x509
[18/Apr/2024 14:52:54 +0000] settings     DEBUG    DESKTOP_DB_TEST_NAME SET: /usr/lib/hue/desktop/desktop-test.db
[18/Apr/2024 14:52:54 +0000] settings     DEBUG    DESKTOP_DB_TEST_USER SET: hue_test
Traceback (most recent call last):
  File "bin/hue", line 14, in <module>
    load_entry_point('desktop', 'console_scripts', 'hue')()
  File "/usr/lib/hue/desktop/core/src/desktop/manage_entry.py", line 239, in entry
    raise e
django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: libmysqlclient.so.18: cannot open shared object file: No such file or directory.
Did you install mysqlclient or MySQL-python?
[root@cve1 env]# /usr/lib/hue/build/env/bin/pip install mysql-python
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Requirement already satisfied: mysql-python in ./lib/python2.7/site-packages/MySQL_python-1.2.5-py2.7-linux-x86_64.egg (1.2.5)
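
The log shows pip treating both packages as installed while the import still fails on `libmysqlclient.so.18`, so the missing piece appears to be the shared library rather than the Python package. A rough way to test that outside of Django is to ask the dynamic loader for the library directly. The helper name `can_load` is made up for this sketch; `ctypes.CDLL` is the standard-library call doing the work.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the named shared library.

    Illustrative helper only.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Library name copied from the ImproperlyConfigured error above.
    print("libmysqlclient.so.18 loadable:", can_load("libmysqlclient.so.18"))
```

If this prints `False`, the already-built MySQLdb extension is linked against a client library version that is not present on the host.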

https://discourse.gethue.com/t/rhel8-compatibility/90/3 Fri, 19 Apr 2024 05:06:28 +0000 discourse.gethue.com-post-1441
SQL AI Assistant Hi, the Cloudera SQL AI Assistant is only available for Cloudera’s enterprise customers and there are currently no plans for making it available to the open source community.

https://discourse.gethue.com/t/sql-ai-assistant/910/2 Tue, 02 Apr 2024 10:24:18 +0000 discourse.gethue.com-post-1439
Missing Statement Field in Query History for API-Executed Queries in Hue I’ve encountered an odd behavior with Hue and was hoping someone could help clarify. When executing queries through the Hue GUI, everything works as expected, and I can retrieve the query history via the API, including the statement field. However, when I initiate queries directly through the API, the statement field seems to be missing from the query history data returned by the get_history API. Am I overlooking something, or is there a specific reason for this discrepancy? Any insights would be greatly appreciated.

https://discourse.gethue.com/t/missing-statement-field-in-query-history-for-api-executed-queries-in-hue/913/1 Tue, 02 Apr 2024 07:32:00 +0000 discourse.gethue.com-post-1438
SQL AI Assistant Hello,

As I understand it, the Cloudera SQL AI Assistant is integrated with Hue (https://blog.cloudera.com/setting-up-and-getting-started-with-clouderas-new-sql-ai-assistant/).
Has this support been merged into the open-source gethue repository?
Presumably the assistant itself is proprietary, but is the integration part in gethue?

https://discourse.gethue.com/t/sql-ai-assistant/910/1 Mon, 25 Mar 2024 02:22:37 +0000 discourse.gethue.com-post-1435