Skip to content

[Answer Query Using Materialized Views] Support ORDER BY in origin query#358

Merged
avamingli merged 1 commit intoapache:mainfrom
avamingli:order_by_aqumv
Jan 10, 2024
Merged

[Answer Query Using Materialized Views] Support ORDER BY in origin query#358
avamingli merged 1 commit intoapache:mainfrom
avamingli:order_by_aqumv

Conversation

@avamingli
Copy link
Contributor

ORDER BY clause could refer result column, result number, table column or expression has table columns.
All that could be processed by aqumv_process_targetlist()
if ORDER BY clause could be computed from materialized views.

begin;
create table aqumv_t5(c1 int, c2 int, c3 int) distributed by (c1);
insert into aqumv_t5 select i, i+1, i+2 from generate_series(1, 100) i;
insert into aqumv_t5 values (91, NULL, 95);
analyze aqumv_t5;
create incremental materialized view aqumv_mvt5_0 as
  select c1 as mc1, c2 as mc2, c3 as mc3
  from aqumv_t5 where c1 > 90;

set local enable_answer_query_using_materialized_views = on;
explain(costs off, verbose)
select c1, c3 from aqumv_t5 where c1 > 90 order by c2 - c1 - 1 asc;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: mc1, mc3, (((mc2 - mc1) - 1))
   Merge Key: (((mc2 - mc1) - 1))
   ->  Sort
         Output: mc1, mc3, (((mc2 - mc1) - 1))
         Sort Key: (((aqumv_mvt5_0.mc2 - aqumv_mvt5_0.mc1) - 1))
         ->  Seq Scan on public.aqumv_mvt5_0
               Output: mc1, mc3, ((mc2 - mc1) - 1)
 Settings: enable_answer_query_using_materialized_views = 'on', optimizer = 'off'
 Optimizer: Postgres query optimizer
(10 rows)

See more examples in test cases.

Authored-by: Zhang Mingli [email protected]

fix #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

  • Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if has or needed).
  • Document changes.
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

ORDER BY clause could refer result column, result number,
table column or expression has table columns.
All that could be processed by aqumv_process_targetlist()
if ORDER BY clause could be computed from materialized views.

Authored-by: Zhang Mingli [email protected]
@avamingli avamingli self-assigned this Jan 8, 2024
@avamingli avamingli changed the title [AQUMV] Support ORDER BY in origin query. [Answer Query Using Materialized Views] Support ORDER BY in origin query. Jan 8, 2024
@avamingli avamingli changed the title [Answer Query Using Materialized Views] Support ORDER BY in origin query. [Answer Query Using Materialized Views] Support ORDER BY in origin query Jan 8, 2024
Copy link
Member

@yjhjstz yjhjstz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@avamingli
Copy link
Contributor Author

Pushed, thanks for review.

@avamingli avamingli merged commit d70f664 into apache:main Jan 10, 2024
@avamingli avamingli deleted the order_by_aqumv branch January 10, 2024 03:40
tuhaihe pushed a commit to tuhaihe/cloudberrydb that referenced this pull request Jan 26, 2026
tuhaihe pushed a commit to tuhaihe/cloudberrydb that referenced this pull request Feb 26, 2026
tuhaihe pushed a commit to tuhaihe/cloudberrydb that referenced this pull request Feb 27, 2026
reshke pushed a commit that referenced this pull request Mar 11, 2026
reshke pushed a commit to reshke/cloudberrydb that referenced this pull request Mar 11, 2026
reshke pushed a commit to reshke/cloudberrydb that referenced this pull request Mar 12, 2026
leborchuk added a commit to open-gpdb/cloudberry that referenced this pull request Mar 12, 2026
* Print the diff files when the tests failed. (#109)

* Fix flaky test cases. NFC. (#110)

* Fix bug: table's relation_cache_entry is not be removed after vacuum full (#112)

* Fix bug
The relation_cache_entry of temporary table, created during vacuum full, is not be removed after vacuum full. This table will be treated as an uncommitted table, although it has been dropped after vacuum full. And its table size still remains in diskquota.table_size, which causes quota size to be larger than real status.
Use RelidByRelfilenode() to check whether the table is committed, and remove its relation_cache_entry.

Co-authored-by: hzhang2 <[email protected]>
Co-authored-by: Xing Guo <[email protected]>
Co-authored-by: Xuebin Su (苏学斌) <[email protected]>

* Fix test case bug (#114)

The SQL statement is not equal to the expected output in test_vacuum.sql.

Co-authored-by: hzhang2 <[email protected]>

* Fix bug: can not calculate the size of pg_aoblkdir_xxxx before `create index on ao_table` is committed (#113)

We can not calculate the size of pg_aoblkdir_xxxx before `create index on ao_table` is committed.
1. Lack of the ability to parse the name of pg_aoblkdir_xxxx.
2. pg_aoblkdir_xxxx is created by `create index on ao_table`, which can not be searched by diskquota_get_appendonly_aux_oid_list() before index's creation.
Solution:
1. parse the name begin with `pg_aoblkdir`.
2. When blkdirrelid is missing, we try to fetch it by traversing relation_cache.

Co-authored-by: hzhang2 <[email protected]>
Co-authored-by: Xing Guo <[email protected]>
Co-authored-by: Xuebin Su (苏学斌) <[email protected]>

* Add UDF to show worker status (#111)

Co-authored-by: Xuebin Su (苏学斌) <[email protected]>
Co-authored-by: Xing Guo <[email protected]>
Co-authored-by: Hao Zhang <[email protected]>

* Wait until deleted tuples are dead before VACUUM (#118)

Consider a user session that does a DELETE followed by a VACUUM FULL to
reclaim the disk space. If, at the same time, the bgworker loads config
by doing a SELECT, and the SELECT begins before the DELETE ends, while ends
after the VACUUM FULL begins:

bgw: ---------[ SELECT ]----------->
usr: ---[ DELETE ]-[ VACUUM FULL ]-->

then the tuples deleted will be marked as RECENTLY_DEAD instead of DEAD.
As a result, the deleted tuples cannot be removed by VACUUM FULL.

The fix lets the user session wait for the bgworker to finish the current
SELECT before starting VACUUM FULL .

* Make altered relation's oid active when performing 'VACUUM FULL'. (#116)

When doing VACUUM FULL, The table size may not be updated if
the table's oid is pulled before its relfilenode is swapped.

This fix keeps the table's oid in the shared memory if the table
is being altered, i.e., is locked in ACCESS EXCLUSIVE mode.

Co-authored-by: Xuebin Su <[email protected]>

* Refactor pause() to skip refreshing quota (#119)

Currently, diskquota.pause() only takes effect on quota checking.
Bgworkers still go over the loop to refreshing quota even if diskquota
is paused. This wastes computation resources and can cause flaky issues.

This fix makes bgworkers skip refreshing quota when the user pauses
diskquota entirely to avoid those issues. Table sizes can be updated
correctly after resume.

* ci: create rhel8 release build. (#117)

ci: create rhel8 release build.

Signed-off-by: Sasasu <[email protected]>
Co-authored-by: Xuebin Su <[email protected]>

* set the scope of pause() to current database

* cleanup header include

* re-dispatch pause status

* allow to re-create extension

* use atomic on flag hardlimit_lock (#124)

* send a error if launcher crashed (#124)

* Pause before DROP EXTENSION (#121)

Currently, deadlock can occur when

1. A user session is doing DROP EXTENSION, and
2. A bgworker is loading quota configs using SPI.

This patch fixes the issue by pausing diskquota before DROP
EXTENSION so that the bgworker will not load config anymore.

Note that this cannot be done using object_access_hook() because
the extension object is dropped AFTER dropping all tables that belong
to the extension.

* add test case for pause hardlimit (#128)

* Attempt to fix flaky test_primary_failure (#132)

Test case test_primary_failure will stop/start segment to produce a
mirror switch. But the segment start could fail while replaying xlog.
The failure was caused by the deleted tablespace directories in previous
test cases.

This commit removes the "rm" statement in those tablespace test cases and
add "-p" to the "mkdir" command line. The corresponding sub-directories
will be deleted by "DROP TABLESPACE" if the case passes.

Relevant logs:
2022-02-08 10:09:30.458183 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","entering standby mode",,,,,,,0,,"xlog.c",6537,
2022-02-08 10:09:30.458670 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","redo starts at E/24638A28",,,,,,,0,,"xlog.c",7153,
2022-02-08 10:09:30.468323 CST,"cc","postgres",p1182588,th1235613568,"[local]",,2022-02-08 10:09:30 CST,0,,,seg1,,,,,"FATAL","57P03","the database system is starting up"
,"last replayed record at E/2481EA70",,,,,,0,,"postmaster.c",2552,
2022-02-08 10:09:30.484792 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"FATAL","58P01","directory ""/tmp/test_spc"" does not exist",,"Create this directory for the table
space before restarting the server.",,,"xlog redo create tablespace: 2590660 ""/tmp/test_spc""",,0,,"tablespace.c",749,

* ereportif uses the 1st param as the condition (#131)

Otherwise, compiler reports a warning:
"comparison of constant ‘20’ with boolean expression is always false"

* add diskquota.status() to show if hardlimit or softlimit enabled or not

* ci: fix flaky test

* Add timeout when waiting for workers (#130)

Each time the state of Diskquota is changed, we need to wait for the
change to take effect using diskquota.wait_for_worker_new_epoch().
However, when the bgworker is not alive, such wait can last forever.

This patch fixes the issue by adding a timeout GUC so that wait() will
throw an NOTICE if it times out, making it more user-friendly.

To fix a race condition when CREATE EXTENSION, the user needs to
SELECT wait_for_worker_new_epoch() manually before writing data.
This is to wait until the current database is added to the monitored
db cache so that active tables in the current database can be
recorded.

This patch also fix test script for activating standby and rename
some of the cases to make them more clear.

* Add git commit message template (#141)

The template is copied from
https://gist.github.com/lisawolderiksen/a7b99d94c92c6671181611be1641c733

* Use guc for hardlimit (#142)

- Change to use GUC to set hardlimit instead of UDF. Since the hardlimit
  setting needs to be persistent after postmaster restarting.
- Fix relevant test cases.

* be nice with scheduler when naptime = 0 (#120)

* Create worker entry before starting worker (#144)

Currently, the Diskquota launcher first starts a worker, then creates
the worker entry. However, after the worker starts, it cannot find the
entry when trying to check the is_paused status. Also, after a GPDB
restart, when the QD checks whether the worker is running by
checking the epoch, it might also fail to find the entry.

This patch fixes the issue by first create the worker entry then
starting the bgworker process.

* Extend fetch_table_stat() to update db cache (#138)

The db cache stores which databases enables diskquota. Active tables
will be recorded only if they are in those databases. Previously,
we created a new UDF update_diskquota_db_list() to add the current db
to the cache. However, the UDF is install in a wrong database. As a
result, after the user upgrade from a previous version to 1.0.3, the
bgworker does not find the UDF and can do nothing.

This patch fixes the issue by removing update_diskquota_db_list() and
using fetch_table_stat() to update db cache. fetch_table_stat() already
exists since version 1.0.0 so that no new UDF is needed.

This PR is to replace PR #99 , and depends on PR #130 to fix a race
condition that occurs after CREATE EXTENSION.

* Use ytt to create pipelines (#143)

- Rewrite the pipelines by using ytt for code reuse.
- Add `fly.sh` for easier manipulating with pipelines.
- Create configs for pr, commit and dev pipelines.

- Fix the test task for rhel8.
- Use build/test images pair for all distros. And modify the
  `build_diskquota.sh` so it won't need `Python.h` during the configure
  stage.
- Fix the pipiline bug which a `ubuntu` test may use a `rhel6` diskquota
  binary to do the test. That is caused by the same name of the task
  output.

* Specify DISTRIBUTED BY when CREATE TABLE (#146)

When the distribution policy is not specified explicitly, ORCA
distributes data randomly when CREATE TABLE. This can make the
size of the tables created different from time to time.

This patch fixes the issue by adding DISTRIBUTED BY for each
CREATE TABLE to avoid random distribution.

* Simplify the build script (#148)

No need to have the switch case.

* Revert greenplum-db/diskquota#138 (#150)

Revert "Extend fetch_table_stat() to update db cache (#138)"

This reverts commit 9457fa52f1df6d1715041c551fb5eab3cb0fb0b3.

* Add clang-format and editorconfig (#136)

- clang-format is from https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/rHb4DjSd1iI/m/FKPbqwYcBAAJ
- See editorconfig at https://editorconfig.org/

* Extend fetch_table_stat() to update db cache v2 (#151)

The db cache stores which databases enables diskquota. Active tables
will be recorded only if they are in those databases. Previously,
we created a new UDF update_diskquota_db_list() to add the current db
to the cache. However, the UDF is install in a wrong database. As a
result, after the user upgrade from a previous version to 1.0.3, the
bgworker does not find the UDF and can do nothing.

This patch fixes the issue by removing update_diskquota_db_list() and
using fetch_table_stat() to update db cache. fetch_table_stat() already
exists since version 1.0.0 so that no new UDF is needed.

This PR is a revision of PR #138 , and depends on #130 to fix a race
condition that occurs after CREATE EXTENSION.

Co-authored-by: Chen Mulong <[email protected]>

* Wait after CREATE EXTENSION (#152)

Each time after CREATE EXTENSION, we need to wait until the db cache
gets updated. Otherwise no active table will be recorded.

* Run tests on all platforms for PR pipeline (#153)

Try to catch more flaky tests from the beginning.

Co-authored-by: Chen Mulong <[email protected]>
Co-authored-by: Xuebin Su (苏学斌) <[email protected]>

* Reset naptime when test finish (#147)

Add tear_down to regress and isolation2 tests to reset the naptime. When
naptime is 0, quite a lot of cpu will be used. It will be a problem if
there are failing jobs on CI before getting reaped by the concourse. The
concourse worker CPU will be occupied which may impact following tests.

* Fix flaky test case in isolation2/test_relation_size.sql (#123)

* Set PR pending status when start (#157)

Also fixed a syntax issue in pr.yml

* Update CI README for PR trigger issue (#156)

* Make diskquota work for "pg_default" tablespace (#133)

* Make diskquota work for "pg_default" tablespace (#133)

Tracker story ID: #180688138.

Setting role_tablespace quota in "pg_default" using
`set_role_tablespace_quota` doesn't work. Due to a hack
(src/backend/catalog/heap.c) in GPDB, the `reltablespace` of relations
in "pg_default" is `InvalidOid`. When we refresh blackmap we use
`DEFAULTTABLESPACE_OID` instead for those invalid tablespace, this
results in a mismatch. This patch solves the problem by updating tablespace
oid to the real tablespace oid (`MyDatabaseTableSpace`) when reading
from syscache.

Besides, with the hard limits on, we also need to update the tablespace
oid to the real one.

Co-authored-by: Xuebin Su <[email protected]>
Co-authored-by: Sasasu <[email protected]>

* build: use cmake as build system (#161)

* rename library file to diskquota-<major.minor>.so (#162)

rename library file to diskquota-<major.minor>.so, this will break
the old gpconfig -v shared_preload_libraries command, please note.

centos6 and centos7 only have cmake 2.8.12, use cmake 2.8.10 as the
minimum version.

use PG_HOME and remove the workaround.

use cpack to create the installer.

clean up SQL define.

* Keep the 1.0 sql the same with 1.x branch (#163)

The added 'DISTRIBUTED BY' clause should have no impact on 1.x.

* Test when table is created before quota is set (#129)

* Test when table is created before quota is set

This patch adds test cases to ensure that diskquota works when table
is created before quota is set. Specifically,

- When diskquota is preloaded, quota usage is updated automatically
for each active table. Therefore, quotas work once they are set and no
more action is required from the user.
 - When diskquota is NOT preloaded, no active tables will be recorded.
 As a result, when diskquota is preloaded again, the user need to
 SELECT diskquota.init_table_size_table(); and restarts GPDB manually
 in order to update the quota usage.

 This patch also fixes some minor issues in the existing cases.

* Replace wait() with sleep()

* Install the latest cmake on CI (#165)

We need the archive extract feature from cmake 3.18.
Before the new version landing on the test/build images, install them in
our ci scripts.
A cmake binary has been uploaded to the concourse gcs for faster
downloading.

* dump cmake version to 3.18

* fix case test_ctas_no_preload_lib

* Cache CMAKE_INSTALL_PREFIX (#171)

CMAKE_INSTALL_PREFIX needs to be set as CACHE FORCE. Otherwise, next
time call 'make install' without source greenplum_path.sh, it will be
reset to '/usr/local'

See the example of CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT
https://cmake.org/cmake/help/latest/variable/CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT.html

* Fix empty quota view when no table (#167)

Currently, quota views are implemented using INNER JOIN. As a result,
when there is no table belong to a role, schema, or tablespace, the
quota view is empty even though there is a quota config defined for
that role, schema, or tablespace.

This patch fixes this by re-implementing quota views using OUTER JOIN.
Plus, the implementation also uses CTEs for modularizing big queries
and aggregation pushdown.

* mc/cmake installcheck (#169)

CMake installcheck:

- Regress.cmake is introduced to add some helper functions which can
  easily create regress/isolation2 test targets in the CMakeLists.
- When creating regress targets, cmake will generate a series of
  diff_<target>_<test> target to show result diff on a specific target.
- Porting the isolation2_regress build from Makefile to cmake.
- Moved the magic trap regress diff from the concourse script to cmake.
  If the test target is called with env `SHOW_REGRESS_DIFF=1`, then the
  regress.diff will be printed when the case fails.
- 'sql'/`expected` will be linked to the cmake binary directory and run
  test from there. So the `results` won't pollute the source directory.

Concourse scripts restructure:

This commit also try to explore the way how the concourse scripts
are structured. Currently, the container is started with root user, and
this cannot be changed in a short time. But we require gpadmin user
basically for everything.
An `entry.sh` is introduced as a simple skeleton for the scripts. It
make sure a gpadmin user is created and everything needed can be
accessed by the gpadmin with a permanent path. So we don't have too much
ENV magic in the task script any more. Most of the setup work should be
done in the `entry.sh` and the task script should only do a few simple
steps with absolute paths.
By doing this, when we hijack to the concourse container, login with
gpadmin user, the environment should be ready. Simple copy & paste from
the task script will just work.

`upgrade_extension.sh` may not be needed anymore, and it won't work. It
will be cleaned up later when we have the full implementation of
upgrade.

* Dispatch only once for all tables when init() (#107)

Add UDF to accelerate init_table_size_table(). In this UDF:

we traverse pg_class to calculate the relation size of each relation;
parse the auxiliary table's name to get the relevant primary table's oid;
add the size of the auxiliary table to the primary table's size.
It can avoid multiple relation_open() and dispatching pg_table_size().

Co-authored-by: Xuebin Su <[email protected]>

* Pack the last released so in the pipeline (#172)

* Pack the last released so in the pipeline

Due to our upgrade design, the newly released diskquota should contain
all the so file of every latest minor release. The packaging system has
been implemented by cmake DISKQUOTA_PREVIOUS_INSTALLER.
This commit get the latest release through gcs regexp and pass it to the
build script.

The pipeline has been split into different jobs based on the OS type
because of:
- Easier to use the common resource name for build_diskquota.yml
- job_def.lib.yml becomes shorter
- Can span jobs into different concourse workers, which may bring some
  better performance.
- Running a flaky test again is easier.

Also, a gate job has been added which is the main entrance for all
pipelines. In the future, we may want to have different gate jobs for
different pipelines, like a clang-format checking gate for the PR
pipeline.

* add upgrade and downgrade test (#166)

* Fix gcc warning (#175)

Reported by gcc 11.2.0:

quotamodel.c:2115:25: warning: ‘strncpy’ output truncated before
terminating nul copying 10 bytes from a string of the same length
[-Wstringop-truncation]
 2115 |                         strncpy(targettype_str, "ROLE_QUOTA", 10);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Remove Makefiles (#174)

Everything has been moved to cmake.

* cmake: also link RESULTS_DIR to working dir

* show current binary and schema version

* Upload every build to GCS and add build-info (#177)

- Upload every build to GCS
- BuildInfo_create cmake function to print some cmake vars into a
  build-info file which can be packaged. We avoid to use `cmake -LAH`
  here since that misses some important local cmake vars and most of the
  cmake cached var are not cared by us.
- Output some info to diskquota-build-info and it will be packaged by
  cpack.
- Match the whole GP_VERSION from pg_config.h in Gpdb.cmake
- Modify editorconfig to use 2 spaces for cmake.
- Add Git.cmake to retrieve information from a git repository.

* Use cmake way to add C macro defines (#178)

Otherwise it creates noise with LSP since the '-Dxxxx' is started with a
'\n\t'. And that cannot be parsed by ccls.

* ci: speed up the test

* Small tuning on logs (#154)

- Add logs when launcher/bgworker terminates
- Add missing information to the logs.

* Update gitignore (#180)

- The build objects should always exist in build* directories now.
- Add ignore pattern for idea, LSP and dinosaur developpers.

* Add test cases for postmaster restart (#135)

- When postmaster dies, the launcher and workers should be terminated.
- When postmaster restarts, they should be restarted as well.
- Move common ignored patterns to the top init file.

* Fix memory leak when refresh_blackmap() (#182)

The ever growing part of memory contains the following pattern:

refresh_blackmap.*<segment_name><ip_address>:<port> pid=<pid>

The pattern contains info of segments. This indicates that the results
of dispatching refresh_blackmap() might not be freed properly.

* Format the code and add clang-format to PR pipeline (#183)

- Use clang 13 to format the code
- Rename the job 'gate' to 'entrance', prepare to add 'exit' job
- Add clang-format as an entrance check into the PR pipeline

* remove the in-logic version check.

After https://github.com/greenplum-db/diskquota/pull/166, we do not allow one binary running on two
or more DLL which have different version. By the new method, we will block worker enter normal
status if the DDL version is not match.

NOTE: launcher don't has the version check, if launcher DDL is modified,
will be undefined behavior.

this commit also remove some mem-alloc by using static const char* in
SQL string.

* ci: add upgrade version naming convention check

* Support release build (#186)

- Introduce macro DISKQUOTA_DEBUG through CMAKE_BUILD_TYPE. For the
  debug build, check the macro and allow naptime = 0. The min naptime
  for release build is 1 to prevent high cpu usage in user's
  environment.
- Add EXCLUDE arg to Regree.cmake to skip some tests for the release
  build since the release GPDB build doesn't support fault injector.
- Use GPDB release candidate to compile diskquota release. The release
  candidate has assertion disabled in the pg_config.h.
- Set naptime=0 in the config.sql won't take effect for diskquota
  release since it is less than the min value. So the naptime stays the
  same (normally the default value, 2). This fails some of the test,
  especially the hardlimit related ones. If the insertion takes less
  time than naptime (2 seconds, normally), it fails since the limitation
  has not been dispatched before finish. Thus, the naptime is set to 2
  seconds instead of 10 in the reset_config.

* feat: github actions for clang-check ci (#185)

* Feat: use GitHub Actions for clang-check `CI`

* Run same regress test multiple times (#188)

- Add RUN_TIMES to RegressTarget_Add to run the same regress test
  multiple times.
- Rename script for consistency.

* slack-alert and PR pipeline fix (#187)

- Add a slack alert when commit pipeline fails
- Add exit job to set successful state. That state cannot be set by
  individual job since at that point, the whole pipeline is not finished
  yet.

* Don't format errno as '%d' in error messages. (#190)

Actually, errno *isn't* strictly an integer. It depends on your
operating system. Instead of formatting it as an integer, we should
transform it to C style string and format it as '%s'.

* fix context corrupted when dirty data exist in pg_extension

* Fix cmake build on Mac (#191)

- Mac needs extra linker options that the symbols in the postgres
  executable can be found.
- Mac's grep doesn't have '-P' option.
- To make a extension so file, we use a 'module' target instead of
  'shared' on purpose. Since "A SHARED library may be marked with
  the FRAMEWORK target property to create an macOS Framework."
  Thus, the global link flags should be set through
  CMAKE_MODULE_LINKER_FLAGS than CMAK_SHARED_LINKER_FLAGS.

* Bring tests back on CI (#193)

Some tests were accidentally excluded in Debug build.

* Missing EmitErrorReport before FlushErrorState (#155)

FlushErrorState() will clear the error state. To get the error log in
server logs, EmitErrorReport() needs to be called before flushing. All
other places with FlushErrorState() have the Emit before in the
codebase.

For diskquota bgworker, use `SearchSysCache1`. Because the bgworker
should be error-tolerant to keep it running in the background, we
can't throw an error. If we `EmitErrorReport` in bgworker, then the
results returned by `CdbDispatchCommand` would be incorrect. On the
other hand, the diskquota launcher can throw an error if needed.

Co-authored-by: Chen Mulong <[email protected]>
Co-authored-by: Xuebin Su <[email protected]>

* Insert newline before `else` (#194)

Insert newline before `else` to keep aligned with gpdb repository.

* Add the ytt based release pipeline (#198)

The release pipeline will:
- Automatically make a release build for every commit.
- The final release step needs to be started manually.
  - Then the build will be pushed to the release bucket.
  - And a tag will be pushed to the repository.

* Use on/off in diskquota.status() (#200)

PostgreSQL uses on/off for all bool variables.

soft limits: on (soft will works) / off (soft limit not works)
hard limits: on (hard will works) / off (hard limit not works)

* Update licenses (#195)

Pivotal became a part of VMware in 2020.

* Use released gpdb instead of release candidate (#201)

By the gcs resource regex rule, 6.99.99 will be matched as the most
recent gpdb release candidate, which is not what we want. It is a
testing version from release team. And due to the go regex
implementation, (!?) is not supported to filter out an exact string.

So the release pipeline is changed to use the most recent published gpdb
binary instead.

Compared with the previous gpdb_bin, the published one has
'--enable-debug-extensions' as the configure parameter. That also enables
the isolation2 tests for us. So no need to disable those tests in release build
anymore.

* Fix tablespace per segment ratio (#199)

If one tablespace segratio is configured,
this commit will make sure that ethier
the already existed (schema/role)_tablespace_quota
or new configs after this, will have the
same segratio.

1) Add a new quota type: TABLESPACE_QUOTA
used to store the tablespace segratio value
in quota_config table.
When set_per_segment_quota("xx","1.0) is
called, a new config will be added like:
quota_type = 4 (TABLESPACE_QUOTA)
quota = 0 (invalid quota)
segratio = 1

2) Modify set_quota_config_internal to
support insert/delete/update new type config

3) Add function get_per_segment_ratio
it is called when insert new
(schema/role)_tablespace_quota.
"for share" is used to query segratio,it only
can keep the segratio is consistent in
"read committed" level.

Co-authored-by: Sasasu <[email protected]>

* Fix un-expected remove when two tablespace-quota has a same role or schema (#202)

When there are two namespace_tablespace_quota configurations with the
same schema but different tablespaces, removing one quota will remove
another one automatically. The same bug happens for the
role_tabelspace_quota case. The root cause of this bug is that we
can't distinguish these two quotas using only quotaType and targetOid
of diskquota.quota_config.

This PR fixes this by changing the meaning of the targetOid column of
diskquota.quota_config table. When quotaType is
role_quota/schema_quota, targetOid refers to role_oid/schema_oid. When
quotaType is role_tablespace_quota/namespace_tablespace_quota,
targetOid refers to the rowId column of diskquota.target. This will
make it possible to distinguish two namespace_tablespace_quota with
the same schema and different tablespace. Because their rowId will be
different.

Co-authored-by: Xuebin Su <[email protected]>
Co-authored-by: Xiaoran Wang <[email protected]>

* Wait next loop for bgworker to refresh before INSERT in testcase (#205)

`diskquota.naptime` is 2 by default in the release version. So it's better to wait for the bgworker to fresh blackmap after INSERT.

* Add UDF diskquota.show_segment_ratio_quota_view (#204)

* clean blackmap after drop diskquota (#196)

clean blackmap after drop diskquota

Fix hardlimit exceed after drop diskquota.

fix this scenario:
  - create hard limit
  - blackmap on segment was updated
  - drop diskquota
  - the blackmap on segment is not cleanup

use ObjectAccess hook on segment. when dropping diskquota. clean up the in
memory blackmap.

* Reset memory account to fix memory leak (#203)

Diskquota workers dispatch queries periodically. For each dispatch,
one or more memory accounts will be created to keep track of the
memory usage. These accounts should be reset after the query
finishes to avoid memory leak.

Part of the stack trace of account creation looks like this:

```
   fun:CreateMemoryAccountImpl
   fun:MemoryAccounting_CreateAccount
   fun:serializeNode
   fun:cdbdisp_buildPlanQueryParms
   fun:cdbdisp_dispatchX
   fun:CdbDispatchPlan
   fun:standard_ExecutorStart
   fun:_SPI_pquery.constprop.11
   fun:_SPI_execute_plan
   fun:SPI_execute
   fun:do_load_quotas
   fun:load_quotas
   fun:refresh_disk_quota_model
   fun:disk_quota_worker_main
   fun:StartBackgroundWorker
   fun:do_start_bgworker
   fun:maybe_start_bgworker
```

This patch fixes the memory leak issue by calling the reset function 
at the end of the worker loop.

* fix test case for test_manytable (#197)

rename manytable test to activetable_limit

activetable_limit is used to test if activetable_limit works.

if number of current activetable > max_active_tables. bgworker will
print some warring message. GPDB should not crash, but diskquota's basic
function does not guarantee.

* some fix for release CI (#206)

clean blackmap after drop diskquota. follow up #196

also fix test case activetable_limit failed when naptime > 0

* Fix mistaken log statement (#207)

And add contraries that the refresh_black_map() should not be executed
on QD.

* Fix flaky ctas_pause on release build (#210)

Caused by the longger naptime for release build.

* Reformat quota exceeded message (#209)

Co-authored-by: Sasasu <[email protected]>

* Fix change tablespace test case (#213)

* Fix change tablespace test case

"ALTER TABLE xx SET TABLESPACE" will change the relfilenode of
a table, it will copy files from old relfilenode directory to
the new one and delete all files when commiting the transaction.

As diskquota doesn't acquire a lock on the relation when
fetching table size, so if diskquota is collecting active
tables' size and another session is deleting an active table's
files under an old relfilenode directory in a changing
tablespace transaction, the diskquota  can't get all segment's
table size as files on part of segments have been deleted.

To fix it, we try to make a table to be an active table by
inserting data into it and wait for the diskquota to recalculate
the table size after changing tablespace.

* Replace "black" with "reject" for clearity (#214)

* Replace "black" with "reject" for clearity

* Rename in sql files

* Rename test files

* Format .c files

* Re-format using v13

* Update expected test results

* Fix result of upgrade test

* Re-trigger CI

* Add newline at EOF

* Change "list" to "map" for consistency

* Reject set quota for system owner(#215)

* fix set quota with upper case object name

the old code transform all object name to lower case on purpose.

    create schema "S1";
    select * from diskquota.set_schema_quota("S1", '1MB');
    ERROR:  schema "s1" does not exist

if the object name wrapped with '"'. eg '"Foo"' will search 'Foo'
if not will always search the lower case 'foo'.

* Hint message when launcher cannot be connected (#216)

* SPI & entry clear in refresh_blackmap (#211)

- refresh_blackmap doesn't need SPI calls.
- Clearing should happen within the same lock of refreshing entries.
  Since there will be a short time that other processes see an empty
  rejectmap while checking the hard limit.

* Minimize the risk of stale relfilenodes (#217)

Since we do NOT take any lock when accessing info of a relation,
cache invalidation messages may not be delievered in time each time
we SeearchSysCache() for relfilenode. As a result, the relfilenode may
be stale and the table size may be incorrect.

This patch fixes this issue by

- Checking for invalidation messages each time before
SearchSysCache(RELOID).
- Not removing a relation from active table map if the relfilenode
is stale.

Besides, since we do not remove the relfilenode if it maps to no
valid relation oid, we need remove those dangling relfilenodes
when unlink() to avoid memory leak.

* Report error if diskquota is not ready (#218)

Previously, setting quotas when diskquota is not ready is allowed.
But in that case, quotas set will not take effect. This can confuse
the user.

This patch fixes this issue by reporting an error when setting quotas
if diskquota is not ready. This is done by checking diskquota.state.

This patch also fixes possible deadlocks in the existing test cases.

* Allow deleting quota for super user (#219)

* Release pipeline is publishing wrong intermediates (#221)

"get" without "passed" will retrieve the latest content from a resource
which is not expected release. Especially the release and dev pipelines
share the intermediates bucket.

- Move release intermediates to its own bucket
- Add passed the "get" step in the "exit_release" job, to make sure the
  resource version generated from previous job will be consumed in this
  job.

* Bump version to 2.0.1 (#222)

* Compatible with ancient git (#223)

git 1.7.1 dose not support `git diff /absolute/path/to/current/project`

make our CI happy.

* Fix compiler version on concourse (#225)

cmake will look up c compiler by checking /usr/bin/cc. On CentOS7, that
symbolic linked to the gcc in the same directory. Although a newer gcc
is installed, cmake will not use the newer one.

We always export the CC and CXX, then the gcc/g++ in the $PATH will be
used.

* [CI] Fetch secrets from vault. (#226)

This patch teaches concourse to fetch secrets from vault.

* [cmake] Use C compiler to link the target. (#230)

Since we didn't implement anything in C++, let's invoke C compiler to link the target.

See: https://cmake.org/cmake/help/latest/prop_tgt/LINKER_LANGUAGE.html#prop_tgt:LINKER_LANGUAGE

* Scripts for new pipeline naming rules (#231)

Concourse is deprecating some special characters in the pipeline name.
See details at https://concourse-ci.org/config-basics.html#schema.identifier

The pipelines will be named as:
<rel|pr|merge|dev>[_test].<project_name>[branch_name][.user_defined_postfix]

- Pipeline type 'release' is renamed to 'rel'
- Pipeline type 'commit' is renamed to 'merge'
- '_test' suffix will be added to pipeline type for pipeline debugging
- Instead of setting the whole name of pipeline, user can only set the
  postfix.

Patch from https://github.com/pivotal/ip4r/pull/2

Co-authored-by: Chen Mulong <[email protected]>
Co-authored-by: Xing Guo <[email protected]>

* Log readiness message only once (#229)

Currently a diskquota worker will log an error message when the
database is not ready to be mornitored by Diskquota once for each
iteration. When the naptime is small, the log will be flooded by this
error message.

This patch adds a flag in the worker map in the shared  memory so
that the error message will only be logged once per startup.

* Fix flaky test due to slow worker startup (#232)

When creating the extension, a diskquota worker will be started
to mornitor the db. However, it might take longer than expected
until it checks whether it is ready to monitor, which will make
some tests flaky.

This patch increases the time to wait for worker startup to avoid
flaky tests.

* Downgrade severity of the readiness issue (#235)

Downgrade the severity of the readiness check failure from `ERROR` to
`WARNING` so that the error stack will not be logged.

* Fix creating artifact (#234)

The artifact was tared without "c", which is a tar only but not gzipped.
untar it with "xzvf" will report error:

"
tar  xzvf diskquota_rhel7_gpdb6.tar.gz

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
"

* Dynamic bgworker for diskquota extension

As the num of bgworkers is limited in gpdb, if we want to support
unlimited numbers of database to run diskquota extension , we 
need to dynamically allocate bgworkers for databases. Each 
bgworker only runs "refresh_disk_quota_model" once, then exits.

GUC diskquota.max_workers is added in this pr, to set the max
bgworkers running in parallel.

Const MAX_NUM_MONITORED_DB, the databases' num is still 
limited as we store each database's status in shared memory.

When there are fewer databases running diskquota extension than
diskquota.max_workers, will start one bgworker for each database
statically. Otherwise, will start bgworkers dynamically.

Launcher:
* workers: in shared memory
freeWorkers,runningWorkers are 2 double linked list.
Move a worker from freeWorkers to runningWorkers when picking
up a worker to serve for a database.
Move a worker from runningWorkers to freeWorkers when the
process exits. The length of freeWorkers plus runningWorkers is 
diskquota.max_workers.
And each worker has a workerId, referenced by DiskquotaDBEntry.

* dbArray: an array in shared memory too, the length of it is
MAX_NUM_MONITORED_DB, storing DiskquotaDBEntry info.

The related things to dbArray:
- curDB: an iterator pointer for DBArray. When curDB is
NULL means that diskquota has finished one loop, and is ready
for the next loop. When curDB is dbArrayTail, it means it finished 
on the loop but needs to sleep; otherwise, it points to a DB needed
to run.
- next_db(): function to pick the next database to run.

* DiskquotaDBEntry:
- in_use: it holds a diskquota database info.
- workerId: if the workerId is not Invalid, it means the db is running.
To check if a db is running by this field.
If the naptime is short, and after one loop, the database is still
running,  then it will not be picked up to run.
- inited: check If the things  for the diskquota extension, especially
the items in shared memory, have been inited.

* Diskquota worker:
Each diskquota extension worker has something in shared memory,
inited by function init_disk_quota_model,.

* Schedule
If it is ok to start workers, then pick one db in dbArray from beginning
to end to run. Otherwise, it means it's not time to start workers or
there are no available free workers. For the first reason, just sleep
until timeout. For the second point,  to wait for the SIGUSR1 signal
which means there are diskquota workers exiting and workers freed.

* Fix the test failure on release pipeline

Regress tests failed on the release
pipeline due to the diskquota.naptime
is not 0. As diskquota launcher
firstly sleep, then starts diskquota
workers. When naptime is not 0, workers
are started very late, then some tests
which checking them failed.

In this pr, fix the test by starting
diskquota workers in the launcher before
it sleeps. This makes diskquota workers
will be started immediately when the 
launcher starts.

Also, increased the sleep time in
the tests.

Sometimes, after restarting gpdb cluster, the db
has not yet been added to monitored_db_cache,
but worker_get_epoch is called to fetch the db
from the cache, then it will fail. Shouldn't to
report error here, printing a warning is ok.

* Move pr/merge pipeline to dev2 (#238)

And the docker clang-format image exists on the extensions gcp, I am too
lazy to move it here and it will be unnecessary to do that since we have
the check as github actions.

* Combine two object_access_hook into one (#220)

* Remove free writer gang in launcher (#233)

As launcher and worker both run in bgworker, they are clients when
connecting to segments. They know when should close the connections.
So we directly call `DisconnectAndDestroyAllGangs` to close them and
remove the code that only closes free read gangs based on timeout.

For the launcher, we destroy gangs after init_datbase_list or after each
time processing create/drop diskquota extension.

For workers, we only destroy gangs after the extension is paused.

* Fix bug: diskquota quota usage doesn't include uncommitted table (#228)

* Fix bug: diskquota quota usage doesn't include uncommitted table
Modify the show_fast_schema_quota_view, show_fast_role_quota_view, show_fast_schema_tablespace_quota_view and show_fast_role_tablespace_quota_view. Combine pg_class with relation_cache into all_relation, and join all_relation with diskquota.table_size to calculate quota usage.

Add upgrade and downgrade procedure between diskquota2.0 and diskquota2.1. Add upgrade and downgrade test for diskquota-2.1.

* Fix flaky test (#240)

* Fix flaky test test_ctas_role (#242)

creating table runs too fast to trigger the hard quota limit.

* Support at least 20000 tables for each database (#239)

Currently, diskquota supports max to 8000 tables, including partition tables. It may cause unexpected results if there are too many tables in the customer cluster.

This PR change `init_size` of table_size_map to 20000, instead of MAX_TABLES, so that we can
economize the actual usage of SHM. To handle scenarios with a large number of tables,
we set `max_size` to MAX_TABLES (1000000).

And To avoid memory waste, only the master should be allocated shared memory for diskquota_launcher and diskquota_worker.

* Reduce MAX_TABLES to 200K. (#243)

The scale of 50 databases with 200K tables is large enough, so we
reduce MAX_TABLES to 200K to avoid memory waste.

* Fix test_rejectmap flaky test (#246)

Why the test case can run successfully before, but sometimes fails?
Because it adds the reject entry after starting inserting data into the
table. If the inserting has hang up on `check_rejectmap_by_relfilenode`
and then adding the reject entry, hardlimit will be triggered. Otherwise,
softlimit will be triggered.

* Add SECURITY.md

* Add SECURITY.md (#249)

* Rewrite the diskquota worker scheduler (#245)

Rewrite the diskquota worker scheduler. Each DB entry manages the
running time by itself to make the scheduler more simple.
I have tested the performance, nearly the same as before.

Add a UDF db_status to help debug or monitor the diskquota, related
SQL is in test_util.sql. I didn't put it into diskquota--2.1.sql, because I
don't want users to use it now. I didn't use any shared lock when getting
that status info, so they may be not consistent.

Co-authored-by: Xing Guo <[email protected]>
Co-authored-by: Zhang Hao <[email protected]>

* Remove PR pipeline base branch (#252)

So all PRs can trigger build.

* Fix bug: relation_cache cannot isolate table info for each database (#241)

Previously, users will see the information of tables in db2 when accessing db1 and executing select diskquota.show_relation_cache(). Meanwhile, if db1's bgworker can see the table in db2, the performance will be influenced, because the table size in db2 will be calculated although it is not useful for db1 diskquota.

This PR has added a judgment to prevent this situation.

* Fix flaky test (#250)

When diskquota is in dynamic mode, it may take a long time to start a worker for a database (because the worker is busy), so some tests will fail after pg_sleep.

Add the diskquota_test extension to resolve the issue. In it, there is a UDF diskquota_test.wait(SQL test), which we can run by passing any SQL whose select result is true or false to it, it runs the SQL and waits for the selection result to be true at most 10 seconds. diskquota_test.db_status will return all the databases which are monitored by diskquota and their status and epoch. diskquota_test.cur_db_status() and diskquota_test.db_status() are just wrappers of it, which are more useful in the tests.

In the regress tests, replace the sleep with diskquota_test.wait or diskquota.wait_for_worker_new_epoch().

What's more, I have extracted the monitored_dbid_cache related data and functions into the monitored_db.c file.
Disable the regress test test_init_table_size_table, there is a bug making it fail, I will fix it in another pr.

* Filter the useless relation type. (#254)

Co-authored-by: Xiaoran Wang <[email protected]>

* Fix worker time out and diskquota.max_workers (#255)

* Fix flaky "database not found" (#256)

* Change the max value of diskquota.max_workers to 20 (#257)

* Change the max value of diskquota.max_workers to 20

If we set the diskquota.max_workers max value to be
max_worker_processes, when max_worker_processes is less than 10 and we
set diskquota.max_workers value more than max_worker_processes, the
cluster will crash.

Set the max value to be 20, when the max_worker_processes is less than the
diskquota.max_workers, diskquota can work, only some parts of the
databases can not be monitored as diskquota can not start bgworkers for
them.

* Modify diskquota worker schedule test

Test when diskquota.max_workers more than available bgworker.

Co-authored-by: Zhang Hao <[email protected]>

* Revert change of worker_schedule test becuase flaky test (#260)

* Missing pause causes deadlock flaky (#258)

* Fix memory leak when database is not ready (#262)

Co-authored-by: Zhang Hao <[email protected]>

* Change the default value of GUC to reduce default memory cost. (#266)

Change the default vaule of `diskquota.max_active_tables` from 1M
to 300K, the memory usage relevant it is reduced from 300MB to 90MB.

* Refactor TableSizeEntry to reduce memory usage. (#264)

Refactor the structure of TableSizeEntry to reduce memory usage.

Previously, the size of each table in each segment should be maintained in TableSizeEntry, which wastes lots of memory. In this PR, we refactor the TableSizeEntry to:

struct TableSizeEntry
{
	Oid    reloid;
	int    segid;
	Oid    tablespaceoid;
	Oid    namespaceoid;
	Oid    owneroid;
	uint32 flag;
	int64 totalsize[SEGMENT_SIZE_ARRAY_LENGTH];
};
In this way, we can maintain multiple sizes in one TableSizeEntry and efficiently save memory usage.

For 50 segments: reduced by 65%.
For 100 segments: reduced by 82.5%.
For 101 segments: reduced by 65.3%.
For 1000 segments: reduced by 82.5%.

* Correct table_size_entry key (#268)

There is a bug: removing TableSizeEntry from table_size_map by oid.
Actually, the hash map key is TableKeyEntry. Fix it.

* Add cmake opt DISKQUOTA_DDL_CHANGE_CHECK (#270)

* Fix regression caused by #264 (#272)

The #264 caused some segment ratio tests fail. The entry's relevant
fields need to be set at the end of the iteration. Otherwise, only the
first seg will pass the condition check.

* Add a GUC `diskquota.max_table_segments`. (#271)

Use diskquota.max_table_segments to define the max number of table segments in
the cluster. The value equal (segment_number + 1) * max_table_number.

Since hashmap in the shared memory can take over others' memory space
even when it exceeds the limit, a counter is added to count how many tables
have been added to the table_size_map, to prevent too many entries to be
created.

Co-authored-by: Xiaoran Wang <[email protected]>
Co-authored-by: Chen Mulong <[email protected]>

* Enable tests (#274)

* Optimize dispatching reject map to segments (#275)

Avoid dispatching reject map to segments when it is not changed

* Fix bug: rejectmap entries should not be removed by other databases. (#279)

This commit fixed two bugs:
- Previously, refresh_rejectmap() cleared all entries in rejectmap, including other databases' entries, which causes hardlimit can not to work correctly.
- soft-limit rejectmap entries should not be added into disk_quota_reject_map on segments, otherwise, these entries may remain in segments and trigger the soft-limit incorrectly.

Co-authored-by: Chen Mulong <[email protected]>

* Fix diskquota worker schedule bug (#280)

When a database's diskquota bgworker is killed and the db is dropped,
diskquota scheduler can not work properly. The cause is: if the scheduler
failed to start a bgworker for a database, it will try it again and
again forever.

A different status code is returned when failing to start bg worker. And if it is
failed due to the dropped database (or another other reasons causes
db name cannot be retrieved from db id), just skip this bgwoker for now
For other failure reasons, limit the times of starting a bgworker for a database
to 3 times. If the limit is reached, skip it and pick the next one.

* Bump version to 2.1.1 (#283)

* Fix flaky test_fast_quota_view (#282)

- Drop the table space before rm directory.
- '-f' to alway force rm.
- '-- start-ignore' doesn't seem to be working with retcode since retcode
  will add '-- start/stop-ignore' pair automatically to ignore the
  output, and the nested start/stop ignore doesn't seem to be handled
  well by the ancient perl script. Refer to
  'src/test/isolation2/sql_isolation_testcase.py'.

Seen flaky tests as below:

root@96831b9f-9150-4424-63a7-abe8f18c144e:/tmp# cat /home/gpadmin/diskquota_artifacts/tests/isolation2/regression.diffs
--- \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/expected\/test_fast_quota_view\.out	2022-12-12 13:20:56.729354016 +0000
+++ \/tmp\/build\/4eceba44\/bin_diskquota\/tests\/isolation2\/results\/test_fast_quota_view\.out	2022-12-12 13:20:56.733354401 +0000
@@ -175,9 +175,11 @@
 (exited with code 0)
 !\retcode rm -r /tmp/spc2;
 GP_IGNORE:-- start_ignore
+GP_IGNORE:rm: cannot remove '/tmp/spc2/6/GPDB_6_301908232/16384/16413': No such file or directory
+GP_IGNORE:rm: cannot remove '/tmp/spc2/5/GPDB_6_301908232/16384/16413': No such file or directory
 GP_IGNORE:
 GP_IGNORE:-- end_ignore
-(exited with code 0)
+(exited with code 1)
 -- end_ignore
 DROP TABLESPACE IF EXISTS spc1;
 DROP

* Fix flaky isolation2 test. (#281)

Currently, isolation2/test_rejectmap.sql is flaky if we run isolation2
test multiple times. That's because we set the GUC
'diskquota.hard_limit' to 'on' in test_postmaster_restart.sql and forget
to set it to 'off'. In the next following runs, the hard limit is
enabled and the QD will continuously dispatch reject map to segment
servers. However, test_rejectmap.sql requires the hard limit being
disabled because we're dispatching rejectmap by UDF manually or the
dispatched rejectmap will be cleared by QD.

This patch adds a new injection point to prevent QD from dispatching
rejectmap to make test_rejectmap.sql stateless. This patch also set
'diskquota.hard_limit' to 'off' when test_postmaster_restart.sql
finishes.

* Fix compilation to support gpdb7 (#285)

* Fix diskquota on gpdb7

- Fix some compile issues, especially relstorage has been removed
on gpdb7. Using relam to get the relation's storage type.

- Modify diskquota hash function flag.

- Fix diskquota_relation_open(). NoLock is disabled on gpdb7.

- Add tests schedule and expected results for gpdb7.

- Update some test expectations on gpdb7 due to AO/CO issue: As
something changes about AO/CO table, the size of them is changed.

- Disable some tests on gpdb7.

- Disable upgrade tests.

- Upgrade to diskquota 2.2. Add attribute relam to type
relation_cache_detail and add a param to function relation_size_local.

- Add setup.sql and setup.out for isolation2 test.

- Fix bug: gpstart timeout for gpdb7. We used to set
`Gp_role = GP_ROLE_DISPATCH` in disk_quota_launcher_main(), even
though postmaster boots in utility mode. This seems to be nothing 
in gpdb6, but it will cause a dead loop when booting gpdb7. In fact,
there is nothing to do in utility mode for the diskquota launcher. In 
this commit, if `Gp_role != GP_ROLE_DISPATCH`,
disk_quota_launcher_main() will simply exit.

- Add gpdb7 pipeline support. Build gpdb7 by rocky8, and test the 
same build with rocky8 and rhel8. 'res_test_images' has been changed
to list to support this.

- Add gpdb version into the task name. 'passwd' is unnecessary and '
doesn't exist in the rocky8 build image.

- Use `cmake -DENABLE_UPGRADE_TEST=OFF` to disable the upgrade test.

- TODO: Add upgrade test to CI pipeline. Fix activate standby error on the
CI pipeline. Fix tests for gpdb7.


Co-authored-by: Xiaoran Wang <[email protected]>
Co-authored-by: Xing Guo <[email protected]>

* Fix released tarball name. (#286)

Co-authored-by: Hao Zhang <[email protected]>

* Add an option to control whether compile with fault injector. (#287)

Usage:

```
cmake -DDISKQUOTA_FAULT_INJECTOR=ON/OFF [default: OFF]
```

Co-authored-by: Hao Zhang <[email protected]>

* Add judgement for fault injector. (#288)

If fault injector is disabled, isolation2 will be disabled.

* Revert "Add judgement for fault injector. (#288)" (#291)

This reverts commit e3e73d2dd24ef0878a6a41b8734e340f5fb7a96a.

Revert "Add an option to control whether compile with fault injector. (#287)"

This reverts commit ae4ab48c97c5e77ecf25bee3ebf58ccd1244429f.

Co-authored-by: Hao Zhang <[email protected]>

* CI: Fix pipeline (#293)

- Switch GPDB binary to release-candidate for release build.
- Remove test_task from the release pipeline.

Co-authored-by: Xing Guo <[email protected]>

* Format code by clang-format. (#296)

* Add command to compile isolation2. (#297)

Isolation2 compilation command is removed by #285. We add it into
Regress.cmake in this commit.

Co-authored-by: Xing Guo [email protected]

* Fix flaky test (#294)

- Fix flaky test test_ctas_before_set_quota. pg_type will be an active table
after `CREATE TABLE`. It does not affect the function of diskquota but
makes the test results unstable. In fact, we do not care about the table
size of the system catalog table. So we simply skip the active table oid
of these tables.

- Fix test_vacuum/test_truncate. gp_wait_until_triggered_fault
should be called after gp_inject_fault_infinite with suspend flag.

Co-authored-by: Xing Guo <[email protected]>
Co-authored-by: Xiaoran Wang <[email protected]>

* Fix flaky test test_rejectmap_mul_db (#295)

When creating a new table, pg_type will be in active tables.
Filter the system catalog table. And remove pause in the test.

* Fix bug (#298)

Fix the following bugs:
- Judgement condition for update_relation_cache should be `&&`, instead of `||`
- The lock for relation_open/relation_close should be AccessShareLock for gpdb7.
- For `truncate table`, we cannot get the table's oid by new relfilenode
immediately after `file_create_hook` is finished. So we should keep
the relfilenode in active_table_file_map and wait for the next loop to
calculate the correct size for this table.

* Enable upgrade test for CI (#299)

Revert some modification from #285.
- Add last_released_diskquota_bin back for CI.
- Enable upgradecheck.
- Add -DDISKQUOTA_LAST_RELEASE_PATH for cmake.

* Skip fault injector case for release build (#302)

Due to the release build change for GP7, the fault injector doesn't work
with the release build. So, all the tests were temporally disabled for
release pipelines.

Since we switched to use the `--disable-debug-extensions` gpdb build, the
fault injector is not available for the release pipeline.

- Add 'EXCLUDE_FAULT_INJECT_TEST' to Regress.cmake, so it will be smart
  enough to check if there are any fault injector case in the give tests
  set. Ignore them if so.
- Skip the fault injector tests for the release pipeline.
- Enable the CI test task for GP7.

* VAR replace for Regress.cmake and fix test_rejectmap (#304)

Since GP7 doesn't support plpythonu, the test_rejectmap was broken for
GP7. This commit:

- Improve the Regress.cmake, so if the input sql file has a "in.sql"
  extension, "@VAR@" in it will be replaced by the corresponding cmake
  VAR.
- SQL_DIR/EXPECTED_DIR takes list as the argument now. So only the
  different cases need to be put into the expected7. Others will be used
  from expected directly.
- Due to above change, same tests for gp6 and gp7 are removed from
  gp7, only diff is needed.
- Set different @PLPYTHON_LANG_STR@ for GP6 & GP7
- Due to plpython composite type behavior change, the python code in
  the test has been modified. The behavior change is probably related
  to PG commit 94aceed317.

* Fix update test failures caused by segid diff (#305)

Due the GPDB change b80e969844, upgrade_test failed because the segid
in the view dump is surrounded by quotes.

Expected failure diff:

-WHERE (table_size.segid = (-1)))) AS dbsize;
+WHERE (table_size.segid = '-1'::integer))) AS dbsize;

* Replace relation_open/relation_close with RelationIdGetRelation/RelationClose. (#300)

This PR replaces relation_open()/relation_close() with RelationIdGetRelation()/RelationClose() to avoid deadlock. We can only call relation_open() with AccessSharedLock in object_access_hook(OAT_POST_CREATE) in GPDB7, which may cause deadlock. While RelationIdGetRelation()/RelationClose() just locks pg_class instead of user-defined relation, so we use these functions to get the information of relations.

* Fix flaky test of test_rejectmap_mul_db (#301)

Add back the pause in the test, to avoid the changes of active table.

* Fix test_primary_failure test case (#303)

* subprocess.check_output of python3, needs encoding
* Add @PLPYTHON_LANG_STR@ in the test_primary_failure.in.sql

* Move files to src and control dirs. (#307)

- Move .c and .h files to src dir.
- Move diskquota ddl file to control/ddl dir.
- Move diskquota_test--1.0.sql file to control/test dir.

* Fix gpdb release binary regex (#311)

The release candidate contains the build number now. Change the regex to
match the latest release candidate for release pipeline.

* Enable test_postmaster_restart (#309)

The isolation2 test `test_postmaster_restart` is disabled because the
postmaster start command is different between GPDB6 and GPDB7. We
should enable this test by passing distinct commands for GPDB6 and
 GPDB7 to test_postmaster_restart.sql.

Meanwhile, we can merge isolation2_schedule7 and isolation2_schedule
 to one.

* Bump cmake min version to 3.20 (#313)

Fix #312

* Resource change for gpdb7 binary (#315)

"""
NOTICE:  resource queue required -- using default resource queue "pg_default"
"""
has been removed in gpdb7 82851a0b85 . Ignore it in the tests.

* Fix bug: bgworker enters infinite loop after receiving notices from QE. (#314)

We should not initialize the MyProcPort in bgworker, which will cause the database name to not be automatically added to the log message. In normal, the log printed from the bgworker does not contain the database name but contains the bgworker's pid. In order to facilitate debugging, we should print the database name every BGWORKER_LOG_TIME to ensure that we can find the database name by the bgworker's pid in the log file.

Co-authored-by: Chen Mulong <[email protected]>

* Ignore some hint/notice for GPDB7 (#328)

* fix flaky test: test_appendonly (#327)

* Add dbname to DiskQuotaWorkerEntry (#326)

Previously, we passed dbname by bgw_name to bgworker. But refer to https://github.com/greenplum-db/gpdb/blob/f182228992b62e2023e2fac5b4971406abd35c9d/src/backend/postmaster/bgworker.c#L385-L386, bgw_name is copied by ascii_safe_strlcpy(), which replaces non-ASCII bytes with '?'. If the database name is non-ASCII, the bgworker can not connect to the correct database and raises an error. issue: #323 . To solve this problem, we should pass dbname by bgw_main_arg to the main function of the bgworker.

* Replace SPI_execute with SPI_cursor_fetch (#322)

The result buffer of SPI_execute is limited to 1GB. If the number of rows in diskquota.table_size exceeds 3 * 10^7 (500 segments with 60000 tables for example), SPI_execute("select tableid, size, segid from diskquota.table_size") will raise an error for invalid memory alloc request size 1073741824 (context 'SPI TupTable').  We should fetch table-size entries by portions. Replace SPI_execute() with SPI_cursor_fetch() in load_table_size().

* Split update SQL statement for diskquota.table_size (#325)

Fix issue: #318 

If the number of entries needed to be updated in diskquota.table_size is too many, the size of StringBuffer will exceed 1GB and raise an error `Cannot enlarge string buffer containing 1073741807 bytes by 20 more bytes.`.

We should limit the number of entries in the update statement to 1000000 every time to ensure the string buffer size does not exceed 1GB.

* Update cmake version in README. (#331)

* Fix flaky test: test_appendonly (#333)

* Skip refresh_disk_quota_model() when receiving a signal (#321)

When with often signals to receive diskquota may call refresh_disk_quota_model 
more frequently than diskquota_naptime.

Solution: when receiving a signal, the time waiting for latch sleep_time will be smaller
than diskquota_naptime. We should skip refresh_disk_quota_model() when
sleep_time < diskquota_naptime.

* Allocate more shared memory to quota_info (#334)

Allocate more SHM to quota_info, so that the user can set more quota
configurations.

* Enable stanby test for 6X_STABLE (#335)

* Bump version to 2.2.1 (#336)

* Use install_gpdb_component in CI test (#337)

So the script can be tested as well.

* Fix compiler warnings. (#339)

Note: `TextDatumGetCString()` returns a null-terminated CString.

```
[ 12%] Building C object CMakeFiles/diskquota.dir/src/diskquota.c.o
In file included from /home/v/workspace/diskquota/src/diskquota.h:16,
                 from /home/v/workspace/diskquota/src/diskquota.c:18:
/home/v/workspace/diskquota/src/diskquota.c: In function ‘diskquota_status_schema_version’:
/home/v/.local/gpdb7/include/postgresql/server/c.h:957:25: warning: ‘strncpy’ specified bound 64 equals destinatio
n size [-Wstringop-truncation]
  957 |                         strncpy(_dst, (src), _len); \
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/v/workspace/diskquota/src/diskquota.c:1548:9: note: in expansion of macro ‘StrNCpy’
 1548 |         StrNCpy(version, vv, sizeof(version));
      |         ^~~~~~~
```

Co-authored-by: Hao Zhang <[email protected]>

* Fix bug: diskquota.status() (#344)

Fix bug produced by #339 

Co-authored-by: Xing Guo [email protected]

* fix pipeline (#345)

fix pipeline resource

* Reduce the number of the log in bgworker. (#346)

* Fix bug: lose monitored_dbid_cache after switching mirror (#342)

# Problem
Recording active tables and permission checking on segments rely on `monitored_dbid_cache`. But after mirror switching, the data in shared memory is lost, and the above behaviors will be blocked.

# Solution
Segments update their `monitored_dbid_cache` after receiving pull_active_tables_oid requests every naptime.

* Enable continuous upgrade. (#340)

To achieve upgrading directly from 2.0 to 2.2, we should do the following things:
- cherry-pick this PR in diskquota-2.0, diskquota-2.1 and diskquota-2.2.
- set the shared_preload_libraries as `diskquota-2.2`: `gpconfig -c shared_preload_libraries -v 'diskquota-2.2'`
- restart cluster: `gpstop -ar`
- execute the following SQLs:
```
ALTER extension diskquota update to '2.2';
```

* Fix pipeline. (#349)

Get the latest package version for release resource.

* Fix upgrade version check (#347)

- Because of 2a73847f1a, the sql file directory has been changed. The
  sql version check wouldn't work since it cannot find the sql files
  anymore. Change it to the correct ddl directory.
- 'exec_program' is deprecated, use 'execute_process' instead.
- 'git tag | sort' returns the latest version among all branches, but
  not the closest tag to the current commit. Use 'git describe --tags'
  instead. So the upgrade version check will work for the 2.1.x patch
  release.

* Remove gp7 from pipeline (#350)

After creating extension, the `diskquota.state` is always `unready` due to the change https://github.com/greenplum-db/gpdb/pull/15239. It makes the test timeout. We disable the job of gp7 in release/pr/merge pipeline until the problem is fixed.

* Add alter extension upgrade test (#348)

- By #340, diskquota should be able to upgraded directly from any previous
  version. A script is added to test this.
- Modify the cmakefile so before installing/packaing, only previous so
  files will copied. This would help us to make patch release for
  2.0/2.1.

---------

Co-authored-by: Zhang Hao <[email protected]>

* Add a sleep in alter_test.sh (#351)

Co-authored-by: Chen Mulong [email protected]

* Update to 2.2.2 (#352)

- bump version to 2.2.2
- Modify the check procedure of alter extension.

* Reduce the remain logs in bgworker. (#354)

Fix from #346

* Fix bug: bgworkers only print log once. (#356)

* Disable forks for PR resource (#358)

* Release/gp7 2.2.2 (#361)

Change the CI pipeline to release diskquota for GP7.

TODO:
- enable regress test on CI.
- enable activatestandby test for GP7 on CI.
- fix regress test for GP7. The view in GP7 will be treated as a relation by diskquota. After `CREATE EXTENSION`, the test case should execute `SELECT diskquota.init_table_size_table()` to make the diskquota.state ready on GP7.

* Update to 2.3.0 (#362)

Fix the following issues:

diskquota cannot correctly print a warning when calling
`create extension` in a non-empty database. The diskquota.state shows
whether the current database is empty. Previously, we updated the
status to diskquota.state during `create extension` and queried the
status in UDF `diskquota.diskquota_start_worker`. When querying the
database status, the relations whose `relkind in ('v', 'c', 'f')` are
skipped, while all relations are filtered when updating the database
status. This patch merges the two SQL statements to `INSERT RETURNING`
to solve this problem.
Benefit: we won't need to upgrade the minor version of diskquota when
changing the statement.

Remove useless test `test_update_db_cache`. The behavior of the
bgworker scheduler is changed, so this test is no longer helpful.

Update the upgrade test script `2.2_set_quota`. Since the above issue,
ther…
reshke pushed a commit to reshke/cloudberrydb that referenced this pull request Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants