You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Apr 7, 2026. It is now read-only.
There are 1 billion old records need to be cleaned. According the documents, It seems Partitioned DML should be the good choice. I choose java client because we can set timeout by using setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still experienced the '504 Deadline Exceeded' after a few hours that shorter than 24 hours I set. Why setPartitionedDmlTimeout did not work? How should I solve the problem? Is it not possible to delete so large number of data by using Partitioned DML.
Following are the details.
Environment details
Spanner
3 nodes, 2 billion records in a table and 1 billion old records need to be cleaned
OS type and version: MacOs version 10.15.3
Java version: java 8
spanner version(s): the latest version
Steps to reproduce
Write 2 billion records with timestamp in to the spanner table with dataflow
Run a clean job to delete the old 1 billion records with Partitioned DML by java client
Set the timeout to be 24 hours by using setPartitionedDmlTimeout(Duration.ofHours(24L))
Run the java code
Code example
SpannerOptionsoptions = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(24L)).build();
Spannerspanner = options.getService();
try {
DatabaseIddb = DatabaseId.of(options.getProjectId(), args[0], args[1]);
// [END init_client]//This will return the default project id based on the environment.StringclientProject = spanner.getOptions().getProjectId();
if (!db.getInstanceId().getProject().equals(clientProject)) {
System.err.println(
"Invalid project specified. Project in the database id should match the"
+ "project name set in the environment variable GOOGLE_CLOUD_PROJECT. Expected: "
+ clientProject);
}
// [START init_client]DatabaseClientdbClient = spanner.getDatabaseClient(db);
SimpleDateFormatformatter = newSimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS");
Stringsql="DELETE FROM MyTable WHERE updateTimestamp<'2020-05-02T03:00:00Z'";
DatecurrentDate = newDate();
System.out.println(String.format("Delete Start:%s",formatter.format(currentDate)));
longlStartTime = Instant.now().toEpochMilli();
deleteUsingPartitionedDml(dbClient,sql);
longlEndTime = Instant.now().toEpochMilli();
longelaspedTime = lEndTime - lStartTime;
System.out.println("Elapsed time in : " + elaspedTime/1000);
} finally {
spanner.close();
}
// [END init_client]System.out.println("Closed client");
}
privatestaticvoiddeleteUsingPartitionedDml(DatabaseClientdbClient,Stringsql) {
longrowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
System.out.printf("%d records deleted.\n", rowCount);
}
There are 1 billion old records need to be cleaned. According the documents, It seems Partitioned DML should be the good choice. I choose java client because we can set timeout by using setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still experienced the '504 Deadline Exceeded' after a few hours that shorter than 24 hours I set. Why setPartitionedDmlTimeout did not work? How should I solve the problem? Is it not possible to delete so large number of data by using Partitioned DML.
Following are the details.
Environment details
3 nodes, 2 billion records in a table and 1 billion old records need to be cleaned
Steps to reproduce
Code example