Compilation of Cursor Loops by Realizing Aggify: Project Milestone Report

Group Information

Haoyu Zhang: haoyuzha

Yuchen Liu: yuchenl6

URL for Project

https://www.haoyu.dev/blog/aggify-home

Major Changes

We decided to change our roadmap, with our first step changed to benchmarking Aggify on TPC-H workloads. We manually crafted the most optimized code for all the Aggify workloads and profiled its performance. We prioritized benchmarking even without automatic code generation because, first, we were familiarizing ourselves with the algorithm by generating the code manually so it would be easier to reproduce it programmatically. Second, the current control flow graph (CFG) still needs to be prepared for us to do data flow analysis. Handcrafting the code was a good decision because after carefully walking through each step in the Aggify paper, we found two roadblocks. First, DuckDB had insufficient support for custom aggregate functions. Secondly, we discovered a fundamental design flaw in Aggify. Since we discovered these two roadblocks early in the project, we have tackled them and are now on track to successfully completing the project on time.

Current Progress

We made the following two major steps that have enabled us to implement Aggify in DuckDB.

Custom Aggregates in DuckDB without Combine()

Combine() methods of custom aggregation functions in DBMSs (database management systems) define the logic to combine the result of aggregating two groups into one. If an aggregation function has a Combine() function then the execution engine can partition a large group into small ones, apply the aggregate to each partition in parallel, and finally combine all the results. Automatically inferring the Combine() function from a cursor loop is out of scope for the Aggify paper (and we suspect is not possible in the general case). Another challenge is that DuckDB forces the user to provide a Combine() function. Therefore, to implement Aggify in DuckDB, we had to modify the query execution to run custom aggregates in a single-threaded fashion, alleviating the need to provide a Combine() function.

Fixing a design flaw in Aggify

Surprisingly, when trying to reproduce Aggify’s experiment result, we found a major design flaw. Take one of the Aggify proposed test cases in Figure 1(a) as an example; the result of applying Aggify to this cursor loop is shown in Figure 1(b). custom_count is the custom aggregation function that behaves the same as native count except that it lacks a Combine() method, so it cannot run in parallel. As expected, Aggify can generate the correct result for this test case in the context. However, after changing the initial value of val to a non-zero number, it is obvious that Aggify’s transformation is incorrect. For example, if the cursor (pink) query returns empty, by running the cursor loop normally and iteratively, the value of val should be unchanged and the UDF should return 100 (shown in Figure 1(c)). However, the custom aggregate will overwrite val to be 0 in (shown in Figure 1(d)), which produces a result that is inconsistent with the original UDF containing a cursor loop.

Figure 1. (b) is the result of Aggify for (a); (d) is the result of Aggify for (c)

We now propose a simple fix to this issue. Since the main problem happens when the cursor query returns empty, it is necessary to insert a precondition before the custom aggregate to ensure the cursor query always returns something. Therefore, the correct transformation for (c) is shown below. This introduces a new query that performs duplicate work. For the next step, we will test whether the query optimizer can use Common Subexpression Elimination to reduce this overhead.

val := 100;
if exists SELECT O_ORDERKEY 
			    FROM orders
			    WHERE O_CUSTKEY = custkey
	val := SELECT custom_count(okey, val) FROM (
				 SELECT O_ORDERKEY 
			   FROM orders
			   WHERE O_CUSTKEY = custkey
			   ) AS tmp;

Benchmark Result

After fixing the problems we outlined above, we compare the performance of Aggify with that of built-in aggregates (which represent the best possible performance). Our experiment uses the TPC-H dataset with a scale factor 10 running on a MacBook with Apple M1 Pro chip and 32GB of RAM.

Figure 2. Performance comparison of Aggify using custom aggregate (workload1) or built-in aggregate (workload2), with multi-threading enabled.

Figure 3. Performance comparison of Aggify using custom aggregate (workload1) or built-in aggregate (workload2), with only one thread used.

As expected, custom aggregates generally do not perform as well as built-in ones, up to 0.18 times slower for workload DiscountRevenue. The performance gap can be explained by the lack of parallelism. Since Aggify generates custom aggregate functions without a Combine() function, they are forced to run sequentially compared to built-in aggregates which can fully exploit the available parallelism. To further support this argument, we make DuckDB run the two workloads again single-threaded. In this case, Aggify shows a similar performance with the built-in aggregate. An interesting, future research direction would be to use more sophisticated program analysis techniques to infer the Combine() function for cursor loops.

Revised Schedule

We are currently at Week 4.

Week No.	Task
1	Haoyu: Manually generate the Aggify transformation result in SQL. Yuchen: Manually generate the C++ code for the Aggify result.
2	Haoyu & Yuchen: Modify DuckDB & think of the Aggify design flaw.
3	Haoyu & Yuchen: Run the experiment & prepare the milestone report.
4	Haoyu & Yuchen: Generate the CFG for PL/pgSQL.
5	Haoyu: Do Data Flow Analysis on the CFG. Yuchen: Generate code from CFG to C++.
6	Haoyu & Yuchen: Prepare presentation, poster and report.

Resources Needed

Hard work.

References

[1] S. Gupta, S. Purandare, and K. Ramachandra, “Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates,” in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Portland OR USA: ACM, Jun. 2020, pp. 559–573. doi: 10.1145/3318464.3389736.