This project is an application for Multi-Feature (MF) queries. If you are interested in the concept, see: http://dl.acm.org/citation.cfm?id=673628
The Java program automatically generates a C program that processes the input query.
An example query looks like this:
cust prod sum_1_quant count_1_quant
1
cust prod
avg_0_quant sum_1_quant count_1_quant
1_state="NY" and 1_quant>avg_0_quant
This means: for each customer and product, output the sum and count of the quantities of the sales that took place in NY and whose quantity is greater than the average quantity over the whole year for that combination of customer and product.
*This MF query has no "having" condition, so it has only 5 arguments.*
This function reads the query input and stores the data in a Query object.
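The aggregate names in the example, such as sum_1_quant, pack three pieces of information: the function, the grouping variable number, and the attribute. The project does this parsing in Java; the following is only a minimal C sketch of the idea. The `Aggregate` struct and `parse_aggregate` function are hypothetical names, not taken from the project.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of an aggregate name such as "sum_1_quant". */
typedef struct {
    char func[16];   /* aggregate function: sum, count, avg, ... */
    int  group_var;  /* grouping variable number; 0 means the whole group */
    char attr[32];   /* attribute the function is applied to */
} Aggregate;

/* Splits "func_N_attr" into its three parts.
   Returns 1 on success, 0 if the name does not have that shape. */
int parse_aggregate(const char *name, Aggregate *out)
{
    /* %15[^_] reads up to 15 characters that are not '_' */
    int n = sscanf(name, "%15[^_]_%d_%31s",
                   out->func, &out->group_var, out->attr);
    return n == 3;
}
```

A plain grouping attribute such as cust does not match the pattern, so the function returns 0 for it.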
This function takes the Query object and connects to the PostgreSQL database. It uses:
select * from information_schema.columns where table_name = ... and column_name = ...
to get the column metadata for an attribute such as "cust", from which we can easily determine the type of "cust".
After processing all the data in the Query, we have every attribute and its type, and we store them in the MFstructure.
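As an illustration of how a column's type can become a field of the generated MFstructure, here is a small sketch that maps a few PostgreSQL type names to C declarations. It assumes the type and length are read from the data_type and character_maximum_length columns of information_schema.columns. The function name `c_field_decl` and the set of handled types are assumptions for this example, not the project's actual mapping.

```c
#include <stdio.h>
#include <string.h>

/* Writes a C field declaration for one column into buf.
   data_type and max_len stand for the data_type and
   character_maximum_length values of information_schema.columns.
   Returns 1 on success, 0 for a type this sketch does not handle. */
int c_field_decl(const char *col, const char *data_type, int max_len,
                 char *buf, size_t buflen)
{
    if (strcmp(data_type, "character varying") == 0 ||
        strcmp(data_type, "character") == 0) {
        /* one extra byte for the terminating '\0' */
        snprintf(buf, buflen, "char %s[%d];", col, max_len + 1);
        return 1;
    }
    if (strcmp(data_type, "integer") == 0) {
        snprintf(buf, buflen, "int %s;", col);
        return 1;
    }
    if (strcmp(data_type, "numeric") == 0 ||
        strcmp(data_type, "double precision") == 0) {
        snprintf(buf, buflen, "double %s;", col);
        return 1;
    }
    return 0;
}
```

For example, a character varying column cust of length 20 would yield the declaration `char cust[21];`.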
From the Query information we know the maximum number of table scans we need, which depends on the number of grouping variables. We generate the C code for each grouping variable in its own loop; all aggregate functions of one grouping variable can be computed in one loop. In the generated C program every grouping variable needs one table scan cursor (this no longer holds if we use topological sort to optimize the program), so we also generate enough cursors in the outputFrame() function. For each table scan, we implement the "such that" condition if there is one. Additionally, if we need a global aggregate function, for example the sum of all quantities for a combination of cust and prod, we need a grouping variable 0, and we must create a table scan for this grouping variable as well. All data is stored in the corresponding tuple in MFstructure.
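To make the one-scan-per-grouping-variable idea concrete, the sketch below evaluates the example query with two passes: one for grouping variable 0 and one for grouping variable 1. It is not the generated program. It assumes the rows are already in an in-memory array instead of being fetched through a database cursor, and the names `Sale`, `MFEntry`, `find_or_add`, and `evaluate` are hypothetical.

```c
#include <string.h>

#define MAX_GROUPS 64

/* Hypothetical table row (only the columns the example query uses). */
typedef struct { const char *cust, *prod, *state; int quant; } Sale;

/* Hypothetical MFstructure entry: grouping attributes plus aggregates. */
typedef struct {
    const char *cust, *prod;
    long sum_0_quant, count_0_quant;   /* together they give avg_0_quant */
    long sum_1_quant, count_1_quant;
} MFEntry;

/* Returns the index of (cust, prod) in mf, appending a zeroed entry
   if it is not there yet; returns -1 if the table is full. */
static int find_or_add(MFEntry *mf, int *n, const char *cust, const char *prod)
{
    MFEntry fresh = {0};
    int j;
    for (j = 0; j < *n; j++)
        if (strcmp(mf[j].cust, cust) == 0 && strcmp(mf[j].prod, prod) == 0)
            return j;
    if (*n == MAX_GROUPS)
        return -1;
    fresh.cust = cust;
    fresh.prod = prod;
    mf[*n] = fresh;
    return (*n)++;
}

/* Evaluates the example query over rows[0..nrows); returns the group count. */
int evaluate(const Sale *rows, int nrows, MFEntry *mf)
{
    int n = 0, r, j;

    /* Scan for grouping variable 0: aggregates over the whole group. */
    for (r = 0; r < nrows; r++) {
        j = find_or_add(mf, &n, rows[r].cust, rows[r].prod);
        if (j < 0) continue;
        mf[j].sum_0_quant += rows[r].quant;
        mf[j].count_0_quant++;
    }

    /* Scan for grouping variable 1: 1_state="NY" and 1_quant>avg_0_quant. */
    for (r = 0; r < nrows; r++) {
        double avg_0_quant;
        j = find_or_add(mf, &n, rows[r].cust, rows[r].prod);
        if (j < 0) continue;
        avg_0_quant = (double)mf[j].sum_0_quant / mf[j].count_0_quant;
        if (strcmp(rows[r].state, "NY") == 0 && rows[r].quant > avg_0_quant) {
            mf[j].sum_1_quant += rows[r].quant;
            mf[j].count_1_quant++;
        }
    }
    return n;
}
```

The second scan can only run after the first has finished, because its condition reads avg_0_quant. This dependency is what forces a separate scan here.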
About optimization: In some situations we can compute several grouping variables in a single table scan. We call such grouping variables or aggregate functions independent. Before we generate the table scan code, we build a directed graph from the "such that" arguments, then use Kahn's algorithm to topologically sort it and determine which grouping variables can be computed in the same table scan. Without optimization, the C program structure looks like this:
if(j == *i){
    if((strcmp(sale_rec1.state,"NY")==0)){
        ...
    }
}
else{
    if((strcmp(sale_rec1.state,"NY")==0)){
        ...
    }
}
If another grouping variable checks a different state, say NJ, and it is independent of the NY grouping variable, the program structure becomes:
if(j == *i){ // this combination of grouping attributes is not yet in MFstructure
    if((strcmp(sale_rec1.state,"NY")==0)){
        ...
    }
    else if((strcmp(sale_rec1.state,"NJ")==0)){
        ...
    }
}
else{ // this combination is already in MFstructure
    if((strcmp(sale_rec1.state,"NY")==0)){
        ...
    }
    else if((strcmp(sale_rec1.state,"NJ")==0)){
        ...
    }
}
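The grouping of independent variables into shared scans can be sketched with Kahn's algorithm as follows. This is an illustrative version, not the project's Java code: the dependency matrix `dep`, the limit `MAX_GV`, and the function `assign_scans` are assumed names. Each grouping variable is assigned to the earliest scan that comes after all the variables it depends on.

```c
#define MAX_GV 16

/* dep[i][j] != 0 means the "such that" condition of grouping variable j
   uses an aggregate of grouping variable i, so i must be computed in an
   earlier scan. On return, scan[v] is the index of the table scan in
   which variable v can be computed. Returns the number of scans needed,
   or -1 if the dependencies contain a cycle. */
int assign_scans(int n, int dep[MAX_GV][MAX_GV], int scan[MAX_GV])
{
    int indeg[MAX_GV] = {0};
    int queue[MAX_GV];
    int head = 0, tail = 0, scans = 0, i, j;

    for (j = 0; j < n; j++)
        scan[j] = 0;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (dep[i][j]) indeg[j]++;

    /* Kahn's algorithm: start with the variables that depend on nothing. */
    for (j = 0; j < n; j++)
        if (indeg[j] == 0) queue[tail++] = j;

    while (head < tail) {
        i = queue[head++];
        if (scan[i] + 1 > scans) scans = scan[i] + 1;
        for (j = 0; j < n; j++) {
            if (!dep[i][j]) continue;
            /* j has to come after every variable it depends on */
            if (scan[i] + 1 > scan[j]) scan[j] = scan[i] + 1;
            if (--indeg[j] == 0) queue[tail++] = j;
        }
    }
    /* If some variable was never dequeued, the graph has a cycle. */
    return tail == n ? scans : -1;
}
```

With grouping variable 0 plus two variables that both depend only on variable 0 (for instance the NY and NJ variables, if each compares against avg_0_quant), the function assigns variable 0 to scan 0 and both others to scan 1, so two scans suffice instead of three.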
This function gives the C program the logic to produce and format the query output. The basic idea is simply to scan the MFstructure, because it contains everything we need. While outputting each tuple, the C program also needs to evaluate the "having" condition to decide which tuples should be printed. So the structure of the output logic is similar to the table scan process: one handles the "having" condition, the other handles the "such that" condition.
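A rough sketch of such an output loop is shown here. The example query in this README has no "having" condition, so the condition used (count_1_quant > 0) is invented purely for illustration, as are the names `OutRow`, `having`, and `print_result`.

```c
#include <stdio.h>

/* Hypothetical MFstructure entry holding only the projected columns. */
typedef struct {
    const char *cust, *prod;
    long sum_1_quant, count_1_quant;
} OutRow;

/* Invented "having" condition for this sketch: count_1_quant > 0. */
static int having(const OutRow *e)
{
    return e->count_1_quant > 0;
}

/* Scans the MFstructure and writes one line per entry that passes the
   "having" condition. Returns the number of entries written. */
int print_result(FILE *out, const OutRow *mf, int n)
{
    int j, printed = 0;

    fprintf(out, "cust prod sum_1_quant count_1_quant\n");
    for (j = 0; j < n; j++) {
        if (!having(&mf[j]))
            continue;   /* filtered out by the "having" condition */
        fprintf(out, "%s %s %ld %ld\n",
                mf[j].cust, mf[j].prod,
                mf[j].sum_1_quant, mf[j].count_1_quant);
        printed++;
    }
    return printed;
}
```

A query without a "having" condition corresponds to a `having` function that always returns 1.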
This function generates the C program's data structure for the table scan cursor. It also declares enough cursors for the number of grouping variables.
This function generates the C program's definition of MFstructure.
This function creates the main function of the C program.