By Franck Pachot
NoSQL provides an API that is much simpler than SQL. One advantage is that users cannot exceed a defined amount of resources in a single call. You can read about this in Alex DeBrie's article https://www.alexdebrie.com/posts/dynamodb-no-bad-queries/#relational-queries-are-unbounded which I take as a basis for some of my “Myth of NoSQL vs RDBMS” posts because he explains very well how SQL and NoSQL are perceived by users. But this idea of a simpler API that limits what users can do is quite common, precedes the NoSQL era, and is still valid with some SQL databases. Here I’m demonstrating that some RDBMS provide a powerful API and can still bound what users do. Oracle Database has had a Resource Manager for a long time, which can define resource limits on a per-service basis, and those features are very simple to use in the Oracle Autonomous Database, the managed database in the Oracle Cloud.
I am using the SH example schema from the ATP database in the free tier, so that anyone can play with this. As usual, what I show on 1 million rows and one thread can scale to multiple vCPUs and nodes. Once you understand the algorithm (the execution plan), you know how it scales.
06:36:08 SQL> set echo on serveroutput on time on timing on
06:36:14 SQL> select count(*) ,sum(s.amount_sold),sum(p.prod_list_price) from sh.sales s join sh.products p using(prod_id);

   COUNT(*)    SUM(S.AMOUNT_SOLD)    SUM(P.PROD_LIST_PRICE)
___________ _____________________ _________________________
     918843           98205831.21               86564235.57

Elapsed: 00:00:00.092
I have scanned nearly one million rows from the SALES table, joined it to the PRODUCTS table, and aggregated the data to show the sums of columns from both tables. That takes 92 milliseconds here (including the network roundtrip). You are not surprised to get a fast response with a join because you have read The myth of NoSQL (vs. RDBMS) “joins dont scale” 😉
Ok, now let’s say that a developer never learned about SQL joins and wants to do the same with a simpler scan/query API:
06:36:14 SQL> declare
 l_count_sales number:=0;
 l_amount_sold number:=0;
 l_sum_list_price number:=0;
begin
 -- scan SALES
 for s in (select * from sh.sales) loop
  -- query PRODUCTS
  for p in (select * from sh.products where prod_id=s.prod_id) loop
   -- aggregate SUM and COUNT
   l_count_sales:=l_count_sales+1;
   l_amount_sold:=l_amount_sold+s.amount_sold;
   l_sum_list_price:=l_sum_list_price+p.prod_list_price;
  end loop;
 end loop;
 dbms_output.put_line('count_sales='||l_count_sales||' amount_sold='||l_amount_sold||' sum_list_price='||l_sum_list_price);
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:02:00.374
I have run this within the database with PL/SQL because I don’t want to add network roundtrips and process switches to this bad design. You see that it takes 2 minutes here. Why? Because the risk, when providing an API that doesn’t support joins, is that the developer will do the join in procedural code. Without SQL, the developer has no efficient and agile way to do this GROUP BY and SUM that was a one-liner in SQL: either loop over this simple scan/get API, or add a lot of code to initialize and maintain an aggregate derived from this table.
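The anti-pattern is not specific to PL/SQL or Oracle. Here is a minimal sketch of the same idea in Python against SQLite, with tiny made-up tables standing in for SH.SALES and SH.PRODUCTS (the table contents are invented for illustration): the row-by-row “join” and the single SQL statement return the same totals, but only the latter lets the optimizer pick an efficient join.

```python
# Sketch (not the Oracle demo): SQLite stand-ins for SH.SALES / SH.PRODUCTS,
# comparing a procedural row-by-row "join" to a single declarative SQL join.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  create table products (prod_id integer primary key, prod_list_price real);
  create table sales    (prod_id integer, amount_sold real);
""")
db.executemany("insert into products values (?, ?)", [(1, 10.0), (2, 20.0)])
db.executemany("insert into sales values (?, ?)", [(1, 5.0), (1, 6.0), (2, 7.0)])

# Procedural "join": scan SALES, then issue one query per row against PRODUCTS.
count_sales = amount_sold = sum_list_price = 0
for prod_id, amount in db.execute("select prod_id, amount_sold from sales"):
    (list_price,) = db.execute(
        "select prod_list_price from products where prod_id = ?",
        (prod_id,)).fetchone()
    count_sales += 1
    amount_sold += amount
    sum_list_price += list_price

# Declarative version: one statement, the database chooses the join algorithm.
row = db.execute("""
  select count(*), sum(s.amount_sold), sum(p.prod_list_price)
  from sales s join products p using (prod_id)
""").fetchone()

print((count_sales, amount_sold, sum_list_price))  # -> (3, 18.0, 40.0)
print(row)                                         # -> (3, 18.0, 40.0)
```

On three rows the difference is invisible; on a million rows the procedural version issues a million lookups, which is exactly what the 2-minute PL/SQL timing above shows.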
So, what can I do to prevent a user from running this kind of query that consumes a lot of CPU and I/O resources? A simpler API will not solve the problem, as the user will work around it with many small queries. In the Oracle Autonomous Database, the admin can set some limits per service:
This says: when connected to the ‘TP’ service (which is the one for transactional processing with high concurrency) a user query cannot use more than 5 seconds of elapsed time or the query is canceled.
Now if I run the statement again:
Error starting at line : 54 File @ /home/opc/demo/tmp/atp-resource-mgmt-rules.sql
In command -
declare
 l_count_sales number:=0;
 l_amount_sold number:=0;
 l_sum_list_price number:=0;
begin
 -- scan SALES
 for s in (select * from sh.sales) loop
  -- query PRODUCTS
  for p in (select * from sh.products where prod_id=s.prod_id) loop
   -- aggregate SUM and COUNT
   l_count_sales:=l_count_sales+1;
   l_amount_sold:=l_amount_sold+s.amount_sold;
   l_sum_list_price:=l_sum_list_price+p.prod_list_price;
  end loop;
 end loop;
 dbms_output.put_line('count_sales='||l_count_sales||' amount_sold='||l_amount_sold||' sum_list_price='||l_sum_list_price);
end;
Error report -
ORA-56735: elapsed time limit exceeded - call aborted
ORA-06512: at line 9
ORA-06512: at line 9
56735. 00000 - "elapsed time limit exceeded - call aborted"
*Cause:    The Resource Manager SWITCH_ELAPSED_TIME limit was exceeded.
*Action:   Reduce the complexity of the update or query, or contact your
           database administrator for more information.

Elapsed: 00:00:05.930
I get a message saying that I exceeded the limit. I hope that, from the message “Action: Reduce the complexity”, the user will understand something like “please use SQL to process data sets” and will write the query with a join.
Of course, a thick-headed developer can still run the loop from application code, issuing one million short queries that each stay under the time limit. That is even worse, because of the roundtrips between the application and the database. The “Set Resource Management Rules” page has another tab besides “Run-away criteria”, named “CPU/IO shares”, so that one service can be throttled when the overall resources are saturated. With this, we can give higher priority to critical services. But I prefer to address the root cause and show the developer that when you need to join data, the most efficient way is a SQL JOIN, and when you need to aggregate data, the most efficient way is SQL GROUP BY. Of course, we can also re-design the tables to pre-join data when it is ingested (materialized views in SQL, or single-table design in DynamoDB, for example), but that’s another topic.
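To illustrate why a per-call limit alone cannot stop the client-side workaround, here is a small simulation (all numbers are invented for illustration, and `run()` is a hypothetical stand-in for a guarded database call, not an Oracle API): a million cheap calls each pass a 5-second check while their combined cost is large, whereas the equivalent single 2-minute call is rejected.

```python
# Simulation only: why a per-call elapsed-time limit cannot stop a client-side
# join. Each small query stays under the limit; only one big call is canceled.
CALL_LIMIT = 5.0  # seconds, like the 'TP' service rule shown above

def run(call_cost):
    """Hypothetical per-call guard: cancel any single call over the limit."""
    if call_cost > CALL_LIMIT:
        raise RuntimeError("ORA-56735: elapsed time limit exceeded")
    return call_cost

# Client-side "join": a million cheap lookups all pass the per-call check,
# even though their total cost (made-up here) is far beyond 5 seconds.
n_rows = 1_000_000
per_query = 0.0001  # each lookup is well below the limit
total = sum(run(per_query) for _ in range(n_rows))
print(f"{n_rows} calls accepted, combined cost ~{total:.0f} s")

# The same work as one server-side call exceeds the limit and is canceled.
try:
    run(120.0)  # like the 2-minute PL/SQL loop in the demo
except RuntimeError as e:
    print(e)
```

This is why the “CPU/IO shares” tab matters: shares throttle the aggregate consumption of a service, where a per-call limit only bounds each individual statement.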
In the Autonomous Database, the GUI makes this simple, but you can also query the V$ views to monitor it. For example:
06:38:20 SQL> select sid,current_consumer_group_id,state,active,yields,sql_canceled,last_action,last_action_reason,last_action_time,current_active_time,active_time,current_consumed_cpu_time,consumed_cpu_time from v$rsrc_session_info where sid=sys_context('userenv','sid');

   SID   CURRENT_CONSUMER_GROUP_ID   STATE     ACTIVE   YIELDS   SQL_CANCELED   LAST_ACTION   LAST_ACTION_REASON    LAST_ACTION_TIME      CURRENT_ACTIVE_TIME   ACTIVE_TIME   CURRENT_CONSUMED_CPU_TIME   CONSUMED_CPU_TIME
 41150                       30407   RUNNING   TRUE         21              1   CANCEL_SQL    SWITCH_ELAPSED_TIME   2020-07-21 06:38:21                   168          5731                        5731                5731

06:39:02 SQL> select id,name,cpu_wait_time,cpu_waits,consumed_cpu_time,yields,sql_canceled from v$rsrc_consumer_group;

    ID   NAME           CPU_WAIT_TIME   CPU_WAITS   CONSUMED_CPU_TIME   YIELDS   SQL_CANCELED
 30409   MEDIUM                     0           0                   0        0              0
 30406   TPURGENT                   0           0                   0        0              0
 30407   TP                       286          21                5764       21              1
 30408   HIGH                       0           0                   0        0              0
 30410   LOW                        0           0                   0        0              0
 19515   OTHER_GROUPS             324          33               18320       33              0
You can see one canceled SQL statement here in the TP consumer group, and that my session had consumed about 5.7 seconds of CPU time.
I could have set the same programmatically with:
exec cs_resource_manager.update_plan_directive(consumer_group => 'TP', elapsed_time_limit => 5);
So, rather than limiting the API, it is better to provide full SQL possibilities and limit the resources used per service: it makes sense to accept only short queries from the transaction processing services (TP/TPURGENT) and to allow more time, but fewer shares, for the reporting ones (LOW/MEDIUM/HIGH).