
Best Practices in SQL, PL/SQL Query Writing

Agenda

1. Basics of SQL
2. Best Practices for Writing SQL Queries
3. Best Practices for PL/SQL

SQL processing uses the following main components to execute a SQL query:
The Parser performs both syntax and semantic analysis.
The Optimizer uses costing methods (the cost-based optimizer, CBO) or internal rules (the rule-based optimizer, RBO) to determine the most efficient way of producing the result of the query.
The Row Source Generator receives the optimal plan from the optimizer and outputs the execution plan for the SQL statement.
The SQL Execution Engine operates on the execution plan associated with a SQL statement and then produces the results of the query.

The SELECT Statement


Here is a simplified form of the Oracle SQL syntax for the SELECT statement:
SELECT { [alias.]column | expression | [alias.]* [ , ] }
FROM [schema.]table [alias]
[ WHERE [ [schema.]table.|alias.] { column | expression } comparison { } [ { AND | OR } [ NOT ] ] ]
[ GROUP BY { expression | rollup-cube | grouping-sets } [, { expression | rollup-cube | grouping-sets } ... ] [ HAVING condition ] ]
[ ORDER BY { { column | expression | position } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ] } ]

Joining Tables
The syntax of the Oracle SQL proprietary join format is as shown:
SELECT { [ [schema.]table. | [alias.] ] { column | expression [, ... ] } | * }
FROM [schema.]table [alias] [, ]
[ WHERE [ [schema.]table.|alias.] { column | expression [(+)] } comparison condition [ [schema.]table.|alias.] { column | expression [(+)] } [ { AND | OR } [ NOT ] ] ]
[ GROUP BY ]
[ ORDER BY ]

ANSI JOIN
SELECT {{[[schema.]table.|alias.] {column|expression} [, ]}|*}
FROM [schema.]table [alias]
[ CROSS JOIN [schema.]table [alias]
| NATURAL [INNER | [ LEFT | RIGHT | FULL] OUTER] JOIN [schema.]table [alias]
| { [INNER | [LEFT | RIGHT | FULL] OUTER] JOIN [schema.]table [alias]
    { ON (column = column [{AND | OR} [NOT] column = column ])
    | USING (column [, column ]) } } ]
[ WHERE ]
[ GROUP BY ]
[ ORDER BY ];

Examples Of Joins
Oracle proprietary format:
SELECT di.name, de.name
FROM division di, department de
WHERE di.division_id = de.division_id(+);

ANSI format:
SELECT di.name, de.name
FROM division di LEFT OUTER JOIN department de USING (division_id);

Types of Joins
Cross-Join. A cross-join (or Cartesian product) is a merge of all rows in both tables, where each row in one table is matched with every row in the second table:
SELECT * FROM division, managerof;

Inner or Natural Join. An inner join is an intersection between two tables, joining on a column or on matching column names:
SELECT * FROM division NATURAL JOIN managerof;

Outer Join. An outer join joins rows from two tables. The rows returned are those in the intersection plus rows that are in either or both tables and have no match in the other table.

Left Outer Join. A left outer join returns all intersecting rows plus rows found only in the left table:
SELECT * FROM division NATURAL LEFT OUTER JOIN managerof;

Right Outer Join. A right outer join is the opposite of a left outer join: the intersection plus rows found only in the right table:
SELECT * FROM division NATURAL RIGHT OUTER JOIN managerof;

Full Outer Join. A full outer join retrieves all rows from both tables:
SELECT * FROM division NATURAL FULL OUTER JOIN managerof;

Self-Join. A self-join joins a table to itself:
SELECT manager.name, employee.name
FROM employee manager JOIN employee employee ON (employee.manager_id = manager.employee_id);

Equi-/Anti-/Range Joins. These joins use the appropriate comparison conditions to join rows in tables:
SELECT * FROM division di JOIN department de ON (di.division_id = de.division_id);

Subqueries
There are specific types of subqueries:

Single-Row Subquery. A single-row subquery returns a single row from the subquery. Some comparison conditions require a single row with a single column:
SELECT * FROM project WHERE projecttype_id = (SELECT projecttype_id FROM projecttype WHERE projecttype_id = 1);

Multiple-Row Subquery. A multiple-row subquery returns one or more rows from the subquery back to the calling query:
SELECT project_id FROM project WHERE projecttype_id IN (SELECT projecttype_id FROM projecttype);

Multiple-Column Subquery. A multiple-column subquery returns many columns:
SELECT COUNT(*) FROM (SELECT * FROM project);

Regular Subquery. A regular subquery executes a subquery in its entirety where there is no communication between calling query and subquery:
SELECT * FROM department WHERE division_id IN (SELECT division_id FROM division);

Correlated Subquery. A correlated subquery can use a value passed from a calling query as a parameter, to filter specific rows in the subquery. Values can only be passed from the calling query to the subquery, not the other way around:
SELECT * FROM division WHERE EXISTS (SELECT division_id FROM department WHERE division_id = division.division_id);

The Basics of Efficient SQL

Name the Columns in a Query
There are three good reasons why it is better to name the columns in a query rather than to use "select * from ...".
1. Network traffic is reduced. This can have a significant impact on performance if the table has a large number of columns, or the table has a LONG or LONG RAW column (both of which can be up to 2 GB in length). These types of columns take a long time to transfer over the network, so they should not be fetched from the database unless they are specifically required.
2. The code is easier to understand.
3. It could save the need for changes in the future. If a column is added to or removed from the base table or view, a select * statement can return an unexpected result set, or the statement may fail.

For example:
SELECT division_id, name, city, state, country FROM division;
is faster than:
SELECT * FROM division;
Also, since there is a primary key index on the Division table, the following query can be answered from the index alone, without reading the table:
SELECT division_id FROM division;

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM stock;

Query                         Cost  Rows  Bytes
----------------------------  ----  ----  -----
SELECT STATEMENT on              1   118   9322
TABLE ACCESS FULL on STOCK       1   118   9322

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT stock_id FROM stock;

Query                         Cost  Rows  Bytes
----------------------------  ----  ----  -----
SELECT STATEMENT on              1   118    472
INDEX FULL SCAN on XPKSTOCK      1   118    472
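The plan rows stored by EXPLAIN PLAN can be read back with the DBMS_XPLAN package. The following is a minimal sketch, assuming the default PLAN_TABLE and the statement_id 'TEST' used above; the layout of its output differs from the listing shown here.

```sql
-- Store the plan for the statement under the identifier 'TEST'.
EXPLAIN PLAN SET statement_id = 'TEST' FOR
  SELECT stock_id FROM stock;

-- Read the stored plan back; 'TYPICAL' includes cost, rows and bytes.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'TEST', 'TYPICAL'));
```

Comparing the two plans side by side makes it easy to confirm whether a query is satisfied by an index scan or needs a full table scan.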

Filtering with the WHERE Clause


Try to always do two things with WHERE clauses: Always try to use unique, single-column indexes wherever possible. A single-column unique index is much more likely to produce exact hits, and an exact hit is the fastest access method.

Use table alias:


Always use table aliases, and prefix all column names with the aliases, when you are using more than one table. If an alias is not present, the engine must work out which table owns each column. The following is an example:
Without aliases:
SELECT first_name, last_name, country FROM employee, countries WHERE country_id = id AND last_name = 'HALL';
With aliases:
SELECT e.first_name, e.last_name, c.country FROM employee e, countries c WHERE e.country_id = c.id AND e.last_name = 'HALL';

Use EXISTS instead of DISTINCT


The DISTINCT keyword works by retrieving every row the query produces and then removing the duplicates. If you use a subquery with the EXISTS keyword instead, the duplicate join rows never have to be produced in the first place. For example:
SELECT DISTINCT ca.ca_id FROM t_sku t, t_ca ca WHERE t.ca_id = ca.ca_id;
can be written as:
SELECT ca.ca_id FROM t_ca ca WHERE EXISTS (SELECT 1 FROM t_sku t WHERE t.ca_id = ca.ca_id);
The two queries return the same rows as long as ca_id is unique in t_ca.

Use OR instead of UNION on the same table


When selecting data from a single table with a condition that requires a logical OR, it can be easier to visualize the query as a UNION. That form is inefficient because it accesses the table once per branch and builds an unnecessary intermediate result. Combining the two conditions with OR in a single query eliminates the extra query and the intermediate result. Example:
Before:
SELECT hemenbr, hename FROM helpfiles WHERE hemenbr = 5
UNION
SELECT hemenbr, hename FROM helpfiles WHERE hename = 'help_address.html'
After:
SELECT DISTINCT hemenbr, hename FROM helpfiles WHERE hemenbr = 5 OR hename = 'help_address.html'

Use of NOT operator on indexed columns:


Avoid the NOT operator on an indexed column. When Oracle encounters a NOT on an indexed column, it will usually perform a full table scan. For example:
SELECT * FROM emp WHERE NOT deptno = 0;
Instead use the following (equivalent as long as deptno is never negative):
SELECT * FROM emp WHERE deptno > 0;

Function or Calculation on indexed columns:


Avoid applying a function or calculation to an indexed column. If a function is applied to an indexed column, the optimizer cannot use an ordinary index on that column (a function-based index on the same expression is the exception). For example, avoid this form when a match on the leading characters is all that is needed:
SELECT * FROM emp WHERE SUBSTR(ename, 1, 3) = 'MIL';
Use the following instead:
SELECT * FROM emp WHERE ename LIKE 'MIL%';
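Where the function cannot be removed from the predicate, a function-based index is one option Oracle provides. The sketch below assumes a case-insensitive search on emp.ename; the index name emp_ename_upper_idx and the search value are hypothetical and chosen only for illustration.

```sql
-- Index the expression itself, not the bare column.
CREATE INDEX emp_ename_upper_idx ON emp (UPPER(ename));

-- The predicate must use the same expression as the index
-- for the optimizer to consider it.
SELECT *
FROM   emp
WHERE  UPPER(ename) = 'MILLER';
```

Whether the optimizer actually chooses the index still depends on statistics and selectivity, so it is worth checking the plan with EXPLAIN PLAN.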

Combine Multiples Scans with CASE Statements:


Often, it is necessary to calculate different aggregates on various sets of tables. Usually, this is done with multiple scans on the table, but it is easy to calculate all the aggregates with one single scan. Eliminating n-1 scans can greatly improve performance. Combining multiple scans into one scan can be done by moving the WHERE condition of each scan into a CASE expression, which filters the data for the aggregation. For each aggregation, there could be another column that retrieves the data. The following example counts all employees who earn less than 2000, between 2000 and 4000, and more than 4000 each month. This can be done with three separate queries:

SELECT COUNT (*) FROM emp WHERE sal < 2000;
SELECT COUNT (*) FROM emp WHERE sal BETWEEN 2000 AND 4000;
SELECT COUNT (*) FROM emp WHERE sal > 4000;

However, it is more efficient to run the entire query in a single statement. Each number is calculated as one column. The count uses a filter with the CASE expression to count only the rows where the condition is valid. For example:

SELECT COUNT (CASE WHEN sal < 2000 THEN 1 ELSE null END) count1,
       COUNT (CASE WHEN sal BETWEEN 2000 AND 4000 THEN 1 ELSE null END) count2,
       COUNT (CASE WHEN sal > 4000 THEN 1 ELSE null END) count3
FROM emp;

IN vs. EXISTS
IN should be used to test against literal values and EXISTS to create a correlation between a calling query and a subquery. IN will cause a subquery to be executed in its entirety before passing the result back to the calling query. EXISTS will stop once a result is found. IN is best used with a preconstructed set of literal values.

There are two advantages to using EXISTS over using IN. The first advantage is the ability to pass values from a calling query to a subquery (never the other way around), creating a correlated query. The correlation allows EXISTS to use indexes between the calling query and the subquery, particularly in the subquery.

The second advantage of EXISTS is that, unlike IN, which completes a subquery regardless, EXISTS will halt searching when a value is found. The benefit of using EXISTS rather than IN for a subquery comparison is that EXISTS can potentially read far fewer rows than IN. In short, IN is best used with literal values, and EXISTS is best used to apply a fast-access correlation between a calling query and a subquery.
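The two recommended uses can be sketched side by side with the project, division and department tables used earlier. The literal values 1, 2 and 3 are placeholders, not values taken from the schema.

```sql
-- IN with a preconstructed set of literal values.
SELECT project_id
FROM   project
WHERE  projecttype_id IN (1, 2, 3);

-- EXISTS with a correlated subquery: di.division_id is passed from the
-- calling query into the subquery, and the search for each division
-- can stop at the first matching department.
SELECT di.division_id, di.name
FROM   division di
WHERE  EXISTS (SELECT 1
               FROM   department de
               WHERE  de.division_id = di.division_id);
```

The subquery selects the constant 1 because EXISTS only tests whether a row is found; the selected value itself is never used.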

Use BETWEEN instead of IN


The BETWEEN keyword is very useful for filtering values in a specific range. It is clearer, and usually faster, than listing each value in the range in an IN. For whole-number values the two forms below return the same rows. Example:
Before:
SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr IN (508858, 508859, 508860, 508861, 508862, 508863, 508864)
After:
SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr BETWEEN 508858 AND 508864

Joins
A join is a combination of rows extracted from two or more tables. Joins can be very specific, for instance, an intersection between two tables, or they can be less specific, such as an outer join. An outer join is a join returning an intersection plus rows from either or both tables, not in the other table.

Efficient Joins
An efficient join is a join SQL query that can be tuned to an acceptable level of performance. Certain types of join queries are inherently easily tuned and can give good performance. In general, a join is efficient when it can use indexes on large tables or is reading only very small tables.

Intersections
An inner or natural join is an intersection between two tables. In mathematical set parlance, an intersection contains all elements occurring in both of the sets (elements common to both sets). An intersection is efficient when index columns are matched together in join clauses. Intersection matching not using indexed columns will be inefficient.

How to Tune a Join


There are several factors to consider:
Use equality first.
Use range operators where equality does not apply.
Avoid use of negatives in the form of != or NOT.
Avoid LIKE pattern matching.
Try to retrieve specific rows, and in small numbers.
Filter from large tables first to reduce the number of rows joined.
Retrieve tables in order from the most highly filtered table downward, preferably the largest table, which has the most filtering.
Use indexes wherever possible, except for very small tables.
Let the optimizer do its job.

Nested Subqueries
Subqueries can be nested where a subquery can call another subquery. The following example using the Employees schema shows a query calling a subquery, which in turn calls another subquery:
EXPLAIN PLAN SET statement_id='TEST' FOR
SELECT * FROM division WHERE division_id IN
  (SELECT division_id FROM department WHERE department_id IN
    (SELECT department_id FROM project));

Use EXISTS instead of LEFT JOIN.


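A LEFT JOIN merges the outer table with the inner table and keeps the rows of the outer table that have no match. When the joined table is used only to test that a matching row exists, and none of its columns are returned, an EXISTS subquery is usually the better choice: the inner query acts as a filter while the outer query executes, and matches are not multiplied into duplicate rows. The two forms are not identical, though: EXISTS drops merchants with no matching help file, which the LEFT JOIN keeps, so use this rewrite only when unmatched rows are not wanted. Example:
Before:
SELECT merfnbr, mestname FROM merchant LEFT JOIN helpfiles ON merfnbr = hemenbr
After:
SELECT merfnbr, mestname FROM merchant WHERE EXISTS (SELECT * FROM helpfiles WHERE merfnbr = hemenbr)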

Replacing Joins with Subqueries


Huge joins can be made easier in all respects by using subqueries in two ways, replacing complex multi-table joins as follows:
1. A table in the join that is not returning a column in the primary calling query can be removed from the join and checked using a subquery. The table is not really part of the join, so why retain it in the data being returned for display?
2. FROM clauses can contain nested subqueries to break up joins, much in the way that PL/SQL would use nested looping cursors.

Related guidelines:
3. An ORDER BY clause is always applied to a final result and should not be included in subqueries if possible.
4. When testing against subqueries, retrieve, filter, and aggregate on indexes, not tables.
5. Do not be too concerned about full table scans on very small static tables.

A join in which only the customer table returns a column:
EXPLAIN PLAN SET statement_id='TEST' FOR
SELECT c.name
FROM customer c
  JOIN orders o USING (customer_id)
  JOIN ordersline ol USING (order_id)
  JOIN transactions t USING (customer_id)
  JOIN transactionsline tl USING (transaction_id)
WHERE c.balance > 0;

The same check with the non-returning tables moved into EXISTS subqueries (point 1):
EXPLAIN PLAN SET statement_id='TEST' FOR
SELECT c.name
FROM customer c
WHERE c.balance > 0
AND EXISTS (
  SELECT o.order_id FROM orders o
  WHERE o.customer_id = c.customer_id
  AND EXISTS (
    SELECT order_id FROM ordersline
    WHERE order_id = o.order_id))
AND EXISTS (
  SELECT t.transaction_id FROM transactions t
  WHERE t.customer_id = c.customer_id
  AND EXISTS (
    SELECT transaction_id FROM transactionsline
    WHERE transaction_id = t.transaction_id));

A join that also returns a column from transactionsline:
EXPLAIN PLAN SET statement_id='TEST' FOR
SELECT c.name, tl.amount
FROM customer c
  JOIN orders o USING (customer_id)
  JOIN ordersline ol USING (order_id)
  JOIN transactions t USING (customer_id)
  JOIN transactionsline tl USING (transaction_id)
WHERE tl.amount > 3170 AND c.balance > 0;

The same query with nested FROM clause subqueries (point 2). The condition joining b to c is needed to avoid a Cartesian product:
SELECT c.name, b.amount
FROM customer c,
  (SELECT t.customer_id, a.amount
   FROM transactions t,
     (SELECT transaction_id, amount
      FROM transactionsline
      WHERE amount > 3170) a
   WHERE t.transaction_id = a.transaction_id) b
WHERE b.customer_id = c.customer_id
AND c.balance > 0
AND EXISTS (
  SELECT o.order_id FROM orders o
  WHERE o.customer_id = c.customer_id
  AND EXISTS (
    SELECT order_id FROM ordersline
    WHERE order_id = o.order_id));

Best Practices in PL/SQL


There are many ways of writing code.
Any code
  Will give a solution (somehow)
  But how much time and how many resources does it consume?

Best code
  Uses optimal resources
  Provides a quicker solution

Modularized Design
Bad design
  Dumping all your logic into a single procedure
  Having lots of selects, inserts, updates, deletes, etc. in one place

Good design
  Break your logic into small blocks
  Group related logic into a single block or program

Modularize
Modularizing will
  reduce complexity
  make your tasks manageable
  make your resulting code maintainable

Use Packages
  For each major functionality
  With repeated DML as procedures
  With repeated selects as functions

Naming convention
Follow a standard throughout your code
  Easy to understand
  Easy to maintain and change

Example
  Local variable: l_var_name
  Procedure parameter: p_var_name
  Global variable: g_var_name

Avoid hard coding


Hard-coded datatypes:
CREATE OR REPLACE PROCEDURE gen_swip (
  an_document_number  OUT NUMBER,   -- OUT so that the SELECT INTO below compiles
  an_serv_item_id     IN  NUMBER,
  an_srsi_ip_addr_seq IN  VARCHAR2) -- a parameter type cannot carry a length
AS
BEGIN
  SELECT d_no INTO an_document_number FROM task;
END;

What happens if the d_no column is changed to a VARCHAR2 type?

Datatypes anchored to the table columns with %TYPE:
CREATE OR REPLACE PROCEDURE gen_swip (
  an_document_number  IN asap.serv_req_si.document_number%TYPE,
  an_serv_item_id     IN asap.serv_req_si.serv_item_id%TYPE,
  an_srsi_ip_addr_seq IN asap.srsi_ip_addr.srsi_ip_addr_seq%TYPE)
AS
BEGIN
  NULL;
END;

Repeated SQL as functions


This is essential for performance
  Avoid repeating the same SQL in different places
  Hard parsing will be avoided
  Identify the repeated SQL at design time
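A repeated lookup wrapped in a packaged function might look like the sketch below. The package name emp_api, the function get_salary and the emp.empno column are hypothetical; the sketch also applies the p_ and l_ naming convention and %TYPE anchoring described above.

```sql
CREATE OR REPLACE PACKAGE emp_api AS
  -- Single home for the salary lookup; callers never repeat the SELECT.
  FUNCTION get_salary (p_empno IN emp.empno%TYPE) RETURN emp.sal%TYPE;
END emp_api;
/

CREATE OR REPLACE PACKAGE BODY emp_api AS
  FUNCTION get_salary (p_empno IN emp.empno%TYPE) RETURN emp.sal%TYPE IS
    l_sal emp.sal%TYPE;
  BEGIN
    -- Identical statement text on every call, so the cursor is shared.
    SELECT sal INTO l_sal FROM emp WHERE empno = p_empno;
    RETURN l_sal;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN
      RETURN NULL;  -- unknown employee: return NULL instead of raising
  END get_salary;
END emp_api;
/
```

Because every caller goes through the same function, the statement is written, tuned and parsed in one place.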

Writing Efficient PL/SQL


Bind Variables
Using ROWIDs when Updating
Use Bulk Collect
Implicit vs. Explicit Cursors
Declarations, Blocks, Functions and Procedures in Loops
Duplication of Built-in String Functions
Minimize Datatype Conversions
Efficient Function Calls
Using the NOCOPY Hint
Using PLS_INTEGER and BINARY_INTEGER Types
Using BINARY_FLOAT and BINARY_DOUBLE Types
Native Compilation of PL/SQL
Conditional Compilation
Avoid unnecessary PL/SQL

Bind Variables
Oracle performs a CPU-intensive hard parse for all new statements.
Statements already present in the shared pool only require a soft parse.
Statement matching uses an exact text match, so literals, case and whitespace are a problem. (bind_variable_usage.sql)
Unnecessary parses waste CPU and memory. (bind_performance.sql)
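The difference between literals and bind variables is easiest to see in dynamic SQL. The block below is a minimal sketch that assumes the emp table with a numeric deptno column; it is not one of the demo scripts named above.

```sql
DECLARE
  l_count PLS_INTEGER;
BEGIN
  FOR i IN 1 .. 10 LOOP
    -- Literal: the statement text changes on every iteration,
    -- so each one is a new statement that must be hard parsed.
    EXECUTE IMMEDIATE
      'SELECT COUNT(*) FROM emp WHERE deptno = ' || TO_CHAR(i)
      INTO l_count;

    -- Bind variable: the statement text is identical every time,
    -- so after the first execution only a soft parse is needed.
    EXECUTE IMMEDIATE
      'SELECT COUNT(*) FROM emp WHERE deptno = :b1'
      INTO l_count USING i;
  END LOOP;
END;
/
```

Static SQL embedded in PL/SQL binds PL/SQL variables automatically, so the problem mainly arises in dynamic SQL and in SQL sent from client applications.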

Bind Variables Cursor Sharing


Cursor sharing reduces the impact of literals.
Bind variables are substituted for literals. (cursor_sharing.sql)
Useful for third-party applications.
Cursor sharing is unnecessary if you do the job properly.
May cause performance issues in some code.

Overview of the PL/SQL Engine


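[Diagram: a PL/SQL block is submitted to the PL/SQL engine, where the procedural statement executor runs the procedural statements; SQL statements are passed to the SQL statement executor in the Oracle server.]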

PL/SQL contains procedural and SQL code. Each type of code is processed separately. Switching between code types causes an overhead. The overhead is very noticeable during batch operations. Bulk binds minimize this overhead.

Using ROWIDs for Updates/Deletes

ROWIDs are unique row identifiers.
The ROWID represents the physical address of the row in the database, but it is subject to change, so don't store ROWIDs.
When you have to select rows and subsequently update them, retrieve the ROWID as well and use it to identify the row to update. (rowid_test.sql)
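The select-then-update pattern can be sketched as follows. The emp table, the deptno filter and the 10% raise are assumptions made for illustration; this is not the rowid_test.sql script itself.

```sql
BEGIN
  -- Fetch the ROWID along with the data. FOR UPDATE locks the rows,
  -- so they cannot change between the fetch and the update.
  FOR rec IN (SELECT rowid AS rid, sal
              FROM   emp
              WHERE  deptno = 10
              FOR UPDATE)
  LOOP
    -- Locate the row directly by its address, with no index lookup.
    UPDATE emp
    SET    sal = rec.sal * 1.1
    WHERE  rowid = rec.rid;
  END LOOP;

  COMMIT;
END;
/
```

The ROWID is used only within the transaction that fetched it, which is consistent with the advice not to store ROWIDs.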

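Row-by-row processing of DML in PL/SQL

[Diagram: inside the Oracle server, the PL/SQL block runs in the PL/SQL runtime engine (procedural statement executor), and each UPDATE issued by the loop is passed separately to the SQL engine (SQL statement executor).]

FOR rec IN emp_cur LOOP
  UPDATE employee
  SET salary = ...
  WHERE employee_id = rec.employee_id;
END LOOP;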

Performance penalty for many context switches

Bulk processing with FORALL

[Diagram: inside the Oracle server, the PL/SQL block runs in the PL/SQL runtime engine (procedural statement executor), and the FORALL statement passes all of its UPDATEs together to the SQL engine (SQL statement executor).]

FORALL indx IN list_of_emps.FIRST .. list_of_emps.LAST
  UPDATE employee
  SET salary = ...
  WHERE employee_id = list_of_emps(indx);

Fewer context switches, same SQL behavior

Bulk Processing in PL/SQL

FORALL
  Use with inserts, updates, deletes and merges.
  Move data from collections to tables.

BULK COLLECT
  Use with implicit and explicit queries.
  Move data from tables into collections.

In both cases, the back-end processing in the SQL engine is unchanged.
  Same transaction and rollback segment management.
  Same number of individual SQL statements will be executed.
  But BEFORE and AFTER statement-level triggers fire only once per FORALL INSERT statement.

Bulk-Binds BULK COLLECT


Populate collections directly from SQL using BULK COLLECT. (bulk_collect.sql)
SELECT * BULK COLLECT INTO l_tab FROM tab1;

Collections are held in memory, so watch collection sizes. (bulk_collect_limit.sql)
OPEN c1;
LOOP
  FETCH c1 BULK COLLECT INTO l_tab LIMIT 1000;
  EXIT WHEN l_tab.COUNT = 0;
  -- Process chunk.
END LOOP;
CLOSE c1;

Implicit array processing introduced in 10g. (implicit_array_processing.sql)
FOR cur_rec IN (SELECT * FROM tab1) LOOP
  -- Process row.
END LOOP;
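The LIMIT fragment above omits its declarations. A complete anonymous block might look like the following sketch, which assumes a table named tab1 exists; the type name t_tab is hypothetical.

```sql
DECLARE
  CURSOR c1 IS SELECT * FROM tab1;

  TYPE t_tab IS TABLE OF tab1%ROWTYPE;
  l_tab t_tab;
BEGIN
  OPEN c1;
  LOOP
    -- Fetch at most 1000 rows per round trip to bound memory use.
    FETCH c1 BULK COLLECT INTO l_tab LIMIT 1000;
    EXIT WHEN l_tab.COUNT = 0;

    FOR i IN 1 .. l_tab.COUNT LOOP
      NULL;  -- process l_tab(i) here
    END LOOP;
  END LOOP;
  CLOSE c1;
END;
/
```

The loop exits on l_tab.COUNT = 0 rather than on c1%NOTFOUND, because %NOTFOUND becomes true on the last partial batch and testing it before processing would skip those rows.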

Bulk-Binds FORALL
Bind data in collections into DML using FORALL. (insert_all.sql)
FORALL i IN l_tab.FIRST .. l_tab.LAST
  INSERT INTO tab2 VALUES l_tab(i);

Use INDICES OF and VALUES OF for sparse collections.
Use SQL%BULK_ROWCOUNT to return the number of rows affected by each statement.
The SAVE EXCEPTIONS clause allows bulk operations to complete.
Exceptions are captured in SQL%BULK_EXCEPTIONS.
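FORALL with SAVE EXCEPTIONS can be sketched as below. The block assumes that tab1 and tab2 exist with identical column lists; the names t_tab and bulk_errors are hypothetical. ORA-24381 is the error Oracle raises after a FORALL ... SAVE EXCEPTIONS statement in which one or more rows failed.

```sql
DECLARE
  TYPE t_tab IS TABLE OF tab1%ROWTYPE;
  l_tab t_tab;

  bulk_errors EXCEPTION;
  PRAGMA EXCEPTION_INIT(bulk_errors, -24381);
BEGIN
  SELECT * BULK COLLECT INTO l_tab FROM tab1;

  -- 1 .. COUNT is safe for an empty collection, unlike FIRST .. LAST.
  FORALL i IN 1 .. l_tab.COUNT SAVE EXCEPTIONS
    INSERT INTO tab2 VALUES l_tab(i);
EXCEPTION
  WHEN bulk_errors THEN
    -- Report each failed row; the other rows have been inserted.
    FOR j IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE(
        'Row ' || SQL%BULK_EXCEPTIONS(j).ERROR_INDEX || ': ' ||
        SQLERRM(-SQL%BULK_EXCEPTIONS(j).ERROR_CODE));
    END LOOP;
END;
/
```

The block leaves the decision to commit or roll back the successfully inserted rows to the caller.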

Declarations in Loops
Code within loops gets run multiple times.
Variable declarations and procedure/function calls in loops have an impact on performance.
Simplify code within loops to improve performance.

-- Bad idea.
FOR i IN 1 .. 100 LOOP
  DECLARE
    l_str VARCHAR2(200);
  BEGIN
    -- Do Something.
  END;
END LOOP;

-- Better idea.
DECLARE
  l_str VARCHAR2(200);
BEGIN
  FOR i IN 1 .. 100 LOOP
    -- Do Something.
  END LOOP;
END;

Using the NOCOPY Hint


The NOCOPY hint allows OUT and IN OUT parameters to be passed by reference, rather than by value.

PROCEDURE myproc (p_tab IN OUT NOCOPY CLOB) IS
BEGIN
  -- Do something.
END;

By value: the procedure uses a temporary buffer and copies the value back on successful completion.
By reference: the procedure uses the original memory location directly.
Beware of the effect of error handling and parameter aliasing on parameter values.
It's a hint, not a directive, so it can be ignored.
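NOCOPY matters most for large collections, where copying in and out on every call is expensive. The following sketch uses a hypothetical collection type t_tab and a hypothetical local procedure add_row.

```sql
DECLARE
  TYPE t_tab IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
  l_tab t_tab;

  -- With NOCOPY the collection may be passed by reference, so it is
  -- not copied into and out of the procedure on every call.
  PROCEDURE add_row (p_tab IN OUT NOCOPY t_tab, p_val IN VARCHAR2) IS
  BEGIN
    p_tab(p_tab.COUNT + 1) := p_val;
  END add_row;
BEGIN
  FOR i IN 1 .. 1000 LOOP
    add_row(l_tab, 'row ' || i);
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('Rows held: ' || l_tab.COUNT);
END;
/
```

If add_row raised an exception halfway through a change, a by-reference parameter could be left partly modified, which is the error-handling caveat mentioned above.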

Native Compilation of PL/SQL


By default PL/SQL is interpreted.
Set the PLSQL_CODE_TYPE parameter to NATIVE before creating or compiling code.

ALTER SESSION SET PLSQL_CODE_TYPE=NATIVE;
ALTER PROCEDURE my_proc COMPILE;

Prior to 11g, native compilation converts PL/SQL to C, which is then compiled into shared libraries.
Improves performance of procedural logic. (native_comp_test.sql)
Doesn't affect the speed of database calls.

INTEGER Types
NUMBER and its subtypes use an Oracle internal format, rather than the machine arithmetic.
INTEGER and other constrained types need additional runtime checks compared to NUMBER.
PLS_INTEGER uses machine arithmetic to reduce overhead.
BINARY_INTEGER is slow in 8i and 9i, but fast in 10g because it uses machine arithmetic. (integer_test.sql)
11g includes SIMPLE_INTEGER, which is quick in natively compiled code.
Use the appropriate datatype for the job.

BINARY_FLOAT and BINARY_DOUBLE


New in 10g.
They use machine arithmetic, like PLS_INTEGER and BINARY_INTEGER.
Require less storage space.
Fractional values are not represented precisely, so avoid them when accuracy is important.
Approximately twice the speed of NUMBER. (float_double_test.sql)
Use the appropriate datatype for the job.

Avoid unnecessary PL/SQL


SQL is usually quicker than PL/SQL.
Don't use UTL_FILE to read text files if you can use external tables.
Don't write PL/SQL merges if you can use the MERGE statement.
Use multi-table inserts, rather than coding them manually.
Use DML error logging (DBMS_ERRLOG) to trap failures in DML, rather than coding PL/SQL.
All of these use DML, which is easily parallelized.
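As an illustration of the MERGE point, a hand-coded "update the row, insert it if it is missing" loop can usually be replaced by one statement. The sketch assumes tab1 and tab2 both have id and description columns; those column names are hypothetical.

```sql
-- One statement replaces a PL/SQL loop of UPDATE-else-INSERT logic.
MERGE INTO tab2 t
USING tab1 s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.description = s.description
WHEN NOT MATCHED THEN
  INSERT (id, description)
  VALUES (s.id, s.description);
```

The whole operation runs inside the SQL engine, so no context switches between PL/SQL and SQL are needed.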

Reference: Oracle PL/SQL Programming, Steven Feuerstein
