
Red Stack Tech Ltd

James Anthony
Technology Director

Oracle 12c InMemory


A brief introduction

Introduction

I'm pretty sure a LOT is going to be written about the InMemory option for 12c,
released in July this year. We at Red Stack Tech have been lucky enough to be
part of the beta programme and have therefore been using it for a few months
now, and I've got to say it's pretty awesome! In my mind this is the biggest thing
to happen to the database since RAC arrived. I was asked to put this article
together to give an introduction to the InMemory option, the concepts and some of the
performance gains. If you're interested to know more, or even want to try it out,
drop me a line at james.anthony@redstk.com.

Before I start, a quick word of warning: there is a lot left unsaid in this article,
and it's definitely not a deep dive. I was asked to keep this article short, and
failed, but even so a lot of pruning has had to go on!

It's all about columns!

I remember a few years back a lot of fuss was being made about column store
databases in the warehousing space, but much of that came to nothing when
people started getting impacted by the "column cliff". Oracle themselves
introduced HCC to provide some of the benefits of columnar storage (namely
the fact that compression works better in column storage than in traditional row
storage). In the last couple of years in-memory has become an increasing
trend, driven by ever-falling RAM prices and the ability of modern CPUs to
address increased amounts of main memory.

The 12c InMemory option merges these two concepts, because at heart it's an
in-memory column store. Simply put, the RDBMS engine will maintain a separate,
pivoted view of your data in memory, and hold this in column format. And don't
worry: through a journaling process the row cache (your current buffer cache in
the SGA) and the column store are kept in sync.
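
If you want to try this yourself, a minimal sketch of enabling the column store looks like the following (the 2G figure is purely illustrative, not a recommendation). Note that INMEMORY_SIZE is a static parameter, so the instance needs a restart:

-- Reserve part of the SGA for the InMemory column store (illustrative size)
ALTER SYSTEM SET inmemory_size = 2G SCOPE=SPFILE;
-- Static parameter: restart the instance for the column store to be carved out of the SGA
SHUTDOWN IMMEDIATE
STARTUP
-- Confirm the allocation
SHOW PARAMETER inmemory_size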

The (incredibly simplified) figure 1 shows this, with an amount of data held
in a block shown with the dotted boxes. On the left you can see the traditional row
storage format that would be (and still is) held in the buffer cache; each row is then
pivoted and the data held within the column store.

Figure 1

When a predicate is applied (for example order_value > x) the column store can
then be queried, with the optimizer only required to scan the values for a single
column, in contrast to the row store where, to filter on that predicate, the other
columns must also be superfluously read. Enhancements such as SIMD
processing, compression and min/max pruning (covered later) provide
significant speed-up to this processing.

At this stage you're possibly thinking "well, if I have xGB of data that means I need
xGB for the column store", but that's not the end of the story. For starters (and
crucially) the InMemory option does NOT require all of a table to be within the
column store! The optimizer can seamlessly work with a query where part of the
data is within the column store and part of it still resides on disk (indeed, on initial
querying the data may not yet be in the column store, and this is exactly what will
happen whilst the store is populated in the background). It's worth noting as well that
you can choose to put just given partitions into the column store, as the sketch below shows.
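
A hedged sketch of that partition-level control (the sales table and partition name here are hypothetical), together with a check on population progress afterwards:

-- Place just one partition of a (hypothetical) sales table into the column store
ALTER TABLE sales MODIFY PARTITION sales_q4_2014 INMEMORY;

-- BYTES_NOT_POPULATED shows how much of each segment is still to be loaded
SELECT segment_name, partition_name, populate_status, bytes_not_populated
FROM   v$im_segments;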

Multiple Predicates

So what happens when we have multiple predicates? Remember those bloom
filters that got talked about when they first appeared in the Oracle optimizer?
They perform an incredibly efficient job here. Multiple columns can be scanned
and predicate filtration applied, with the resultant bloom filters merged to provide
the desired result set. I wrote a paper on bloom filters some time ago that you
can find on the Red Stack Tech site, so I won't cover them here for the sake of
brevity.
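
To make the multi-predicate case concrete, here is an illustrative query against the h_lineitem table used later in this article (the exact predicate values are made up). Each predicate column is scanned independently in the column store and the intermediate results combined into the final row set:

SELECT l_suppkey, l_quantity, l_extendedprice
FROM   h_lineitem
WHERE  l_shipdate = DATE '1998-12-01'
AND    l_quantity > 30;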

Predicate Filtering based on Min/Max values

Anyone who has worked on Exadata will know just how powerful the storage
indexes maintained at the cell level are. InMemory brings a similar capability:
the minimum and maximum values are stored for each InMemory
Compression Unit (IMCU), the IMCUs being the storage format (similar to an
Oracle block, but much larger) within the column store.

Dropping indexes/Removing reporting
databases/Operational efficiencies

Whilst a lot of the headlines around InMemory will clearly be around what are
going to be some extraordinary performance gains for reporting/analytical
workloads, it's worth noting the impact on OLTP and general efficiencies.

Within a typical database a large portion of the space used will be for indexing
(go on, just run a quick query on dba_extents and group it by segment type to figure
out your value; a sketch follows this paragraph). These indexes not only increase the size of the database, but also
slow down OLTP operations as they need maintaining (especially where we are
inserting new rows). 12c InMemory gives us the opportunity to totally remove the
indexes needed to service reporting and querying workloads, allowing our databases
to be smaller (backing up, recovering and cloning faster, and aggregating gains
across non-production environments), but also accelerating OLTP by reducing
the index maintenance operations.
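
A minimal sketch of that sizing query (assuming you have access to the DBA views; DBA_SEGMENTS gives the same picture and is cheaper to query than DBA_EXTENTS):

-- How much of the database is indexes versus tables?
SELECT segment_type,
       ROUND(SUM(bytes)/1024/1024/1024, 1) AS gb
FROM   dba_extents
GROUP  BY segment_type
ORDER  BY gb DESC;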

By the same token, we see a lot of organisations who run separate reporting
databases, or ODS systems, to offload reporting from production. I firmly
believe that InMemory is going to change the game here, allowing organisations
to report from the real-time data (eliminating lag), shifting the compute power and
Oracle licencing from these reporting databases etc. back to the production system,
and reducing the amount of operational work the DBAs and administrators must
do to manage these ancillary datastores.

Pipelining and SIMD Vector Processing

Pipelining is a process designed to improve the throughput of an operation (as
opposed to the speed of an individual operation).

Pipelining breaks a single operation into multiple micro-operations, and each
micro-operation is joined to the next in the manner of a pipe carrying water.
In modern CPUs the clock cycles are exactly that: an internal clock signal
causes the CPU memory (its registers) to store a new value, and the logic occurs
in between clock cycles.

By breaking the operation into smaller micro-operations, the overall
performance is bound by the time taken to complete the longest-running micro-
operation, and no wasted time occurs between memory operations.

The following diagram illustrates this more clearly, showing the stages an instruction
goes through in the CPU (fetch, decode, execute, store result) and how pipelining
ensures that no idle time is encountered. In the first example (non-pipelined) you
can see how each phase completes before the next begins, with each phase
consuming a clock cycle. In the pipelined example below you can see how the
different parts of the CPU are used in parallel to process more operations in a given
space of time.

Figure 2: traditional vs. pipelined instruction execution

SIMD (Single Instruction, Multiple Data) Processing

SIMD processing is particularly good for the type of columnar scans being
performed by the 12c InMemory option, allowing the repetitive task of
evaluating a predicate against several rows' worth of data to be done in a single-pass
operation, as opposed to having each tuple evaluated separately in a scalar
operation (one instruction to process one data value). One of the drawbacks of
SIMD is that differing operations cannot be applied to the data values, but that is
no issue here, as the same operation is being applied to each
value. By using the Intel (and other vendors') optimisations for SIMD vector processing,
the Oracle 12c InMemory code is able to scan a greater number of data values
with each CPU operation, significantly improving throughput.

Figure 3

Simplicity

Putting stuff into the InMemory column store couldn't be easier; we just alter the
table using the INMEMORY clause as follows:

alter table orders inmemory;

We can also specify a subset of columns; for example, in the following we put all
columns of a table into memory except one:

CREATE TABLE inmem_test
  (id NUMBER, vardata VARCHAR2(200), irrelevant_col VARCHAR2(200))
  INMEMORY NO INMEMORY (irrelevant_col);

I was asked to keep this article short, so I won't expand too much, but once this
has been issued the first query against the table will begin the loading into
memory. It's also possible to use the INMEMORY PRIORITY attribute of a table to
specify that it should be loaded into memory preferentially at database startup,
based on its priority level (LOW, MEDIUM, HIGH or CRITICAL), as the sketch below shows.
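
A hedged example, reusing the orders table from above (the CRITICAL level is just an illustrative choice):

-- Populate this table at instance startup, ahead of lower-priority segments
ALTER TABLE orders INMEMORY PRIORITY CRITICAL;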

Compression Levels

I'm not going to labour too much on compression within Oracle, as it has been
done to death in many articles. Suffice to say one of the key advantages of
column storage is that compression levels through de-duplication are higher than
those of row storage. The InMemory option allows for differing levels of
compression to be applied (using the MEMCOMPRESS keyword); a syntax sketch follows the list below.

DEFAULT: 2-5x compression, optimised for throughput
BALANCED: 3-10x compression, adds OZIP on top of throughput compression
SPACE: 5-20x compression, some performance impact (CPU overhead on data in/out)
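
For reference, these levels are expressed through the MEMCOMPRESS clause; a hedged sketch of the standard 12c syntax (the table choice is illustrative, and the mapping to the levels above is approximate):

-- Throughput-optimised; this is also the default when only INMEMORY is specified
ALTER TABLE orders INMEMORY MEMCOMPRESS FOR QUERY LOW;

-- Higher compression, trading some scan CPU for a smaller in-memory footprint
ALTER TABLE orders INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;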

To give you an idea of compression rates, we applied these to some data tables we
use for demonstration purposes and got a 9.5x compression rate on a 23m-row orders
table, with some of the dimension tables getting 30x compression. As always your
mileage will vary depending on the data (and often the ordering of the data), but
running the following query will yield your compression ratios:

select o.object_name object_name, i.bytes original_size, i.inmemory_size,
       i.bytes / i.inmemory_size compress_ratio
from   v$im_segments i, user_objects o
where  i.segment_name = o.object_name;

OBJECT_NAME  ORIGINAL_SIZE INMEMORY_SIZE COMPRESS_RATIO
-----------  ------------- ------------- --------------
H_LINEITEM      3075735552    1496842240     2.05481611

NOTE: The results above used the default compression of MEMCOMPRESS FOR QUERY LOW,
enabling maximum throughput.

The Results!

So you're probably keen to see just how fast this makes it, right? Well, going back to
that 23m-row line item table and contrasting query performance against data held
entirely in the SGA buffer cache (so no physical IO, it's all logical IO)...

First, an example of scanning an entire column. Remember, in this case we will be able
to just read the compressed l_extendedprice column from the column store, but we
won't be able to use any of the min/max pruning optimisations (there is no predicate to prune on).

SQL> select /* BUFFER_CACHE */ max(l_extendedprice) from h_lineitem;

MAX(L_EXTENDEDPRICE)
--------------------
104948.5

Elapsed: 00:00:01.89

And a quick look at some stats..


Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
375041 consistent gets
0 physical reads
204 redo size
557 bytes sent via SQL*Net to client
552 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Now running the same query against the InMemory column store..
SQL> select max(l_extendedprice) from h_lineitem
2 /

MAX(L_EXTENDEDPRICE)
--------------------
104948.5

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 192022634

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name       | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |            |     1 |     6 |  2152   (4)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE                  |            |     1 |     6 |            |          |        |      |            |
|   2 |   PX COORDINATOR                 |            |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000   |     1 |     6 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE               |            |     1 |     6 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR           |            |   23M |  137M |  2152   (4)| 00:00:01 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS INMEMORY FULL | H_LINEITEM |   23M |  137M |  2152   (4)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
9 recursive calls
0 db block gets
75 consistent gets
0 physical reads
0 redo size

That's a reduction in query time from 1.89 seconds to a meagre 0.02 seconds (to
scan 23m records)... and that's memory vs. memory! Pretty impressive!

The observant amongst you are probably right now suggesting it's not a fair test, as
we'd have an index on that field. Well, yep, I agree, but a) this is a very simple test
with no predicate, and b) a quick test showed the results vs. an index on that column
are almost identical, yet the index consumes another 16% of the space already taken
up by the table; dropping it (and any other indexes) has a big impact on database
size (backup, recovery, cloning etc.) and OLTP insert/update performance.

Let's go with another example, this time adding a predicate to let us accelerate
with the ability to filter against the min/max values stored at IMCU level.
Again, please note this is memory vs. memory and all logical IO, no physical, running
against just over 375m rows in this case.

SQL> SELECT /* BUFFER CACHE */ l_shipdate, l_suppkey, l_quantity
     FROM h_lineitem
     WHERE l_shipdate = '01-DEC-98';

L_SHIPDAT  L_SUPPKEY L_QUANTITY
--------- ---------- ----------
01-DEC-98      34410         34
01-DEC-98      32273         42
01-DEC-98       4615          8

Elapsed: 00:00:55.94

SQL> ALTER SESSION set inmemory_query = enable;

Session altered.

Elapsed: 00:00:00.00

SELECT l_shipdate, l_suppkey, l_quantity
FROM   h_lineitem
WHERE  l_shipdate = '01-DEC-98';

L_SHIPDAT  L_SUPPKEY L_QUANTITY
--------- ---------- ----------
01-DEC-98      34410         34
01-DEC-98      32273         42
01-DEC-98       4615          8

Elapsed: 00:00:00.63

So we've dropped from 55.94 seconds to 0.63 seconds! That's a reduction of
98.87% by time, or an improvement of 8,879% (don't you just love statistics). Now
imagine we were serving this data from disk (physical IO), and you can see the
performance gains we are likely to get in the real world.

So how much impact did that min/max filtration have?

SELECT display_name, value
FROM   v$mystat m, v$statname n
WHERE  m.statistic# = n.statistic#
AND    display_name IN ('IM scan CUs optimized read', 'IM scan CUs pruned',
                        'IM scan CUs predicates optimized', 'IM scan segments minmax eligible');

DISPLAY_NAME                                                          VALUE
---------------------------------------------------------------- ----------
IM scan CUs predicates optimized                                          7
IM scan CUs optimized read                                                0
IM scan CUs pruned                                                        7
IM scan segments minmax eligible                                        372

IM scan segments minmax eligible: the number of IMCUs that were scanned
IM scan CUs optimized read: CUs where all rows passed the predicate
IM scan CUs predicates optimized: a count of CUs where either all rows, or no rows, passed the filtration
IM scan CUs pruned: the number of CUs where the min/max values did not pass the predicate filtration

Let's run another example, this time using the order value (against 96m rows in this
case):

SQL> select * from h_order where o_totalprice > 550000;

O_ORDERKEY  O_CUSTKEY O O_TOTALPRICE O_ORDERDA O_ORDERPRIORITY O_CLERK         O_SHIPPRIORITY
---------- ---------- - ------------ --------- --------------- --------------- --------------
O_COMMENT
-------------------------------------------------------------------------------
 180244418     635959 F    552697.34 09-NOV-94 3-MEDIUM        Clerk#000044823              0
cording to the furiously ironic requests maintain slyly along th

 240699590    3378625 O    554068.82 15-AUG-96 1-URGENT        Clerk#000002141              0
timents. quickly final courts doze regularly

DISPLAY_NAME                                                          VALUE
---------------------------------------------------------------- ----------
IM scan CUs predicates optimized                                         72
IM scan CUs optimized read                                                0
IM scan CUs pruned                                                       72
IM scan segments minmax eligible                                         91

From the stats above we can see that for my very simple query, on a single run, we only
evaluated 19 of the 91 storage chunks in memory (the InMemory Compression Units, or
IMCUs, introduced earlier) and eliminated 72 of them! So even though we know from the previous
example that we can scan columns quickly, not reading almost 80% of the data is always
going to help!

Conclusion

InMemory is certainly a powerful addition to the Oracle RDBMS. Is it going to be a
magic bullet to solve all query performance issues? Obviously not, but I'm extremely
optimistic about its benefits on both analytical and OLTP workloads. What's more,
unlike other in-memory solutions it's transparent to the application, so we don't have
to set about re-coding. Sure, we've got to get to 12c to derive the benefit, and we have
to pay the licence costs, but with 11gR2 being 5+ years old at the time of writing, 12c
adoption has to increase (you wouldn't buy a 5-year-old piece of hardware, so why
implement 5-year-old software?).

This article barely scratches the surface of our testing on InMemory, and I was asked
to keep it short and I'm failing to do that! So if you've got any questions, drop me a line
at james.anthony@redstk.com.

Contact Red Stack Tech for more information

UK Headquarters:

3rd Floor
Farr House
27-30 Railway Street
Chelmsford
Essex
England
CM1 1QS

Main: 0844 811 3600 Direct: 01245 200 510

Australia Headquarters:

Suite 3
Level 19
141 Queen Street
Brisbane
QLD 4000

Main: +61 (0) 7 3210 0132

Email: contactus@redstk.com
Web: www.redstk.com

Follow Red Stack Tech on Twitter: @redstacktech

Media Enquiries:

Elizabeth Spencer
elizabeth.spencer@redstk.com
01245 200 532


