Sei sulla pagina 1di 2

ORDER BY VS SORT BY

===================
ORDER BY
-------1. The ORDER BY clause is familar from other SQL dialects.
2. It performs the 'total ordering of the query result set' i.e. ' all the data
will be passed to a single reducer' which
will take longer time if we have larger datasets.
3. Hive requires the LIMIT clause for ORDER BY as it will take longer time (if h
ives property
hive.mapred.mode is set to 'strict' which is by default 'nonstrict'
NOTE: In /usr/lib/hive/hive-default.xml--> below property
<property>
<name>hive.mapred.mode</name>
<value>nonstrict</value>
<description>The mode in which the hive operations are being performed. In str
ict mode, some risky queries are not allowed to run</description>
</property>
SORT BY
------3. Where as SORT BY (Hive's alternative to ORDER BY) , will orders the data only
within each reducer and performs
the 'local ordering of the data' where e
ach REDUCER'S OUTPUT will be SORTTED.
4. Better perfomance is traded for total ordering.

hive> select * from testudf;


OK
100
Karan
24000
101
Gopal
25000
102
RaviKanth
26000
103
Ravi Prakash
270000
104
Gopi Prakash
28000
105
Ravi Prakash
29000
106
Vishnu Prakash 31000
107
Kasi Prakash
34000
Time taken: 0.419 seconds
hive> desc testudf;
OK
empid int
ename string
esal
int
Time taken: 0.197 seconds
hive> select empid , ename , esal from testudf order by empid asc , esal desc;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:

set mapred.reduce.tasks=<number>
Starting Job = job_201304160610_0001, Tracking URL = http://localhost:50030/jobd
etails.jsp?jobid=job_201304160610_0001
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=localhost:80
21 -kill job_201304160610_0001
2013-04-16 06:15:57,723 Stage-1 map = 0%, reduce = 0%
2013-04-16 06:16:08,764 Stage-1 map = 100%, reduce = 0%
2013-04-16 06:16:18,977 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304160610_0001
OK
100
Karan
24000
101
Gopal
25000
102
RaviKanth
26000
103
Ravi Prakash
270000
104
Gopi Prakash
28000
105
Ravi Prakash
29000
106
Vishnu Prakash 31000
107
Kasi Prakash
34000
Time taken: 36.859 seconds
hive> select empid , ename , esal from testudf sort by empid asc , esal desc;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201304160610_0002, Tracking URL = http://localhost:50030/jobd
etails.jsp?jobid=job_201304160610_0002
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=localhost:80
21 -kill job_201304160610_0002
2013-04-16 06:18:23,766 Stage-1 map = 0%, reduce = 0%
2013-04-16 06:18:31,316 Stage-1 map = 100%, reduce = 0%
2013-04-16 06:18:40,440 Stage-1 map = 100%, reduce = 33%
2013-04-16 06:18:41,451 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304160610_0002
OK
100
Karan
24000
101
Gopal
25000
102
RaviKanth
26000
103
Ravi Prakash
270000
104
Gopi Prakash
28000
105
Ravi Prakash
29000
106
Vishnu Prakash 31000
107
Kasi Prakash
34000
Time taken: 28.06 seconds
hive>

Potrebbero piacerti anche