
Available online at www.sciencedirect.com

ScienceDirect
Procedia Technology 25 (2016) 310-317

Global Colloquium in Recent Advancement and Effectual Researches in Engineering, Science and Technology (RAEREST 2016)

An efficient privacy-preserving search scheme with access control for cloud data centers
Tresa Mary George V*, Shamna S, Jubilant J Kizhakkethottam
Department of Computer Science and Engineering, Musaliar College of Engineering and Technology, Pathanamthitta 689653, India

Abstract

The internet and the emergence of social networks produce terabytes of data every day. In this big data scenario, the ability to outsource data to a cloud storage facility saves data management and storage costs. The major challenges with this scheme are providing security and ensuring the privacy of the outsourced data. Although data security can be achieved through encryption, searching on encrypted data becomes a complex task. The proposed work suggests an efficient searching scheme for encrypted cloud data based on hierarchical clustering of documents. The hierarchical clustering method preserves the semantic relationship between the documents in the encrypted domain to speed up the search process. Consequently, the proposed system has linear computational complexity during the search phase in response to an exponential increase in the number of documents. The system also ensures data privacy by giving the different types of users only limited access to the documents through access control mechanisms, resulting in more secure data storage in the cloud.
© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of RAEREST 2016
Keywords: searchable encryption; multi-keyword search; hierarchical clustering; access control

1. Introduction

A fundamental application of cloud computing is the ability to outsource data to external cloud servers to enable scalable data storage. The cloud server can provide a huge storage space and high computational power [1]. Accordingly, enterprises and users who own a large amount of data can overcome their hardware limitations. As this technique becomes more and more popular, the volume of data in cloud storage facilities is growing dramatically.

* Corresponding author.
E-mail address: vtresamg@gmail.com

2212-0173 © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of RAEREST 2016
doi:10.1016/j.protcy.2016.08.112
A major concern regarding the use of cloud computing for data storage is that the outsourced data may contain sensitive information such as photos, emails and bank statements. If the data is stored in a public cloud that is accessible to many other people without an efficient protection mechanism, it can lead to severe privacy and confidentiality violations [2]. The traditional way to protect sensitive data is encryption: the documents are encrypted before they are outsourced to the cloud. This, however, complicates the search operation when legitimate users need access to those documents. Many researchers have recently investigated this issue and proposed several ciphertext search schemes based on cryptographic techniques [3], [4]. However, these methods need extensive computations and suffer from high time complexity, which makes them unsuitable for a big data environment [5]. Another major drawback is that the relationship between the documents is concealed during the encryption process. Maintaining such a relationship is important because it reflects the properties of the documents.
It is also necessary to provide controlled access to the outsourced cloud data for different classes of users. The system must prevent unauthorized users from uploading corrupted documents to the cloud server. For example, consider a university cloud in which the student mark lists are stored. In such a scenario, students must be prevented from uploading their own mark lists and thereby overwriting the original copy. To prevent this, the system grants student users of the cloud only download privileges. Proper implementation of access control mechanisms ensures such limited access for the different classes of cloud users.
The proposed system uses a searching scheme based on multi-keyword ranked search. In addition, a hierarchical clustering method is used to cluster the documents based on a relevance score. There is also a limit on the maximum size of each cluster: if a cluster exceeds this limit, it is further divided into sub-clusters until the size of each cluster falls below the threshold. During the search phase, the system iteratively determines the most relevant cluster. Only the documents in that cluster need to be searched, which reduces the overall search time.

2. Related works

Many researchers have proposed methods for searching encrypted data in the cloud. Some of them, and their drawbacks, are discussed below.

2.1. Searchable encryption based on single keyword

In the method proposed by Song et al. [6], each word in the document is encrypted independently. A search therefore requires scanning the entire data collection word by word, and the major drawback of this method is the high search cost that results from scanning entire documents. Cash et al. [7] proposed a symmetric searchable encryption scheme. Although it provides high efficiency for large databases, it lacks a ranking mechanism. If a large number of documents contain the searched keyword, users have to manually select the documents they actually want, which in turn increases the overall search time.

2.2. Searchable encryption based on multiple keywords

Cao et al. [8] proposed an architecture that performs multi-keyword search and also supports result ranking by using a k-nearest neighbor algorithm. However, the search time of this method grows exponentially in response to an exponentially increasing size of the document collection. Sun et al. [9] proposed a new architecture. Although it provides better efficiency, the relevance between the documents is ignored, and hence it does not return the most relevant results.

2.3. Boolean Symmetric Searchable Encryption

Tarik Moataz and Abdullatif Shikfa [10] proposed a system for searching multiple keywords over encrypted data using Boolean Symmetric Searchable Encryption (BSSE). It uses the Gram-Schmidt process to optimize the search process. It supports arbitrary Boolean expressions, such as conjunctions and disjunctions of keywords and their complements.

2.4. Fuzzy Keyword Search

The above-mentioned searching schemes retrieve files only on an exact match of the keyword, so any typos or format inconsistencies prevent the required documents from being returned. J. Li et al. [11] proposed a wildcard-based technique to create efficient fuzzy keyword sets that can be used for matching relevant documents. Whenever the exact-match search fails, the search result is provided based on the fuzzy keyword set.
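To make the wildcard idea concrete, the following Python sketch builds the edit-distance-1 fuzzy set of a keyword in the style described by Li et al. [11]: a `*` either stands for one inserted character or replaces one character, so a keyword of length l yields 2l + 2 variants. Two words match when their fuzzy sets intersect, which happens exactly when their edit distance is at most 1. This is a plaintext illustration only; protecting each variant before it is stored is omitted, and the function names are our own.

```python
def wildcard_fuzzy_set(keyword: str) -> set:
    """Wildcard-based fuzzy set of `keyword` for edit distance 1 (illustrative sketch)."""
    variants = {keyword}  # the exact keyword itself
    for i in range(len(keyword) + 1):
        # '*' inserted at position i stands for one inserted character
        variants.add(keyword[:i] + "*" + keyword[i:])
    for i in range(len(keyword)):
        # '*' replacing character i stands for one substituted or deleted character
        variants.add(keyword[:i] + "*" + keyword[i + 1:])
    return variants


def fuzzy_match(query: str, keyword: str) -> bool:
    """True when the two fuzzy sets intersect, i.e. the edit distance is <= 1."""
    return bool(wildcard_fuzzy_set(query) & wildcard_fuzzy_set(keyword))


if __name__ == "__main__":
    print(len(wildcard_fuzzy_set("castle")))  # 2 * 6 + 2 variants
    print(fuzzy_match("castle", "castl"), fuzzy_match("castle", "castel"))
```

A typo such as "castl" is still matched to "castle", while the transposition "castel" (two edits) is not.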

3. System model and problem formulation

The proposed system uses a vector space model in which every document is represented by a vector, so every document can be seen as a point in a high-dimensional space. The documents are classified into categories by a clustering method. Specifically, the system uses a hierarchical clustering index, i.e. a hierarchy of clusters at different levels. Each cluster has a constraint on the minimum relevance score between the documents in that cluster. When a new document is added to a cluster, this constraint may be violated. In such a case, a new cluster center is added to the system, after which all the cluster centers are reselected and all the documents are reassigned. The maximum cluster size is also fixed for each level. If the size of a cluster exceeds this maximum, the cluster is divided into multiple sub-clusters. When a search is performed, only the documents in the relevant clusters need to be searched, which reduces the overall search time.
During the search phase, the relevance score between the search query and each cluster center of the first-level index is computed. The cluster center with the maximum relevance score is selected, and this process is repeated iteratively for its children in the next-level clusters until the smallest cluster at the lowest level is found. If this cluster does not contain the desired document, the system traces back to the parent of the smallest cluster. This process is repeated until the desired document is found or the root cluster is reached.
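The descent-and-backtrack procedure can be sketched as a depth-first search that always tries the most relevant child first. Returning from a subtree without a match plays the role of tracing back to the parent. This is a plaintext sketch under our own assumptions: relevance is the inner product, `is_target` stands in for however the desired document is recognised, and the `Cluster` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Cluster:
    center: list                                   # (plaintext) cluster center vector
    children: list = field(default_factory=list)   # sub-clusters at the next level
    docs: dict = field(default_factory=dict)       # leaf clusters only: doc id -> vector


def score(a, b):
    """Inner-product relevance score."""
    return sum(x * y for x, y in zip(a, b))


def hierarchical_search(root: Cluster, query, is_target: Callable[[str], bool]):
    """Return (doc_id or None, number of documents examined)."""
    examined = 0

    def visit(node: Cluster) -> Optional[str]:
        nonlocal examined
        if not node.children:                      # smallest cluster at the lowest level
            ranked = sorted(node.docs, key=lambda d: score(node.docs[d], query), reverse=True)
            for doc_id in ranked:
                examined += 1
                if is_target(doc_id):
                    return doc_id
            return None                            # not here: trace back to the parent
        for child in sorted(node.children, key=lambda c: score(c.center, query), reverse=True):
            found = visit(child)                   # most relevant child first
            if found is not None:
                return found
        return None

    return visit(root), examined
```

With a query close to the first leaf, a document in the second leaf is reached only after the first leaf has been exhausted; deeper hierarchies simply nest more `Cluster` objects.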

3.1. System architecture

The system architecture is composed of four main entities, as shown in Fig. 1: the data owner, the data user, the cloud server and the cloud manager. The data owner is the module responsible for collecting documents, performing the encapsulation, building the document index and outsourcing the encrypted documents to the cloud server. The data user is the consumer of the documents and must have the necessary authorization before accessing this data. The cloud server is the entity that provides a huge storage space and the computational resources needed for the ciphertext search. The cloud manager is responsible for access control: it blocks all unauthorized requests for the data by checking the privacy settings of each user. When the cloud server receives a request for a document, the request is verified by the cloud manager. Upon successful verification, the cloud server returns the required documents.
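The request flow between the cloud server and the cloud manager can be sketched as below. The role names, the privilege table and the class names are hypothetical. The paper only states that the cloud manager checks each user's privacy settings and that students receive download privileges only. Documents are treated as opaque encrypted bytes.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical privilege table following the university example in the text:
# data owners (university, college) may upload and download; students may only download.
PRIVILEGES = {
    "university": {"upload", "download"},
    "college": {"upload", "download"},
    "student": {"download"},
}


@dataclass(frozen=True)
class Request:
    user: str
    action: str      # "upload" or "download"
    doc_id: str


class CloudManager:
    """Checks every request against the requesting user's role before the server acts."""

    def __init__(self, roles: Dict[str, str]):
        self.roles = dict(roles)                   # authenticated user -> role

    def verify(self, req: Request) -> bool:
        role = self.roles.get(req.user)            # None for unknown (unauthenticated) users
        return role is not None and req.action in PRIVILEGES.get(role, set())


class CloudServer:
    def __init__(self, manager: CloudManager):
        self.manager = manager
        self.store: Dict[str, bytes] = {}          # doc id -> encrypted document

    def handle(self, req: Request, payload: Optional[bytes] = None) -> Optional[bytes]:
        if not self.manager.verify(req):
            raise PermissionError(f"{req.user!r} may not {req.action} {req.doc_id!r}")
        if req.action == "upload":
            self.store[req.doc_id] = payload
            return None
        return self.store[req.doc_id]
```

Keeping the role lookup inside the manager, rather than trusting a role field sent with the request, means a student cannot simply claim to be a data owner.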


Fig. 1. System architecture.

4. Implementation details

4.1. MRSE-HCI architecture

The proposed system uses the Multi-keyword Ranked Search over Encrypted data based on Hierarchical Clustering Index (MRSE-HCI) scheme, in which the vector space model is adopted from Multi-keyword Ranked Search over Encrypted data (MRSE) [8] and the indexing is based on the Hierarchical Indexing Structure (HCI) [12]. The details are as follows. Every document is indexed by a vector, and each dimension of the vector corresponds to a keyword. The value of each dimension indicates whether the keyword appears in the document. The query is represented as a vector in the same way. The document vectors are normalized to unit length, and hence the distance between points in the n-dimensional space reflects the relevance of the corresponding documents. During the search phase, the cloud server computes the relevance score between the query vector and a document vector as their inner product. When the documents are stored in the cloud in encrypted form, the semantic relationship between them would normally be lost. The proposed system, however, uses a clustering method: in the n-dimensional space, the points of highly relevant documents lie very close to each other, so clustering them together preserves the semantic relationship between the documents.
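A small plaintext illustration of this vector space model: binary keyword vectors over a dictionary, normalization to unit length, and inner-product relevance. For unit vectors a and b, ||a − b||² = 2 − 2(a · b), which is why a smaller distance between normalized document points corresponds to a higher relevance score. The dictionary, documents and helper names are illustrative only.

```python
import math


def binary_vector(doc_keywords, dictionary):
    """One dimension per dictionary keyword: 1.0 if it occurs in the document, else 0.0."""
    present = set(doc_keywords)
    return [1.0 if kw in present else 0.0 for kw in dictionary]


def normalize(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def relevance(a, b):
    """Inner product; on binary vectors this counts shared keywords (coordinate matching)."""
    return sum(x * y for x, y in zip(a, b))


def distance(a, b):
    """Euclidean distance between two document points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Illustrative dictionary and documents
dictionary = ["cloud", "encryption", "search", "marks"]
d1 = normalize(binary_vector(["cloud", "encryption", "search"], dictionary))
d2 = normalize(binary_vector(["marks"], dictionary))
d3 = normalize(binary_vector(["cloud", "encryption"], dictionary))
query = binary_vector(["cloud", "search"], dictionary)
```

Here d3 shares two keywords with d1 and lies close to it, while d2 shares none and lies at the maximum distance for non-negative unit vectors.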
When the volume of data in the cloud grows dramatically, traditional search approaches become very inefficient, and their search time grows exponentially. To improve search efficiency, a hierarchical clustering method is used. The hierarchical approach clusters the documents at different levels based on the relevance score. When the size of a cluster reaches the maximum cluster size threshold, the system partitions the cluster into sub-clusters until the criterion is satisfied. While uploading the documents, the data owner also builds an encrypted index. A symmetric key encryption algorithm is used, and the documents are encrypted using random numbers and a secret key. When a data user needs a particular document, a query is submitted to the cloud server, and the cloud server returns the target document to the data user.
The functions of the different components are described below.

Keygen: This function generates the secret key used to encrypt the index and the documents. For this, an (n+2)-bit vector, in which each element is an integer 0 or 1, and two invertible (n+2) × (n+2) matrices M1 and M2, whose elements are random integers, are generated.

Index: This phase generates the encrypted index by using the secret key generated above. The clustering process also takes place in this phase. The index algorithm is as follows:

1. Tokenizer and parser tools are used to extract all the keywords present in the documents.
2. The documents are transformed into a collection of Document Vectors (DV).
3. A Quality Hierarchical Clustering (QHC) method is used to generate the information about Documents Classification (DC) and the collection of Cluster Centers Vectors (CCV).
4. The data owner performs the dimension-expanding and vector-splitting procedures on every document vector.
   (a) During the dimension-expanding procedure, each vector in CCV is extended to an (n+2)-bit long vector, where the value in dimension n+1 is a randomly generated integer and the last dimension is set to 1.
   (b) During the vector-splitting procedure, every extended document vector is split into two (n+2)-bit long vectors, using the (n+2)-bit vector generated above as a splitting indicator.

Encryption: The plain document set D is encrypted using any secure symmetric encryption algorithm, such as AES. The encrypted documents are then outsourced to the cloud.

Trapdoor: When a user submits a query, the cloud manager analyzes the query and verifies that the request comes from an authenticated user. The keywords in the query are analyzed with the help of the dictionary (DW), and a query vector QV is generated, which is then extended to an (n+2)-bit vector.
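The Keygen, Index and Trapdoor steps above build on the secure inner-product construction of MRSE [8]. The sketch below shows why the cloud server can still rank documents using only encrypted vectors. It is our reading of that construction, not the authors' code: the query extension (r·q, r, t) with a positive random r and random t, the range of the random integers, and all function names are assumptions. Exact rational arithmetic (`fractions.Fraction`) keeps the matrix inverses exact. Whatever S, M1 and M2 are, the encrypted score equals r(v · q + ε) + t, so for one trapdoor the ranking follows v · q up to the small random ε.

```python
import random
from fractions import Fraction


def invert(m):
    """Gauss-Jordan inverse over the rationals; raises ValueError if m is singular."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def keygen(n, rng):
    """Split indicator S with n+2 bits and two random invertible (n+2)x(n+2) matrices."""
    size = n + 2
    s = [rng.randint(0, 1) for _ in range(size)]
    keys = []
    while len(keys) < 2:
        m = [[rng.randint(-9, 9) for _ in range(size)] for _ in range(size)]
        try:
            keys.append((m, invert(m)))
        except ValueError:
            continue  # singular draw, try again
    return s, keys[0], keys[1]


def encrypt_index(v, key, rng, eps=None):
    """Extend v to (v, eps, 1), split it using S, and hide the halves with M1^T and M2^T."""
    s, (m1, _), (m2, _) = key
    eps = rng.randint(1, 9) if eps is None else eps
    ext = list(v) + [eps, 1]
    p1, p2 = [], []
    for bit, x in zip(s, ext):
        if bit == 1:            # index side: random shares summing to x
            share = rng.randint(-9, 9)
            p1.append(share)
            p2.append(x - share)
        else:                   # index side: copy
            p1.append(x)
            p2.append(x)
    return matvec(transpose(m1), p1), matvec(transpose(m2), p2)


def trapdoor(q, key, r, t, rng):
    """Extend q to (r*q, r, t), split it where S is 0, and hide the halves with M1^-1, M2^-1."""
    s, (_, m1_inv), (_, m2_inv) = key
    ext = [r * x for x in q] + [r, t]
    p1, p2 = [], []
    for bit, x in zip(s, ext):
        if bit == 0:            # query side: random shares summing to x
            share = rng.randint(-9, 9)
            p1.append(share)
            p2.append(x - share)
        else:                   # query side: copy
            p1.append(x)
            p2.append(x)
    return matvec(m1_inv, p1), matvec(m2_inv, p2)


def secure_score(index, trap):
    """Computed from encrypted vectors only; equals r * (v . q + eps) + t."""
    return dot(index[0], trap[0]) + dot(index[1], trap[1])
```

The matrices cancel because (M1ᵀp)·(M1⁻¹p′) = p·p′, and the split shares recombine to the extended vectors, so only the masked relevance score is revealed to the server.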

Search: When the cloud server receives the query vector, the relevance scores between the query vector and the index vectors of the clusters are computed in a hierarchical manner. The server then chooses the cluster with the maximum relevance score as the target cluster and searches it for the required document. If the document is not found, it backtracks and chooses the cluster with the next highest score. This process is repeated until the target document is found.

Decryption: This component is used by the data user to decrypt the returned document. The secret key is delivered to the user through a secure mechanism.

4.2. Relevance measure

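In the proposed system, coordinate matching is used as the relevance measure: the score counts the keywords that two binary vectors have in common, which is their inner product. The relevance score between document $d_i$ and query $Q$ is determined as described in Equation (1).

$$R(d_i, Q) = DV_i \cdot QV \qquad (1)$$

The relevance score between query $Q$ and cluster center $c_j$ is determined as described in Equation (2).

$$R(Q, c_j) = QV \cdot CCV_j \qquad (2)$$

The relevance score between documents $d_i$ and $d_j$ is determined as described in Equation (3).

$$R(d_i, d_j) = DV_i \cdot DV_j \qquad (3)$$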


4.3. Quality Hierarchical Clustering Algorithm

Some of the most widely used clustering algorithms are K-means and K-medoids. In these algorithms, the value of k is fixed in advance. However, in a big data scenario it is impossible to predict the value of k beforehand, so the clusters have to be generated dynamically. Hence, a dynamic K-means algorithm is used. To keep the clusters dense and compact, a minimum relevance threshold is maintained. During clustering, the relevance score between each document and its cluster center is computed; if this value is below the minimum threshold, a new cluster is added and all the documents are reassigned accordingly. This procedure is executed iteratively until a stable value of k is reached.
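A sketch of the dynamic K-means idea in Python, assuming unit-length document vectors and inner-product relevance. The paper does not say how the new cluster center is chosen. Here, as an assumption, the worst-fitting document seeds it, and a fixed number of ordinary k-means rounds is run for each k. The guard `len(centers) >= len(docs)` is ours and only ensures termination.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def best_center(doc, centers):
    """Index of the most relevant center (inner product)."""
    return max(range(len(centers)), key=lambda c: dot(doc, centers[c]))


def dynamic_kmeans(docs, min_relevance, refine_rounds=10):
    """Grow k until every unit-length document reaches min_relevance to its center."""
    centers = [list(docs[0])]
    while True:
        for _ in range(refine_rounds):           # ordinary k-means with k fixed
            assign = [best_center(d, centers) for d in docs]
            for c in range(len(centers)):
                members = [d for d, a in zip(docs, assign) if a == c]
                if members:                      # empty clusters keep their center
                    dim = len(members[0])
                    mean = [sum(m[i] for m in members) / len(members) for i in range(dim)]
                    centers[c] = normalize(mean)
        assign = [best_center(d, centers) for d in docs]
        scores = [dot(d, centers[a]) for d, a in zip(docs, assign)]
        worst = min(range(len(docs)), key=scores.__getitem__)
        if scores[worst] >= min_relevance or len(centers) >= len(docs):
            return centers, assign               # k is stable
        centers.append(list(docs[worst]))        # threshold violated: add a cluster, reassign
```

Raising `min_relevance` produces more, tighter clusters; lowering it lets fewer, looser clusters satisfy the constraint.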

4.4. Search Algorithm

To search for a particular document, the cloud server first needs to find the cluster that best matches the query. The cloud server uses the cluster index and the iterative procedure described below to find the top-matched cluster.

1. The cloud server first computes the relevance score between the query and the encrypted vectors of the first-level cluster centers in the cluster index, as described in Equation (2). It then chooses the i-th cluster center, which has the highest score.
2. For the child cluster centers of the cluster center selected above, the cloud server computes the relevance score between the query and every encrypted child cluster center vector, and selects the cluster center with the top score.

The above procedure is iterated until the ultimate cluster center in the last level l is reached.

5. Results and analysis

5.1. Search Efficiency

The efficiency of the system was tested with a two-level clustering model. The number of operations needed for the entire search process can be computed as described in Equation (4). To increase search efficiency, the system uses a static dictionary of keywords that do not contribute effectively to the search process: terms such as "for" and "and" are removed from the search query, and a modified query vector is constructed. All subsequent comparisons are made only with the modified query vector. Let x denote the size of the static dictionary, w the number of query keywords, u the number of keywords in the modified query vector, n the total number of documents in the collection, k the number of categories in the first-level clusters, and t the average number of documents in the subsequent cluster.

 
  

The number of operations required by a system without any clustering technique is described in Equation (5).

 
  

During the search step, the existing system compares the query vector with the entire document collection, whereas the proposed system compares it only with the relevant cluster, leading to a significant reduction in search time.
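The sketch below is not a reconstruction of Equations (4) and (5). It only illustrates, under simplifying assumptions, the two ideas in this subsection: filtering the query against a static dictionary of non-contributing terms, and counting query-vector comparisons in a two-level index (k first-level centers plus the t documents of the selected cluster) versus a linear scan over all n documents. The stop list is hypothetical, and the cost of the dictionary filtering itself (which involves x and w) is ignored.

```python
# Hypothetical stop list standing in for the paper's static dictionary
STATIC_DICTIONARY = {"for", "and", "the", "of", "in"}


def modified_query(keywords):
    """Drop query terms that appear in the static dictionary."""
    return [w for w in keywords if w.lower() not in STATIC_DICTIONARY]


def inner_products(n, k, t, clustered=True):
    """Query-vector comparisons in the search step of a two-level index.

    Clustered: the k first-level centers, then the t documents of the chosen cluster.
    Unclustered: every one of the n documents.
    """
    return k + t if clustered else n


print(modified_query(["Marks", "for", "semester", "and", "results"]))
print(inner_products(n=10_000, k=100, t=100), inner_products(10_000, 100, 100, clustered=False))
```

For example, with n = 10,000 documents spread over k = 100 first-level clusters of about t = 100 documents each, the clustered search performs 200 comparisons instead of 10,000.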

5.2. Performance analysis

To test the performance of the proposed system, an experimental setup was built as follows. An application simulating the activities of a university was created, and the cloud storage platform for the system was provided by the Google public cloud. The data owners of the system are:

- The university, which owns the mark lists and certificates of all passed-out and currently enrolled students.
- The college, which uploads the sessional marks and other student-specific documents of all the students.

The dataset for the performance analysis was built from the above-mentioned types of documents. The system was first tested with a linear increase in the number of documents, and the corresponding search times were estimated. Fig. 2 shows that the proposed system outperforms the existing system without clustering. The system was also tested with an exponential growth in the number of documents. Fig. 3 shows that the search time of the proposed system with clustering grows linearly, while that of the system without clustering grows exponentially.
[Fig. 2 plot: search time versus number of documents (×100, from 10 to 50), for the system without clustering and with hierarchical clustering.]

Fig. 2. Comparison of search time with a linear growth in the document collection.

[Fig. 3 plot: search time versus number of documents (148, 403, 1096, 2980, 8103), for the system without clustering and with clustering.]

Fig. 3. Comparison of search time with an exponential growth in the document collection.

5.3. Security analysis

A dedicated module called the cloud manager is added to the proposed system to verify the authenticity of incoming requests. To ensure the confidentiality and privacy of the documents stored in the cloud server, all documents are encrypted using a symmetric encryption algorithm before being uploaded to the cloud. In addition, the cloud storage provider also performs a two-level encryption on the documents and returns a public key to the cloud manager. All keys are managed by the cloud manager, and only people with sufficient access rights can decrypt the documents. Consequently, even if an intruder accesses a document directly from the cloud server, they cannot obtain its plaintext.

6. Conclusion and future work

This work analyzed the problem of searching and securely accessing encrypted data in the cloud. Maintaining the semantic relationship between the documents reduces the time needed to search for a document. The proposed work is based on multi-keyword ranked search over encrypted data, and the use of a hierarchical clustering method to cluster the documents preserves the semantic relationship between them. The experimental results show that the search time of the proposed system grows linearly when the size of the document collection increases exponentially. The system also implements a dedicated module, the cloud manager, to ensure the privacy of cloud data by granting different classes of users only limited access to the document collection. As future work, more secure algorithms can be developed to improve the privacy of the uploaded documents. More secure access control schemes, such as Dynamic Information Flow Tracking (DIFT) techniques [13] capable of recognizing advanced vulnerabilities, could also improve the overall performance of the system.

References

[1] Xian, C., Lu, Y. H., & Li, Z. (December). Adaptive computation offloading for energy conservation on battery-powered systems. In Parallel and Distributed Systems, International Conference on. IEEE.
[2] Li, H., Dai, Y., Tian, L., & Yang, H. Identity-based authentication for cloud computing. In Cloud computing. Springer Berlin Heidelberg.
[3] Sun, W., Wang, B., Cao, N., Li, M., Lou, W., Hou, Y. T., & Li, H. (May). Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. In Proceedings of the ACM SIGSAC symposium on Information, computer and communications security. ACM.
[4] Wang, B., Yu, S., Lou, W., & Hou, Y. T. (April). Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud. In INFOCOM Proceedings IEEE. IEEE.
[5] Sebastian, L. R., Babu, S., & Kizhakkethottam, J. J. (February). Challenges with big data mining: A review. In Soft Computing and Networks Security (ICSNS), International Conference on. IEEE.
[6] Song, D. X., Wagner, D., & Perrig, A. Practical techniques for searches on encrypted data. In Security and Privacy (S&P), Proceedings IEEE Symposium on. IEEE.
[7] Cash, D., Jaeger, J., Jarecki, S., Jutla, C., Krawczyk, H., Rosu, M. C., & Steiner, M. (October). Dynamic searchable encryption in very-large databases: Data structures and implementation. In Network and Distributed System Security Symposium (NDSS).
[8] Cao, N., Wang, C., Li, M., Ren, K., & Lou, W. Privacy-preserving multi-keyword ranked search over encrypted cloud data. Parallel and Distributed Systems, IEEE Transactions on.
[9] Sun, W., Wang, B., Cao, N., Li, M., Lou, W., Hou, Y. T., & Li, H. Verifiable privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. Parallel and Distributed Systems, IEEE Transactions on.
[10] Moataz, T., & Shikfa, A. (May). Boolean symmetric searchable encryption. In Proceedings of the ACM SIGSAC symposium on Information, computer and communications security. ACM.
[11] Li, J., Wang, Q., Wang, C., Cao, N., Ren, K., & Lou, W. Fuzzy keyword search over encrypted data in cloud computing. In Proceedings of IEEE INFOCOM Mini-Conference.
[12] Chen, C., Zhu, X., Shen, P., Hu, J., Guo, S., Tari, Z., & Zomaya, A. An efficient privacy-preserving ranked keyword search method.
[13] Dalton, M., Kozyrakis, C., & Zeldovich, N. (August). Nemesis: Preventing authentication & access control vulnerabilities in web applications. In USENIX Security Symposium.
