What are programming languages?
Why do they exist?
How were computers programmed before programming languages?

The Tower of Babel
There are between 3,000 and 6,000 languages spoken on our planet.
About 200 languages have more than one million speakers.
How do we describe a language?
What elements are present in the description of a language?

Computers Also Talk
What is the language spoken by computers like?
What symbols does it use? What words?
What would the grammar of this electronic language look like?

Shall We Speak Zero-One-ese?
Computers have very simple vocal cords: either they emit a sound, or they do not.
Can there be a language with only two symbols?
Why only two symbols?

Dialects of Zero-One-ese
There are many different languages of zeros and ones, just as there are many different languages that use Latin characters: English, Portuguese, Spanish, etc.
Who can give me examples of different kinds of Zero-One-ese?

"The book is on the table"
Each instruction in Zero-One-ese has a name, called the opcode, and operands.
Instructions change the state of the computer.
What kinds of instructions could exist?

Speaking Zero-One-ese must be easy, right?
No, it is not. In the old days, programming computers was very hard.
What is the problem with Zero-One-ese?
Does anyone here know punched cards?
How can we make Zero-One-ese easier to use?

And Then Came the Goddess
Words are easier to remember than sequences of zeros and ones.
For example, which instruction is easier to read: mov $0x61, %al, or 10110000 01100001?

What does this program do?

    movl  $5, %eax        # put 5 in eax
    movl  $1, %edx        # put 1 in edx
    .L4:
    imull %eax, %edx      # multiply eax by edx and store the result in edx
    decl  %eax            # subtract 1 from eax
    cmpl  $0, %eax        # test whether eax is 0
    jg    .L4             # if eax is still greater than 0, go back to .L4

The Assembler
People spoke assembly, but computers still spoke Zero-One-ese.
A translator was needed.
What should a translator of this kind be able to do?
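To make the assembler's job concrete, here is a minimal sketch in Python of a translator for a single instruction form, `mov $imm8, %reg8`. It is an illustration and not a real assembler: the function name `assemble_mov_imm8` and the table `REG8` are names chosen for this example. It relies on the x86 encoding in which the opcode byte is 0xB0 plus the number of the 8-bit register, followed by the immediate byte, which is where the bit pattern 10110000 01100001 comes from.

```python
# Toy translator from one assembly instruction form to "Zero-One-ese".
# Only `mov $imm8, %reg8` is supported; everything else is rejected.

# x86 numbering of the 8-bit registers (used in the opcode byte B0+r).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def assemble_mov_imm8(line: str) -> str:
    """Translate 'mov $IMM, %REG' into two bytes written as zeros and ones."""
    mnemonic, operands = line.strip().split(None, 1)
    if mnemonic.lower() != "mov":
        raise ValueError(f"unsupported instruction: {mnemonic}")
    source, destination = [part.strip() for part in operands.split(",")]
    if not source.startswith("$") or not destination.startswith("%"):
        raise ValueError("expected the form: mov $imm8, %reg8")
    immediate = int(source[1:], 0)  # base 0 accepts both 97 and 0x61
    if not 0 <= immediate <= 0xFF:
        raise ValueError("immediate does not fit in one byte")
    opcode = 0xB0 + REG8[destination[1:].lower()]
    return f"{opcode:08b} {immediate:08b}"


if __name__ == "__main__":
    print(assemble_mov_imm8("mov $0x61, %al"))  # 10110000 01100001
```

Even this toy shows the two things such a translator must do: look up the bits that correspond to each word, and check that the sentence is well formed before translating it.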
The Goddess Was Not Enough
Programming in assembly was still hard.
Programmers wanted computers to be able to speak languages even closer to human languages.
What were the first programming languages?
Who were the parents of these languages?

Fortran Arrives
John Backus was too lazy to write programs in assembly.
IBM, 1953/54.
Programming became about 20 times easier.
But people were still reluctant. Why?

Example of a Fortran program

Fortran:

    nfact = 1
    do i = 1, 5
      nfact = nfact * i
    enddo

Assembly:

    movl  $5, %eax
    movl  $1, %edx
    .L4:
    imull %eax, %edx
    decl  %eax
    cmpl  $0, %eax
    jg    .L4

What novelties came with Fortran?

And LISP Arrives
1958, Massachusetts Institute of Technology.
Professor John McCarthy.
A simple notation, based on mathematical functions.
Lots of parentheses. And lists.

Example of a LISP program

LISP:

    (defun factorial (n)
      (if (<= n 1)
          1
          (* n (factorial (- n 1)))))

Fortran:

    nfact = 1
    do i = 1, n
      nfact = nfact * i
    enddo

And when, in the 70s, the Soviets got hold of the last 300 lines of the American missile system...
Recursion!

ALGOL: A Team of Stars
A standard for algorithms was needed.
A committee was formed in 1958:
John Backus
C. A. R. Hoare
John McCarthy, etc.
From this committee ALGOL 58 was born.
Perhaps the most influential programming language.

ALGOL: example

    integer procedure Factorial(m); integer m;
    begin
      integer F;
      F := if m=1 then 1 else m*Factorial(m-1);
      Factorial := F
    end;

Have you seen anything like this before?

And COBOL
COBOL was made for business: accountants, economists, etc.
What should such a language look like?
1959: COBOL was created by a committee of industry, government and academia.
It is still used in many companies, even in BH (Belo Horizonte)!

Examples of COBOL programs

    ADD YEARS TO AGE.
    MULTIPLY PRICE BY QUANTITY GIVING COST.
    SUBTRACT DISCOUNT FROM COST GIVING FINAL-COST.

How many programming languages exist?
Which are the most popular languages?
How many are there?
The publisher O'Reilly says that there are 2,500 documented programming languages.
Wikipedia documents 630.
There are many. But why so many?

Different Purposes
Fortran was meant for scientific calculations.
Lisp was used in the theory of computation.
COBOL was made for commercial applications.
Algol is an academic language.
And the other languages we know?

Which Are the Pop Languages?
Data taken from www.tiobe.com:
Java: 18.71
C: 16.89
PHP: 10.39
Google code: C, Java, C++, PHP
Craigslist: PHP, C, SQL
What other measures are there?

Does Anyone Here Speak Javanese?
By many criteria, Java is the most popular language.
What is Java for?
How did this language come about?
What is special about it?
An example of Javanese:

    public class Fact {
      public static void main(String a[]) {
        int n = 5;
        int fact = 1;
        while (n > 1) {
          fact *= n;
          n--;
        }
        System.out.println(fact);
      }
    }

It's A, It's B, It's C.
C appeared in 1972 and was, for many years, the most popular programming language.
Why does C have this name?
What do we do with C?
Why was C so popular?
What are the problems with C?

C Had Great Influence.
Speaking in C:

    #include <stdio.h>

    int main() {
      int n = 5;
      int fact = 1;
      while (n > 1) {
        fact *= n;
        n--;
      }
      printf("%d\n", fact);
    }

Has anyone seen this before?

C Had Great Influence.

    int n = 5;
    int fact = 1;
    while (n > 1) {
      fact *= n;
      n--;
    }

    int n = 5;
    int fact = 1;
    while (n > 1) {
      fact *= n;
      n--;
    }

Figure 1. Web application architecture.

effectively under a heavy load of requests. Finally, some runtime techniques [23, 24] require a modified runtime system, which constitutes a practical limitation in terms of deployment and upgrading.
Static analyses to find SQLCIVs have also been proposed, but none of them runs without user intervention and can guarantee the absence of SQLCIVs. String analysis-based techniques [3, 20] use formal languages to characterize conservatively the set of values a string variable may assume at runtime.
They do not track the source of string values, so they require a specification, in the form of a regular expression, for each query-generating point or hotspot in the program, a tedious and error-prone task that few programmers are willing to do. Static taint analyses [12, 18, 31] track the flow of tainted (i.e., untrusted) values through a program and require that no tainted values flow into hotspots. Because they use a binary classification for data (tainted or untainted), they classify functions as either being sanitizers (i.e., all return values are untainted) or being security irrelevant. Because the policy that these techniques check is context-agnostic, it cannot guarantee the absence of SQLCIVs without being overly conservative. For example, if the escape_quotes function (which precedes quotes with an escaping character so that they will be interpreted as character literals and not as string delimiters) is considered a sanitizer, an SQLCIV exists but would not be found in an application that constructs a query using escaped input to supply an expected numeric value, which need not be delimited by quotes. Additionally, static taint analyses for PHP typically require user assistance to resolve dynamic includes (a construct in which the name of the included file is generated dynamically).

1.2 Our Approach
We propose a sound, automated static analysis algorithm to overcome the limitations described above. It is grammar-based; we model string values as context free grammars (CFGs) and string operations as language transducers following Minamide [20]. This string analysis-based approach tracks the effects of string operations and retains the structure of the values that flow into hotspots (i.e., where query construction occurs). If all of each string in the language of a nonterminal comes from a source that can be influenced by a user, we label the nonterminal with one of two labels.
We assign a "direct" label if a user can influence the source directly (as with GET parameters) and an "indirect" label if a user can influence the source indirectly (as with data returned by a database query). Such labeling tracks the source of string values. We use a syntax-based definition of SQL injection attacks [25], which requires that input from a user be syntactically isolated within a generated query. This policy does not need user-provided specifications. Finally, we check policy conformance by first abstracting the labeled subgrammars out of the generated CFG to find their contexts. We then use regular language containment and context free language derivability [28] to check that each subgrammar derives only syntactically isolated expressions.
We have implemented this analysis for PHP, and applied it to several real-world web applications. Our tool scales to large code bases: it successfully analyzes the largest PHP web application previously analyzed in the literature (about 100K loc). It discovered many vulnerabilities, some previously unknown and some based on insufficient filtering, and generated few false positives.

    ...
    01 isset($_GET['userid']) ?
    02   $userid = $_GET['userid'] : $userid = '';
    03 if ($USER['groupid'] != 1)
    04 {
    05   // permission denied
    06   unp_msg($gp_permserror);
    07   exit;
    08 }
    09 if ($userid == '')
    10 {
    11   unp_msg($gp_invalidrequest);
    12   exit;
    13 }
    14 if (!eregi('[0-9]+', $userid))
    15 {
    16   unp_msg('You entered an invalid user ID.');
    17   exit;
    18 }
    19 $getuser = $DB->query("SELECT * FROM `unp_user`"
    20     ."WHERE userid='$userid'");
    21 if (!$DB->is_single_row($getuser))
    22 {
    23   unp_msg('You entered an invalid user ID.');
    24   exit;
    25 }
    ...

Figure 2. Example code with an SQLCIV.

2. Overview
In order to motivate our analysis, we first present the policy that defines SQLCIVs, and then give an overview of how our analysis checks web applications against that policy.
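As a small illustration of what "insufficient filtering" can look like, consider the check on line 14 of Figure 2, which tests the input against the pattern [0-9]+ without anchors. The sketch below uses Python's `re` module as a stand-in for PHP's `eregi`; the function names `passes_unanchored` and `passes_anchored` and the sample inputs are hypothetical and chosen only for this example.

```python
import re


def passes_unanchored(value: str) -> bool:
    """Stand-in for eregi('[0-9]+', $userid): the pattern may match
    anywhere in the string, because it has no ^ or $ anchors."""
    return re.search(r"[0-9]+", value) is not None


def passes_anchored(value: str) -> bool:
    """Stricter variant: the whole string must consist of digits."""
    return re.fullmatch(r"[0-9]+", value) is not None


if __name__ == "__main__":
    payload = "1'; DROP TABLE unp_user; --"
    for value in ("42", "abc", payload):
        # Print the input followed by the verdict of each check.
        print(repr(value), passes_unanchored(value), passes_anchored(value))
```

The unanchored check accepts any input that contains at least one digit, so the payload passes it; the anchored check rejects the payload and still accepts a plain number such as 42.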
2.1 SQL Command Injection Vulnerabilities
This section illustrates SQLCIVs and formally defines them.

2.1.1 Example Vulnerability
Figure 2 shows a code fragment excerpted from Utopia News Pro, a real-world news management system written in PHP; we will use this code to illustrate the key points of our algorithm. This code authenticates users to perform sensitive operations, such as managing user accounts and editing news sources. Initially, the variable $userid gets assigned data from a GET parameter, which a user can easily set to arbitrary values. The code then performs two checks on the value of $userid before incorporating it into an SQL query. The query should return a single row for a legitimate user, and no rows otherwise. From line 14 it is clear that the programmer intends $userid to be numeric, and from line 20 it is clear that the programmer intends that $userid evaluate to a single value in the SQL query for comparison to the userid column. However, because the regular expression on line 14 lacks anchors (^ and $ for the beginning and end of the string, respectively), any value for $userid that has at least one numeric character will be included into the generated query. If a user sets the GET parameter to 1'; DROP TABLE unp_user; --, this code will send to the database the following query:

    SELECT * FROM `unp_user` WHERE userid='1'; DROP TABLE unp_user; --'

The Internet Breathes PHP
Has anyone here programmed in PHP?
What does this name mean?
What should a language for web development look like?
An example of PHP-ese:

    $id = $_GET['user'];
    if ($id == '') {
      echo "Invalid user: $id";
    } else {
      // $id goes into the query text exactly as the user typed it
      $getuser = $DB->query("SELECT * FROM `table` WHERE id=$id");
      echo $getuser;
    }

Did anyone notice a little bit of C in there?
What is the type of the variable $id?

Computers speak Zero-One-ese, we speak programming languages.
Who translates between them?
And how is this translation done?

Compilers Are Bridges
The first compiler was, probably, the A-0 of Grace Hopper (1951-52).
Different programming languages have different compilers.
But the same compiler can also compile different languages.

Anatomy of a Compiler
Front end: Fortran, COBOL, Lisp, ...
Optimizer
Back end: ARM, x86, PowerPC, ...

Virtual Machines
A virtual machine is hardware implemented in software.
Why is this interesting?
Which languages run on virtual machines?
Is a translator still necessary?

Sometimes, Everything Is Interpreted
An interpreter does not produce machine code. Instead, it reads the code of the source program and interprets each command it finds.
What are the advantages of an interpreter?
Which languages are interpreted?
Is there any language that necessarily has to be interpreted?

Are These Things Efficient?
We Do It Just-in-time
Some languages are compiled while they are being interpreted. JavaScript, for example.
And where does the efficiency come from?
Is it possible to do better than a traditional compiler?

Is there a programming language more "powerful" than all the others?
If there is, which language is it?
But how do we measure this "power"?

Easy or Hard
1. Find the shortest road network that connects all the cities of Minas Gerais.
2. Find the shortest route that passes through all the cities, without repeating any.
3. Given a program P to solve (2), check whether the first thing P prints is Nova Era.

We Have to Be Humble
The Turing machine is a theoretical model that defines all the problems that are computable.
State, tape, head, symbols, instructions.
If there is no solution on the Turing machine, then there is really no way...

Turing-Complete Languages
If a language is equivalent to the Turing machine, then it is Turing-complete.
Almost every programming language is Turing-complete.
But there are languages that are not. Any examples?

Brain-fuc*
A very large array, containing numbers.
Eight commands:
>  moves one position to the right
<  moves one position to the left
+  adds one to the current position (CP)
-  subtracts one from the CP
.  prints the contents of the CP
,  reads input and stores it in the CP
[  jumps to the command after the matching ] if the CP is zero
]  goes back to the command after the matching [ if the CP is not zero
What do these programs do?

    [-]
    [ > + < - ]

All these languages that we have seen: Java, PHP, C, Fortran, COBOL, Algol, etc.
They are very similar: variables, loops, commands.
Is there really no other paradigm?

Imperative and Declarative Languages
Imperative languages: the program instructs how to change the state of the machine.
Variables, loops, sequences of commands. Side effects.
Is there a function that returns different values given equal parameters?
Declarative languages: the program describes a truth.
Absence of side effects.
Loops via calls to recursive functions.

SML
The program is a set of functions.
Programs are proofs by induction.
The main data structures are lists and tuples.

    fun sum [] = 0
      | sum (h::t) = h + sum t

    fun filter [] _ = []
      | filter (h::t) f =
          if (f h) then h :: (filter t f) else (filter t f)

Sorting

    fun leq a b = a <= b
    fun grt a b = a > b

    fun filter _ nil = nil
      | filter f (h::t) =
          if f h then h :: filter f t else filter f t

    fun qsort nil = nil
      | qsort (h::t) =
          (qsort (filter (grt h) t)) @ [h] @ (qsort (filter (leq h) t))

Prolog
The program is a set of constraints:
If A is true, and A→B is true, then B is true.

    parent(kim, holly).
    parent(margaret, kim).
    parent(margaret, kent).
    parent(esther, margaret).
    parent(herbert, margaret).
    parent(herbert, jean).

    % bisavo: great-grandparent
    bisavo(GGP, GGC) :-
        parent(GGP, GP), parent(GP, P), parent(P, GGC).

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Y) :- parent(Z, Y), ancestor(X, Z).

What will bisavo(X, Y) produce?

An NP-complete Problem

    sum([], 0).
    sum([Head|Tail], X) :- sum(Tail, TailSum), X is Head + TailSum.

    subList([], []).
    subList([H|T], [H|R]) :- subList(T, R).
    subList([_|T], R) :- subList(T, R).

    intSum(L, N, S) :- subList(L, S), sum(S, N).

Given a list L of integers, is there a sublist S whose sum is N?
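For comparison with the imperative languages seen earlier, the same generate-and-test idea can be sketched in Python: enumerate the sublists of L and stop at the first one whose sum is the target. This is only a brute-force illustration, not an efficient algorithm; the name `int_sum` mirrors the Prolog predicate and the sample numbers are made up for the example.

```python
from itertools import combinations
from typing import List, Optional


def int_sum(numbers: List[int], target: int) -> Optional[List[int]]:
    """Return some sublist of `numbers` whose elements add up to `target`,
    or None if there is none. Tries every sublist, smallest first, so the
    running time grows exponentially with the length of the list."""
    for size in range(len(numbers) + 1):
        for candidate in combinations(numbers, size):
            if sum(candidate) == target:
                return list(candidate)
    return None


if __name__ == "__main__":
    print(int_sum([3, 34, 4, 12, 5, 2], 9))  # [4, 5]
    print(int_sum([1, 2, 3], 7))             # None
```

A list with n elements has 2^n sublists, so this search quickly becomes impractical as the list grows, which is what one would expect from an NP-complete problem.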