
B.E. Project

ONLINE JUDGE WITH SOFTWARE CODE QUALITY ANALYSIS

Submitted by:
SURAJ GUPTA (348/CO/11)
TUSSHAR SINGH (354/CO/11)
UJJWAL RELAN (356/CO/11)

Under the guidance of
DR. SHAMPA CHAKRAVERTY

DISSERTATION SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENT
FOR THE DEGREE OF BACHELOR OF ENGINEERING (COMPUTER ENGINEERING)

DEPARTMENT OF COMPUTER ENGINEERING
Netaji Subhas Institute of Technology
University of Delhi

2014-2015


SELF DECLARATION

The project entitled "Online Judge with Software Code Quality Analysis" is a
bona fide work carried out by Suraj Gupta, Tusshar Singh and Ujjwal Relan in the
Department of Computer Engineering, Netaji Subhas Institute of Technology,
Delhi under the supervision and guidance of Dr. Shampa Chakraverty in partial
fulfilment of the requirement for the Degree of Bachelor of Engineering in Computer
Engineering, University of Delhi for the year 2014-15.

To the best of our knowledge, the content of this report has not been used for
any other academic activity.

Date:

Suraj Gupta (348/CO/11)
Tusshar Singh (354/CO/11)
Ujjwal Relan (356/CO/11)

CERTIFICATE

This is to certify that the project entitled "ONLINE JUDGE WITH SOFTWARE CODE
QUALITY ANALYSIS" is a bona fide work done by Mr. Suraj Gupta, Mr. Tusshar
Singh and Ms. Ujjwal Relan, students of eighth semester, B.E. Computer
Engineering, Netaji Subhas Institute of Technology, Delhi.

This project work has been prepared as a partial fulfilment of the requirements for
the award of the degree of Bachelor of Engineering in Computer Engineering,
University of Delhi, in the academic year 2014-2015.

This work has not been presented for any other academic purpose before.

I wish them luck for all their future endeavours.

Date: 1 June 2015

Dr. Shampa Chakraverty
Professor & Head, Deptt. of Computer Engineering
Netaji Subhas Institute of Technology

ACKNOWLEDGEMENT

We would like to take this opportunity to acknowledge the support of all those without
whom the project would not have been possible.

We sincerely thank our mentor, Dr. Shampa Chakraverty, for her guidance, criticism
and encouragement, which led to the completion of the project. The regular
brainstorming sessions with her gave us a deep insight into the topic and evolved
our ideas. We thank her for giving us this opportunity and supporting us throughout.

We would also like to express our gratitude to the lab assistants in CAD LAB for
cooperating with us and providing us with the required resources.

We wish to express heartfelt gratitude to our college, Netaji Subhas Institute of
Technology, for giving us this opportunity for research and development.

Lastly, we would also like to thank our parents, who supported us through thick
and thin. Their trust in our capabilities helped us give our best.

INDEX

SELF DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1: INTRODUCTION
1.1 Objective
1.2 Motivation
    1.2.1 Importance of an Online Judge
    1.2.2 Importance of Source Code Comments
1.3 Organisation of Chapters

CHAPTER 2: LITERATURE SURVEY
2.1 Web Application
2.2 Online Judge
2.3 Ruby on Rails
    2.3.1 Why Ruby on Rails Over Other Languages
    2.3.2 Limitations of Ruby on Rails
2.4 Advantages of C++ on the Backend
2.5 Python as a Scripting Language
2.6 Amazon Web Services
    2.6.1 Amazon Simple Storage Service
    2.6.2 Amazon Elastic Compute Cloud
2.7 Plagiarism Detection
    2.7.1 Different Forms of Plagiarism
        2.7.1.1 Collusion
        2.7.1.2 Unacknowledged Reverse Engineering
        2.7.1.3 Unacknowledged Translation
        2.7.1.4 Unacknowledged Code Generation
        2.7.1.5 No Reuse without Test
2.8 Static Code Quality Analysis and Its Tools
    2.8.1 Some Well-Known Open Source Tools for Static Code Analysis
        2.8.1.1 GCC
        2.8.1.2 Cppcheck
        2.8.1.3 CPPLINT
    2.8.2 Commercial Tools
2.9 Source Code Comments as a Metric of Software Quality
    2.9.1 Comment Analysis
2.10 GIT: An Open Source Version Control System
2.11 WEKA: An Open Source Data Mining Tool

CHAPTER 3: PROPOSED DESIGN
3.1 Software Requirement Specification for the Online Judge
    3.1.1 Introduction
        3.1.1.1 Purpose
        3.1.1.2 Scope
        3.1.1.3 Definitions, Acronyms, and Abbreviations
        3.1.1.4 Overview
    3.1.2 General Description
        3.1.2.1 Product Perspective
        3.1.2.2 Product Functions
        3.1.2.3 User Characteristics
        3.1.2.4 General Constraints and Features
    3.1.3 Specific Requirements
        3.1.3.1 External Interface Requirements
            3.1.3.1.1 User Interfaces
            3.1.3.1.2 Hardware Interfaces
            3.1.3.1.3 Software Interfaces
        3.1.3.2 Functional Requirements
            3.1.3.2.1 User Management
            3.1.3.2.2 Code Evaluation
            3.1.3.2.3 Contest Management
            3.1.3.2.4 Plagiarism Analysis
            3.1.3.2.5 Static Code Analysis
        3.1.3.3 Non-Functional Requirements
    3.1.4 Other Requirements
        3.1.4.1 Database
3.2 Comment Classification
    3.2.1 Comment Categories
    3.2.2 Extracting Features for Classification

CHAPTER 4: IMPLEMENTATION
4.1 Programming Languages and Tools
4.2 Components
    4.2.1 Implementation of Interface and Codeshell
        4.2.1.1 Interfaces for the User
        4.2.1.2 Interfaces for the Admin
    4.2.2 Integration of Client and Server
        4.2.2.1 Using AWS S3 Buckets
        4.2.2.2 Using Python Scripts at the Backend
    4.2.3 Implementation of Codechecker
    4.2.4 Implementation of Plagiarism Detection
    4.2.5 Integration with Cppcheck
    4.2.6 Comment Classifier
        4.2.6.1 Finding Dataset
        4.2.6.2 Creating .arff
        4.2.6.3 Good vs Bad Commenting
    4.2.7 Using GIT

CHAPTER 5: OBSERVATIONS AND RESULTS
5.1 Results of Codechecker
5.2 Plagiarism Detection Results
5.3 Cppcheck Results
5.4 Comparison with Different Algorithms

CHAPTER 6: CONCLUSION AND FUTURE WORK
6.1 Conclusion
6.2 Limitations
6.3 Future Scope

REFERENCES

LIST OF TABLES

CHAPTER 3
Table 3.1: MACHINE LEARNING FEATURES FOR COMMENTS

CHAPTER 4
Table 4.1: COMMANDS USED IN CODECHECKER
Table 4.2: CODE SNIPPET USED FOR EXECUTING CODE

CHAPTER 5
Table 5.1: COMPARISON OF DIFFERENT MACHINE LEARNING ALGORITHMS

LIST OF FIGURES

CHAPTER 4
Figure 4.1: SCREENSHOT OF LOGIN PAGE
Figure 4.2: SCREENSHOT OF SIGNUP PAGE
Figure 4.3: SCREENSHOT OF HOME PAGE
Figure 4.4: SCREENSHOT OF CONTEST PAGE
Figure 4.5: SCREENSHOT OF SUBMIT PAGE
Figure 4.6: SCREENSHOT OF SUBMISSIONS PAGE
Figure 4.7: SCREENSHOT OF VIEW PROFILE
Figure 4.8: SCREENSHOT OF EDIT PROFILE
Figure 4.9: SCREENSHOT OF REQUEST ADMIN RIGHTS PAGE
Figure 4.10: SCREENSHOT OF STATIC CODE QUALITY CHECK OPTION
Figure 4.11: SCREENSHOT OF CREATE CONTEST PAGE
Figure 4.12: SCREENSHOT OF EDIT CONTEST PAGE
Figure 4.13: SCREENSHOT OF EDIT PROBLEM PAGE
Figure 4.14: SCREENSHOT OF ADDING ADMIN
Figure 4.15: SCREENSHOT OF LIST OF S3 BUCKETS USED
Figure 4.16: INTERACTION OF PYTHON SCRIPTS IN BACKEND
Figure 4.17: ORDER OF STEPS IN CODECHECKER
Figure 4.18: ORDER OF EXECUTION IN CODECHECKER
Figure 4.19: SOURCE CODE FILES USED FOR PLAGIARISM DETECTION
Figure 4.20: SOURCE CODE FILES USED FOR STATIC CODE ANALYSIS USING CPPCHECK
Figure 4.21: GITHUB REPOSITORY USED FOR THE PROJECT

CHAPTER 5
Figure 5.1: WHEN CODE GETS ACCEPTED
Figure 5.2: WHEN CODE GIVES WRONG ANSWER
Figure 5.3: WHEN CODE GIVES COMPILATION ERROR
Figure 5.4: WHEN CODE EXCEEDS TIME LIMIT
Figure 5.5: WHEN CODE GIVES RUNTIME ERROR
Figure 5.6: RESULTS AFTER MOSS BASED PLAGIARISM CHECK
Figure 5.7: RESULTS AFTER CPPCHECK ANALYSES THE CODE

ABSTRACT

The digital world revolves around programming. Everything around us involves some
kind of programming today. Writing software is about innovation, creativity and
expression, but it also needs quality attributes such as understandability, reliability,
modifiability, scalability and reusability, so that the software's life can be
extended and the previous effort does not go to waste. So, we have created an online
judge which checks algorithmic correctness along with complexity and performs
code quality analysis, thus helping to enhance one's analytical and
programming skills. We have checked code quality through Static Code Analysis and
Comment Classification.
Since the cost of correction increases as we go down the Software Development
Lifecycle, Static Code Analysis helps by detecting errors early. It is really useful in
maintaining customised company standards and also for enhancing the quality of large
codebases. Static Code Analysis done before deployment can prevent huge failures.
Source Code Comments also play a very important role in enhancing the quality of code, as
they increase the understandability, modifiability and reusability of the code.
Source Code Comments help team members to work collaboratively, and if any
developer leaves the organisation, the code can still be updated and managed by
others.
We have implemented the following main functionalities in our project: Contest
Management, Code Evaluation, Plagiarism Detection, Comment Classification and
Static Code Analysis.

Contest Management includes contest creation, problem addition and participation by
users. We have integrated the frontend and the backend using AWS S3 buckets, which
act as an intermediate storage platform for the server and the client side.

We have checked the algorithmic correctness of the codes using test cases, and their
algorithmic complexity by imposing time and memory limits. The user's code is
executed against some predefined test cases, and the user's output is matched with the
expected output; the code must pass within the set time and memory limits in order to be
accepted. This is how we have done the Code Evaluation.

We have checked for plagiarism amongst the accepted codes using the concept of
Stanford's MOSS (Measure Of Software Similarity). It uses Winnowing, a local algorithm
for Document Fingerprinting.

Static Code Analysis looks for errors which the compiler can't catch, like uninitialized
memory and null pointers. We have integrated the online judge with Cppcheck, a
well-known Static Code Analysis tool for C/C++.

We have classified the comments into six categories on the basis of their context:
Header, Task, Code, Section, Interface and Inline. This classification helps in
analysing the importance given to comments in the given code.
We have performed the classification by using the Weka tool. We created our own
dataset and then applied different Supervised Machine Learning algorithms in order
to identify the best classifier.

1. Introduction
A computer program is a collection of instructions in a human-readable
programming language that solves a particular problem. Programming involves
activities such as analysis, developing understanding, generating algorithms,
verification of requirements of algorithms including their correctness and resource
consumption, and implementation (commonly referred to as coding) of algorithms in
a target programming language. Related tasks include testing, debugging, and
maintaining the source code, implementation of the build system, and management
of derived artifacts such as machine code of computer programs. These might be
considered part of the programming process, but often the term "software
development" is used for this larger process, with the term "programming",
"implementation", or "coding" reserved for the actual writing of source
code. Everything around us involves some kind of programming today ([1] and [2]).
Due to the increasing need for good programming, the need for good competitive
programming platforms is also increasing. These platforms help in enhancing the
analytical and problem-solving skills of programmers.

1.1 OBJECTIVE
The objectives of this project are as follows:
i. Create an online judge with the following functionalities:
   a. It allows the admin to manage contests and set problems.
   b. It allows users to submit solutions for the contest problems.
   c. Every submitted code is checked against predefined test cases and is accepted only
      if the generated output matches the expected output and this happens within the
      predefined memory and time limits.
   d. All the accepted codes are checked for plagiarism.
   e. It provides an interface where the user can assess the static code quality of the code.
ii. Create a classifier which categorises comments on the basis of their context.
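
To illustrate objective (c), the following is a minimal sketch of how a single test case could be judged. It is not the project's Codechecker: the function name `judge`, the verdict strings and the token-wise output comparison are assumptions made for illustration, and only the time limit is enforced here (a memory limit would need operating-system support and is left out).

```python
import subprocess
import sys


def judge(cmd, test_input, expected_output, time_limit_s=2.0):
    """Run one submission (given as an argv list) on one test case.

    Returns a verdict string. The names of the verdicts are illustrative.
    """
    try:
        proc = subprocess.run(
            cmd,
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "Time Limit Exceeded"
    if proc.returncode != 0:
        return "Runtime Error"
    # Compare token by token so that trailing whitespace is ignored.
    if proc.stdout.split() == expected_output.split():
        return "Accepted"
    return "Wrong Answer"


if __name__ == "__main__":
    # Stand-in "submission": a one-line Python program that doubles its input.
    submission = [sys.executable, "-c", "print(int(input()) * 2)"]
    print(judge(submission, "21\n", "42\n"))
```

A compiled submission would be judged the same way by passing the path of the compiled binary as `cmd`; a failed compilation step would be reported separately before any test case is run.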

1.2 MOTIVATION
1.2.1 IMPORTANCE OF AN ONLINE JUDGE
Programming skills are becoming ever more important, quickly turning into the core
competency for all kinds of 21st century workers. That inescapable fact is leading
individuals to seek out new ways of learning to code, startups and nonprofits to find ways to
help them, and businesses to search for innovative approaches to finding the coders they so
desperately need [3].
Competitive programming is playing a great role in spreading this programming culture
amongst beginners.
So, the motivation of this project was to create an online judge which can help a programmer
enhance the following capabilities:
1. Problem-solving skills
2. Optimised use of resources
3. Efficient and quick solving
4. Writing code of good quality
5. Debugging and testing skills

1.2.2 IMPORTANCE OF SOURCE CODE COMMENTS

Along with being algorithmically correct, code needs to be understandable,
maintainable and reusable. Source code comments play an important role in producing such
code.

Program comments within and between modules and procedures usually convey information
about the program, such as the functionality, design decisions, assumptions, declarations,
algorithms, nature of input and output data, and reminder notes. Considering that the
program source code may be the only way of obtaining information about a program, it is
important that programmers accurately record useful information about these
facets of the program and update them as the system changes. Common types of comments
used are prologue comments and inline comments. Prologue comments precede a program
or module and describe goals. Inline comments, within the program code, describe how
these goals are achieved.
The comments provide information that the understander can use to build a mental
representation of the target program. For example, in Brooks' top-down model, comments
which act as beacons help the programmer not only to form hypotheses, but to refine them into
closer representations of the program. Thus, theoretically there is a strong case for
commenting programs. The importance of comments is further strengthened by evidence
that the lack of good comments in programs constitutes one of the main problems that
programmers encounter when maintaining programs. It has to be pointed out that comments
in programs can be useful only if they provide additional information. In other words, it is the
quality of the comment that is important, not its presence or absence [4].

In our project, we have classified the comments into six categories: Header, Task, Section,
Inline, Code and Interface. By determining the proportion of these comments in the code, the
quality of the code can be assessed.

1.3 ORGANISATION OF CHAPTERS
In the upcoming chapters, we provide more details to give an insight into the
work done. Chapter 2 covers the literature survey conducted before
implementing each functionality. In chapter 3, we explain the detailed design of the
online judge by giving its software requirement specification, and also the method
used for comment categorisation. In chapter 4, we describe the implementation
details of the project. Chapter 5 covers the analysis, observations and results.
Finally, in chapter 6 we present the limitations, future scope and conclusion of
our work.

2. Literature Survey
This chapter discusses in detail the theoretical and practical concepts explored
before starting the project.

2.1 WEB APPLICATION
A web application is any application that uses a web browser as a client. The
application can be as simple as a message board or a guest sign-in book on a
website, or as complex as a word processor or a spreadsheet.
A web application relieves the developer of the responsibility of building a client for a
specific type of computer or a specific operating system. Since the client runs in a
web browser, the user could be using an IBM-compatible or a Mac. They can be
running Windows XP or Windows Vista. They can even be using Internet Explorer or
Firefox, though some applications require a specific web browser.
Web applications commonly use a combination of server-side script (ASP, PHP, etc.)
and client-side script (HTML, JavaScript, etc.) to develop the application. The
client-side script deals with the presentation of the information, while the server-side
script deals with all the hard stuff like storing and retrieving the information [5].

2.2 ONLINE JUDGE
Online judges provide a platform for competitive programming. Codechef, Topcoder,
and Codeforces are some of the well-known online judges. These sites have
high-quality problems and also allow you to see others' code after contest completion.
They also categorize problems by topic [6]. These online judges help one
in areas such as mastering a language (your primary tool for coding), design skills (patterns, OOP,
et al.), writing readable and maintainable code, debugging techniques, system
knowledge, and domain-specific knowledge.

2.3 RUBY ON RAILS
Ruby is a language of careful balance. Its creator, Yukihiro "Matz" Matsumoto,
blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) to form
a new language that balanced functional programming with imperative programming.
Ruby is an object-oriented language [8].
Rails is a web application development framework written in the Ruby language. It is
designed to make programming web applications easier by making assumptions
about what every developer needs to get started. It allows you to write less code
while accomplishing more than many other languages and frameworks. Having a
standard framework, it allows fast development.

Rails is opinionated software. Rails assumes that there is a best way to do things, and
in order to get the most out of Rails, one should follow that method.

The Rails philosophy includes two major guiding software engineering principles:

• Don't Repeat Yourself: DRY is a principle of software development
  which states that "Every piece of knowledge must have a single,
  unambiguous, authoritative representation within a system." By not
  writing the same information over and over again, our code is more
  maintainable, more extensible, and less buggy.
• Convention Over Configuration: Rails has opinions about the best
  way to do many things in a web application, and defaults to this set of
  conventions, rather than require that you specify every minutiae
  through endless configuration files [9].

2.3.1 WHY RUBY ON RAILS OVER OTHER LANGUAGES?

RoR is preferred over other languages for the following reasons:

• It provides a standard framework, so designing a web application using Ruby on
  Rails is easier.
• It has a clear code structure, which allows faster development.
• It follows design patterns which encourage less code redundancy.
• It allows code reusability through the use of gems.
• RoR allows faster design and development.
• Ruby code is very readable and self-documenting.
• Rails and most of its libraries are open source, unlike other standard frameworks,
  thus saving on licence cost.

2.3.2 LIMITATIONS OF RUBY ON RAILS

• Not all website hosts can support Rails.
While it is true that not all web hosts support Rails, this is primarily because it can be
more resource intensive than PHP, a fact which deters low-end shared-hosting
providers. However, Rails-friendly hosts do exist, for example, Heroku and
EngineYard.

Alternatively, you can host your Rails application on a Virtual Private Server (VPS) with
Amazon EC2, Rackspace, or Linode. You will then have full control over the server
and can allocate sufficient resources for your application.

• Java and PHP are more widely used, and there are more developers in these
  languages.
The number of Ruby developers is growing year on year as more people switch to it
from other programming languages. One of the main differences between the Ruby
community and other communities is the amount of open source code (gems) which is publicly
available; as of writing there are 63,711 gems which you can use to enhance your
application.

• Performance and Scalability
There have been concerns that Rails applications are not as fast as Java or C, which
is true, but for the majority of applications it is fast enough. There are plenty of
high-profile organisations which rely on Rails to power their sites, including AirBnB, Yellow
Pages, Groupon, Channel 5, and Gov.uk.

There is also the option of running your application under JRuby, and then you have
the same performance characteristics as Java [10].

2.4 ADVANTAGES OF C++ ON THE BACKEND
C++ is a general-purpose programming language. It supports the procedural as well
as the object-oriented paradigm. It is easy to learn as it is quite similar to C.

Using C++ on the backend can be really advantageous as it is faster than
scripting languages, it gives optimised performance in terms of memory and
CPU, and it can be integrated with scripting languages. It requires relatively less
memory space and is closer to low-level languages, which makes it extremely
fast.

2.5 PYTHON AS A SCRIPTING LANGUAGE
The following points highlight why Python can be considered a beneficial
replacement for bash scripting:
• Python is installed by default on all the major Linux distributions. Opening a
  command line and typing python immediately will drop you into a Python
  interpreter. Python has a very easy to read and understand syntax. Its style
  emphasizes minimalism and clean code while allowing the developer to write
  in a bare-bones style that suits shell scripting.
• Python is an interpreted language, meaning there is no compile stage. This
  makes Python an ideal language for scripting. Python also comes with a
  Read-Eval-Print Loop, which allows you to try out new code quickly in an interpreted
  way. This lets the developer tinker with ideas without having to write the full
  program out into a file.
• Python is a fully featured programming language. Code reuse is simple,
  because Python modules can easily be imported and used in any Python
  script. Scripts can easily be extended or built upon.
• Python has access to an excellent standard library and thousands of
  third-party libraries for all sorts of advanced utilities, such as parsers and
  request libraries. For instance, Python's standard library includes datetime
  libraries that allow you to parse dates in any format that you specify and
  compare them to other dates easily.
• Python can be a simple link in the chain. Python should not replace all the
  bash commands. It is as powerful to write Python programs that behave in a
  UNIX fashion (that is, read in standard input and write to standard output) as it
  is to write Python replacements for existing shell commands, such as cat and
  sort [11].
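
As a small illustration of the last two points, the sketch below is a Python script that behaves like a UNIX filter: it reads records from standard input, sorts them by the date in their first field using the standard `datetime` module, and writes them to standard output. The function name `sort_by_date` and the `DD-MM-YYYY` record format are assumptions chosen for the example.

```python
import sys
from datetime import datetime


def sort_by_date(lines, fmt="%d-%m-%Y"):
    """Return the lines ordered by the date found in their first field."""
    def date_of(line):
        return datetime.strptime(line.split()[0], fmt)
    return sorted(lines, key=date_of)


if __name__ == "__main__":
    # Behave like a UNIX filter: standard input in, standard output out.
    records = [line.rstrip("\n") for line in sys.stdin if line.strip()]
    for record in sort_by_date(records):
        print(record)
```

Saved under a hypothetical name such as `sortdates.py`, the script can be chained with ordinary shell commands, for example `cat log.txt | python sortdates.py | head`. Note that a plain textual `sort` would order these records incorrectly, because the day comes before the year in each date.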

2.6 AMAZON WEB SERVICES
Amazon Web Services is a collection of services which constitute the cloud
computing platform provided by amazon.com. The two most used web services
provided by AWS are:
2.6.1 AMAZON SIMPLE STORAGE SERVICE
Amazon Simple Storage Service (Amazon S3) provides developers and IT teams with
secure, durable, highly scalable object storage. Amazon S3 is easy to use, with a simple
web services interface to store and retrieve any amount of data from anywhere on the web.
Amazon S3 provides cost-effective object storage for a wide variety of use cases including
cloud applications, content distribution, backup and archiving, disaster recovery, and big data
analytics [13].

2.6.2 AMAZON ELASTIC COMPUTE CLOUD
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable
compute capacity in the cloud. It is designed to make web-scale cloud computing easier for
developers. Amazon EC2's simple web service interface allows us to obtain and configure
capacity with minimal friction. It provides complete control of computing resources and runs
on Amazon's proven computing environment. Amazon EC2 provides developers the tools to
build failure-resilient applications and isolate themselves from common failure scenarios [14].

2.7 PLAGIARISM DETECTION
2.7.1 DIFFERENT FORMS OF PLAGIARISM
All forms of plagiarism involve claiming other people's work as your own (or assisting
someone to make such false claims). In software engineering, work that is reused without
proper acknowledgement can be hard to identify. Following are the kinds of software
plagiarism commonly found:
2.7.1.1 Collusion
In all practical projects, it is considered normal practice to be given help. This help must be
publicly acknowledged when the work is presented for evaluation or publication. When the
help is significant, it is normal for the person who has given the help to be credited in a
more formal way. Where help has been given, and there is collusion between the parties
involved, it is a simple matter for no public acknowledgement of this to be made. In such a
case, there is no direct reuse of software in the classical engineering sense.
However, this collusion is plagiarism. Software of any reasonable complexity is
structured and has different components. Collusion in software development involves a third
party writing the code for at least one of these components, and a student submitting this
code as their own.
2.7.1.2 Unacknowledged Reverse Engineering
Often software engineers will look at some code and be able to reverse engineer some
abstract property of that code in order to reuse that abstraction to help them write their own
code, usually as a solution to a different, yet similar, problem. When the original piece of
code is not acknowledged, this is also commonly known as stealing someone else's
idea(s). In final year projects, this type of plagiarism often results when a student reuses
the design of a software system (or part of a software system) as a structure, template or
pattern for their own code. Students should not be discouraged from engineering software in
this way (it is a reasonably advanced technique), but they should be strongly encouraged to
correctly acknowledge where the original design originated.
2.7.1.3 Unacknowledged Translation
Suppose the student finds Java code that solves a problem and reuses it to produce
C++ code. This can be thought of as a specific form of reuse through abstraction. Again, this
may be considered a good approach in some circumstances, provided the original code is
properly acknowledged.
2.7.1.4 Unacknowledged Code Generation
Software engineering tools, often found as part of complex development environments, can
be used to automatically generate code. Any such generated code must be explicitly
identified and correctly acknowledged. Note that these tools usually credit themselves, so
removing these credits would be considered deliberate deception on the part of the
student, and disciplinary action would follow. For example, there are tools to generate C++
code from data flow diagrams. This type of automated software engineering is good,
provided the role of the tool is properly acknowledged.
2.7.1.5 No Reuse Without Test
From the examples above, it seems that care needs to be taken about acknowledging any
reuse of code. There is a simple guideline to ensure that a student never forgets the
acknowledgement, avoiding the risk of being accused of deliberate deception when the
plagiarism is a result of incompetency: explicitly acknowledge the use of someone else's
code, no matter how small, by testing it against your requirements. In the case that a
student does not properly test the software that they are reusing, this student should be
advised that the reuse is unacceptable [15].

2.7.2 MEASURE OF SOFTWARE SIMILARITY [16]
Moss (for a Measure Of Software Similarity) is an automatic system for determining the
similarity of programs. To date, the main application of Moss has been in detecting
plagiarism in programming classes. Since its development in 1994, Moss has been very
effective in this role. The algorithm behind Moss is a significant improvement over other
cheating detection algorithms.
It uses the local Winnowing algorithm on k-grams:
a. Divide a document into k-grams, where k is a parameter chosen by the user and
   a k-gram is a contiguous substring of length k.
b. Hash each k-gram and select some subset of these hashes to be the
   document's fingerprints. In all practical approaches, the set of fingerprints is a small
   subset of the set of all k-gram hashes. A fingerprint also contains positional
   information.
c. Given a set of documents, we find substring matches between them that satisfy two
   properties:
   1. If there is a substring match at least as long as the guarantee threshold, t, then this
      match is detected, and
   2. We do not detect any matches shorter than the noise threshold, k.
   The constants t and k ≤ t are chosen by the user.
   k should be large enough (to find significant matches) but not too large, otherwise
   relocation of strings would change the result.
d. Slide a window of w = t - k + 1 consecutive hashes over the sequence of k-gram
   hashes. In each window select the minimum hash value. If there is more than one hash with
   the minimum value, select the rightmost occurrence. Save all selected hashes
   as the fingerprints of the document.
e. Compare the fingerprints of the documents to determine the percentage of similarity
   between the two.
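
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the MOSS implementation: the CRC32 hash, the whitespace and case normalisation, and the similarity formula (shared fingerprints divided by the smaller fingerprint set) are assumptions made for the example. The window size is taken as w = t - k + 1, so that every match of length at least t contributes at least one fingerprint.

```python
import zlib


def kgram_hashes(text, k):
    """Steps a and b: hash every contiguous k-gram; list index = position."""
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(hashes, w):
    """Step d: keep the minimum hash of every window of w hashes.

    Ties are broken by taking the rightmost minimum. Each fingerprint is a
    (hash, position) pair, so positional information is retained.
    """
    if not hashes:
        return set()
    w = min(w, len(hashes))  # short inputs form a single window
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)  # rightmost minimum
        fingerprints.add((smallest, start + offset))
    return fingerprints


def fingerprint(text, k, t):
    """Fingerprint a document with noise threshold k and guarantee threshold t."""
    cleaned = "".join(text.split()).lower()  # assumed normalisation
    return winnow(kgram_hashes(cleaned, k), t - k + 1)


def similarity(doc_a, doc_b, k=5, t=8):
    """Step e: percentage of shared fingerprint hashes (assumed formula)."""
    a = {h for h, _ in fingerprint(doc_a, k, t)}
    b = {h for h, _ in fingerprint(doc_b, k, t)}
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    original = "for (int i = 0; i < n; i++) sum += a[i];"
    reindented = "for (int i=0; i<n; i++)\n    sum += a[i];"
    print(similarity(original, reindented))
```

Because whitespace is stripped before hashing, re-indenting a program does not change its fingerprints, which is what makes the comparison robust against trivial reformatting.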

MOSS is a preferable method for plagiarism detection because:
1. It ensures positional independence, noise suppression and whitespace insensitivity.
2. It has already been implemented for multiple languages like C, C++, Java, etc.
3. It is efficient.
4. It has been designed with a special focus on computer programming code rather
   than text [17].
5. It excludes template code and keywords, thus reducing the number of false positives.
MOSS has the limitation that it can be defeated by a change in structure when there are several
ways of writing the same logic.

2.8 STATIC CODE QUALITY ANALYSIS AND ITS TOOLS
Static code analysis techniques and tools [19] are widespread and intensively
used in the development of software systems. Their main benefit lies in
improving the quality of code in early development stages. Thus, static code analysis
techniques and tools are numerous for established programming languages like
C/C++, Java, C# and many others [18].

Static code analysis works by analyzing the static structure and the elements of a
program without actually executing it. It is therefore usually based on the source
code of a program or an intermediate representation thereof. Nowadays static code
analysis has evolved into a self-contained program quality assessment and
improvement technology, which is employed independently from compilers. It
complements program inspection as well as software testing techniques by providing
means to reveal bad code smells [21], violations of programming conventions and
guidelines, and potential defects. The main benefit, in distinction to inspection and
testing techniques, is that static code analysis works fully automated without user
involvement and does not require elaborate test settings.
The techniques employed for static code analysis are manifold (see for example [19]).
They include elementary rule-based approaches, which search for patterns in
programs that represent known problems and possible defects; control flow analysis,
which allows checking the possible branches of a program [19]; elaborate data flow
analysis techniques to reveal the data dependencies in a program [20]; as well as
abstract interpretation [22], where a program is executed with symbolic values to
show certain runtime properties. Static code analysis provides an effective way of
detecting critical defects early in the software lifecycle.
Static code quality contributes significantly to good software quality.
2.8.1 SOME WELL-KNOWN OPEN SOURCE TOOLS FOR STATIC CODE
ANALYSIS [23]

2.8.1.1 GCC
The GNU Compiler Collection provides a suite of several compilers for different
programming languages. Its C and C++ compilers are widely used in the open source
community and they come installed by default in practically every Linux distribution. This
benefit does come with the downside of a lack of full POSIX compatibility. These compilers
come with several documented flags for reporting potential issues in source code. For
example, the -Wformat flag enables format string verification for printf, scanf, etc. Another
warning, which is actually enabled by default, is the compile-time division by zero check. The
compiler also supports source code annotations in the form of custom attributes. One very
interesting attribute is the nonnull attribute, which can be used on function parameters.
Functions where these nonnull parameters exist are checked by the compiler in case they
are called with null arguments. One of the more interesting compiler flags for analysis
purposes is the -fsyntax-only flag. With this flag the compiler is instructed to only check for
syntax errors. In addition to this, the compiler also reports warnings with this flag enabled.
This is very useful for analysis as the compiler can then be used as a pure static analyzer
without code generation. Unfortunately, GCC does not instantiate C++ templates when this
flag is enabled, thus making it less than ideal as a static analyzer.
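
As an illustration of using the compiler as a pure analyzer, the sketch below builds a `gcc -fsyntax-only -Wformat` command line from Python and parses diagnostics of the usual `file:line:column: kind: message` form. The helper names are hypothetical, the regular expression covers only this common diagnostic shape, and `check` assumes that `gcc` is installed and on the PATH.

```python
import re
import subprocess

# Matches lines such as "a.c:3:5: warning: message" (the column is optional).
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?:(?P<col>\d+):)?\s*"
    r"(?P<kind>warning|error):\s*(?P<msg>.*)$"
)


def syntax_check_command(source_path, compiler="gcc"):
    """Build an argv list that checks a file without generating code."""
    return [compiler, "-fsyntax-only", "-Wformat", source_path]


def parse_diagnostics(stderr_text):
    """Extract (file, line, kind, message) tuples from compiler output."""
    found = []
    for line in stderr_text.splitlines():
        match = DIAGNOSTIC.match(line)
        if match:
            found.append((match["file"], int(match["line"]),
                          match["kind"], match["msg"]))
    return found


def check(source_path):
    """Run the compiler (it must be installed) and return its diagnostics."""
    proc = subprocess.run(syntax_check_command(source_path),
                          capture_output=True, text=True)
    return parse_diagnostics(proc.stderr)
```

Context lines that the compiler prints around a diagnostic, such as the "In function" header or the quoted source line, do not match the pattern and are skipped.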
2.8.1.2 CPPCHECK
Originally created by Daniel Marjamäki, Cppcheck is an open source static analysis tool that
has been part of several benchmarks [14, 24, 40, 59]. The software comes in both a
command-line version and a graphical version. Both versions use the same analysis
engine and thus have identical checks, although the command-line version is more suitable
for automated analysis because it is easier to run in non-interactive environments. The
analyzer builds a simplified abstract syntax tree (AST) using its own custom parser and
lexical analysis, and may therefore not always conform to the latest C++ standard due to
bugs or lack of features. To support more elaborate checks, the latest version implements a
generic data flow framework. This new framework supports general-purpose context-sensitive
interprocedural data flow analysis and can be used by individual checkers. Many checkers
have been modified to use this new system, although some old checkers still use their own
specific data flow tracking. The framework also performs abstract interpretation when
tracking values in loops.
There are two very different extension mechanisms in Cppcheck for creating custom checks.
One way to create them is by modifying the source code to create new C++ classes
containing the desired checks. These classes work as visitors on the tokenized AST stream.
In addition to this method, custom checks can also be created by specifying regular
expressions in an XML-formatted configuration file. The regular expressions are used to find
defects by matching them against the token stream. Although convenient, these regular
expressions do not provide the same capabilities as the source-modifying method. The
analyzer comes with many different checkers. These checkers are categorized by their
severity. Program options are available for enabling or disabling checks by their severities.
These severities are
• error for more severe issues like syntax errors,
• warning for suggestions about possible problems,
• style for dead code and other stylistic issues,
• performance for some common performance-related suggestions,
• portability for 64-bit and general platform portability issues, and
• information for informational messages about problems with the analysis.
Cppcheck is also able to give even more warnings if enabled with the --inconclusive flag.
This flag enables, as the name suggests, inconclusive checks where the analyzer might not
be completely sure about the existence of a problem. By enabling this flag, some previously
hidden problems might be detected while significantly increasing the number of false
positives.
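
The following sketch shows how the severity options could be driven from a script. It uses Cppcheck's `--enable`, `--inconclusive` and `--template` options; the helper names, the chosen output template and the sample output in the usage note are assumptions for illustration, not the integration used in this project.

```python
from collections import Counter

SEVERITIES = ("error", "warning", "style", "performance",
              "portability", "information")
# "error" checks are always on, so only the other severities can be enabled.
OPTIONAL = SEVERITIES[1:]
TEMPLATE = "{file}:{line}:{severity}:{message}"


def cppcheck_command(path, enable=("warning", "style"), inconclusive=False):
    """Build an argv list for the cppcheck command-line tool."""
    for name in enable:
        if name not in OPTIONAL:
            raise ValueError("cannot enable severity: " + name)
    cmd = ["cppcheck", "--enable=" + ",".join(enable),
           "--template=" + TEMPLATE]
    if inconclusive:
        cmd.append("--inconclusive")  # more findings, more false positives
    cmd.append(path)
    return cmd


def parse_results(output_text):
    """Parse lines written in TEMPLATE format into tuples.

    Assumes file names contain no ':' (true for relative POSIX paths).
    """
    results = []
    for raw in output_text.splitlines():
        parts = raw.split(":", 3)
        if len(parts) == 4 and parts[1].isdigit() and parts[2] in SEVERITIES:
            results.append((parts[0], int(parts[1]), parts[2], parts[3]))
    return results


def count_by_severity(results):
    """Summarise how many findings fall into each severity."""
    return Counter(severity for _, _, severity, _ in results)
```

For example, feeding `parse_results` two lines such as `a.cpp:4:error:Array index out of bounds` and `a.cpp:9:style:Unused variable: x` (hand-written sample lines, not real tool output) yields one `error` and one `style` finding, which an online judge could show to the user grouped by severity.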
2.8.1.3 CPPLINT
Originating from Google, Cpplint is a code analysis and style checking program that
enforces the Google C++ Style Guide. This program, written in Python, uses a mixture of regular
expression matching and other line-based heuristics to detect various problems in the
analyzed code. It mostly detects style issues, whitespace irregularities, and various other
potential problems. Due to its simplicity, it analyzes even large projects relatively fast. As it is
a style convention checking tool created for Google projects, like Chromium, its use is
somewhat limited for general-purpose analysis. Fortunately, irrelevant checks can be
disabled through the use of command-line flags. In addition to individual checks, whole
categories of checks can be enabled or disabled. The currently existing categories are
• build for build and preprocessing issues,
• legal for missing copyright messages,
• readability for correct but unreadable code,
• runtime for runtime-related issues, and
• whitespace for whitespace usage conventions.

2.8.2 COMMERCIAL TOOLS
Several commercial static analysis tools have been made since the creation of the first
lint-type tools. These commercial tools are typically either smaller standalone tools or parts
of a larger suite of tools. With the significant costs of some of these tools come useful
benefits like extensive on-site support, tool customization, and project-specific tuning for the
customer. Some of the well-known commercial tools are:
Klocwork Insight [23]
Coverity SAVE [23]
LLBMC [23]

2.9 SOURCE CODE COMMENTS AS A METRIC OF SOFTWARE QUALITY

Here, we emphasize the quality of source code comments as a metric of
software quality.
2.9.1 COMMENT ANALYSIS
Jiang and Hassan [26] study the evolution of comments over time in the PostgreSQL project.
They claim that developers commonly change code without updating its associated
comments and that uncommented interfaces or interfaces with outdated comments are likely
to cause bugs. For their study, the authors provide a coarse categorization of comments into
header and non-header comments. Header comments are written prior to a function
definition, whereas all other comments inside a function body or trailing the function are
non-header comments. The authors monitor the percentage of functions with header
comments, assuming that a drop over time indicates that developers are not updating the
interface documentation. However, the study reveals a constant percentage of commented
functions except for early fluctuation due to the commenting style of a particular active
developer.
Some authors also evaluate the ratio between comments and source code over time to give
a trend analysis of whether developers increase or decrease their effort on code commenting.
However, the results differed greatly among the different test cases, such that no unique
answer can be given. In particular, the authors did not differentiate between different types of
comments, e.g., lines of commented-out code were also counted in the ratio between
comments and source code. We strongly doubt that this leads to a useful metric to assess
the effort of code commenting. Commented-out code should be excluded from this metric
because it does not provide any information gain for system understanding.
In previous work, maintenance productivity has been studied extensively. However, the
quality of code comments has played only a minor role in assessing a software system's
maintainability. It is a commonly accepted fact that poor documentation constitutes a major
problem affecting software maintainability [4]. As software is often maintained by people who
did not develop it, poor documentation can cause a variety of effects, ranging from an
increase in the time needed to understand and maintain a software system to a complete
redesign and rebuild of the system. In a worst-case scenario, it is easier to rebuild a system
completely than to understand and modify an existing one.
In [28], Garcia et al. analyze the costs and benefits of maintainability. Maintainability is
related to understandability, modifiability, and testability. In order to measure those concepts,
the authors apply several metrics, among them lines of source code including comments,
lines of comments, lines of easy modification, and lines of error detection. The number of
lines of comments is thereby considered to be an understandability measure. On the one
hand, this indicates the importance of code comments for software maintenance.
On the other hand, the number of lines of comments (LC) cannot be a sufficient metric to
assess understandability: commented-out code or copyrights, for example, increase the
number of lines of comments without contributing to understandability. Hence, analyzing
maintainability with metrics requires a more detailed analysis of code comment quality.
In a similar way, the authors of [29] use the number of lines of comments to calculate metrics
assessing a software system's maintainability. For system commenting characteristics, they
define the overall program commenting ratio as a 2-tuple, containing the percentage of
comment lines in the whole program and the percentage of modules with header comments.
For component commenting, they measure intra-module commenting as the number of lines
with comments divided by the total number of lines in the module, averaged over all
modules. However, again, only measuring the number of comment lines cannot be sufficient,
as these lines can contain arbitrary content that does not contribute to understandability. Both
[28] and [29] lack a differentiation between comments with different purposes.
To overcome this problem, a detailed comment categorisation was proposed in order to
analyse how comments affect the understandability of the code [25].

2.10 GIT: AN OPEN SOURCE VERSION CONTROL SYSTEM
Git is a free and open source distributed version control system designed to handle
everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning-fast performance. It
outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features
like cheap local branching, convenient staging areas, and multiple workflows.
Thus, Git makes collaborative work convenient and simple.

2.11 WEKA: AN OPEN SOURCE DATA MINING TOOL [24]
Weka is a collection of machine learning algorithms for data mining tasks. The
algorithms can either be applied directly to a dataset or called from your own Java
code. Weka contains tools for data pre-processing, classification, regression,
clustering, association rules, and visualization. It is also well suited for developing
new machine learning schemes.
Weka is open source software issued under the GNU General Public License.
It has the following advantages:
1. As Weka is fully implemented in the Java programming language, it is platform
independent and portable.
2. It is freely available under the GNU General Public License.
3. Weka has a user-friendly graphical interface, so the system is very easy to
use.
4. It has a very large collection of different data mining algorithms.

3. Proposed Design
The code evaluation, plagiarism detection and static code analysis have been implemented
for C++ only.

3.1 SOFTWARE REQUIREMENT SPECIFICATION FOR THE ONLINE JUDGE

3.1.1 Introduction

3.1.1.1 Purpose
The purpose of this document is to present a detailed description of the online judge. It
explains the purpose and features of the system, the interfaces of the system, the algorithms
used in the implementation, the functionality of the system, and the constraints under which it
must operate. The application is used as a platform to manage contests.
3.1.1.2 Scope
This specification gives the details of the online judge. It explains how users can submit their
codes and get them evaluated. It also explains how the problem setters can add problems
and manage contests. It highlights the plagiarism detection and static code analysis done by
the judge. The online judge is one part of the project, which also contains the comment classifier.
3.1.1.3 Definitions, Acronyms, and Abbreviations
1. Plagiarism: Act of copying someone else's work illegitimately
2. Program: A set of computer instructions capable of performing some function
3. SCA: Static Code Analysis
4. MOSS: Measure of Software Similarity
3.1.1.4 Overview
The next section of this document, General Description, gives an overview of the
functionality of the application. It describes the informal requirements and is used to
establish a context for the technical requirements specification in the section after it.
The third section of this document, Specific Requirements, is written primarily for the
developers and describes in technical terms the details of the functionality of the application.
Both sections of the document describe the same software product in its entirety,
but are intended for different audiences and thus use different language.

The SRS is in IEEE 830 format.

3.1.2 General Description

3.1.2.1 Product Perspective
The main purpose of our project is to create a user-friendly web application that can
enhance users' problem-solving and programming skills. The application was developed
keeping in mind the increasing use of programming all around.
3.1.2.2 Product Functions
1. User Management: The platform supports two user roles: User, the ones who can
participate in the contests, and Admin, the ones who can manage the contests.
2. Code Evaluation: A platform where the user can test his/her code for correctness and
adequate algorithmic complexity.
3. Contest Management: The admins can create, edit and delete contests and problems as
and when required.
4. Plagiarism Detection: All the accepted codes are checked for plagiarism.
5. Static Code Analysis: The user can perform a static code quality check on his/her code.
3.1.2.3 User Characteristics
1. The user should be familiar with the interfaces.
2. The user must have a valid email id and password in order to log in to the system.
3. The user must have an internet connection in order to use the web application.
4. Only authorized users should be given the admin rights.
3.1.2.4 General Constraints and Features
1. Any operating system
2. The email id used for login should be valid.
3. Evaluation, SCA and plagiarism testing are supported.

3.1.2.5 Assumptions and Dependencies
1. The user has a web browser on the system and an internet connection.
2. The user is aware of competitive programming.
3. The judge is only available for C++.

3.1.3 Specific Requirements

3.1.3.1 External Interface Requirements
3.1.3.1.1 User Interfaces
Interfaces should allow a User to:
a. Log in or sign up into the system
b. View upcoming and ongoing contests
c. View problems of an ongoing contest
d. Submit solution
e. View submissions
f. View profile
g. Edit profile
h. Request admin rights
i. Check static code quality of a code

Interfaces should allow an Admin to:
a. Log in or sign up into the system
b. View upcoming and ongoing contests
c. Add, edit and delete contests
d. Add, edit and delete problems and test data
e. View submissions
f. View profile
g. Edit profile
h. Grant admin rights

3.1.3.1.2 Hardware Interfaces
Apart from the recommended configuration, no other specific hardware is required to run the
software.
3.1.3.1.3 Software Interfaces
Operating system: Any operating system
Tools: Web browser
3.1.3.2 Functional Requirements
3.1.3.2.1 User Management
1. Introduction: This explains the login/signup functionality.
2. Inputs: For signup, the user gives his/her personal details, and for login, the user just gives
the email id and password.
3. Processing: The system adds the user in case of signup, and in case of login it validates
the user.
4. Outputs: The user can view the site.
5. Error Handling: In case of invalid details, an appropriate message is displayed.
3.1.3.2.2 Code Evaluation
1. Introduction: Here, the submitted code is checked for algorithmic correctness and the
required algorithmic complexity.
2. Inputs: The user submits the code in C or C++.
3. Processing: The system checks the generated output against the expected output for
predefined test cases and checks whether the time and memory used are within predefined
limits.
4. Outputs: The system displays the status: Compilation error, Wrong, Accepted, Time limit
exceeded or Runtime error.
5. Error Handling: In case the user doesn't select the problem properly or there is some other
error, a proper message is displayed.

3.1.3.2.3 Contest Management
1. Introduction: Here, the admin can view, edit, add or delete contests/problems.
2. Inputs: In case of new or edit, the admin enters the details of the contest. In case of
delete, the admin selects the delete option.
3. Processing: Action is taken as per the user's input.
4. Outputs: The database is updated and the corresponding updates become visible.
5. Error Handling: An appropriate message is shown in case of invalid inputs.
3.1.3.2.4 Plagiarism Analysis
1. Introduction: All the accepted codes are checked for plagiarism.
2. Inputs: Accepted codes are the input.
3. Processing: The codes are analysed using Measure of Software Similarity.
4. Outputs: Plagiarised files are shown.
5. Error Handling: Error handling is done in case of any fault.
3.1.3.2.5 Static Code Analysis
1. Introduction: The user can check the static code quality of the code.
2. Inputs: The user submits the code in C++.
3. Processing: cppcheck checks the code.
4. Outputs: Static code errors are displayed.
5. Error Handling: Error handling is done in case of any fault.
3.1.3.3 Non-Functional Requirements
1) Performance
The correct result should be displayed to the user as fast as possible on the terminal.
2) Reliability
The application should not crash under any circumstance, such as a user entering
invalid details during signup.
3) Portability
The application is portable and runs on any machine with a good internet connection.
4) Interoperability
It can work on any web browser, on any operating system.

3.1.4 Other Requirements

3.1.4.1 Database
We will use sqlite3 as the database in the development phase.

3.2 COMMENT CLASSIFICATION

3.2.1 COMMENT CATEGORIES
After analysing and referring to [25], we have classified the comments into the following
categories:
Header: It can include information about the copyright or the license of the source
code file, or it can give an overview of the functionality of the class. In addition, it can
provide information about the author of the class, the revision number, the peer review status
etc. Header comments are mostly found at the beginning of the code.
Interface: An interface comment describes the functionality of a method or a field. Interface
comments are therefore located either before a method/field definition or on the same line as
a field definition. They can provide information for the developer and be used for an API of
the project. An interface comment describes the parameters required for interfacing.
Inline: Developers use inline comments to comment on code within a
method/structure/class definition. Inline comments describe implementation decisions or
other details.
Section: A section comment summarizes a larger part of a class that covers one functional
aspect. It usually addresses several methods (or fields) together which belong to the same
functional aspect.
Code: Commented-out code is source code that developers want to be ignored by the
compiler. Often code is temporarily commented out for debugging purposes or for potential
later reuse.
Task: A task comment is a note for the developer about an unfulfilled task. It either
contains a remaining to-do, a note about a bug that needs to be fixed, or a remark about an
implementation hack.

3.2.2 EXTRACTING FEATURES FOR CLASSIFICATION
We use the Weka tool for classification. We use 10 features for machine learning, as
indicated in Table 3.1.

Table 3.1: MACHINE LEARNING FEATURES FOR COMMENTS
FEATURE        TYPE      CRITERIA
copyright      boolean   true if it contains the word "license" or "copyright"
header         boolean   true if it contains the word "author"
section        boolean   true if it contains separator strings (such as ///// or *****) multiple times
length         real      length of comment (in words)
task           boolean   true if it contains "todo", "hack" or "fixme"
specialchars   real      percentage of special characters in the code
code           boolean   true if it contains code
insidemethod   boolean   true if it is inside a method/class/struct
parenthesis    real      number of outstanding open parentheses
first          boolean   true if it is the first comment of the file

We classify comments in Weka using multiple machine learning algorithms and analyse the
results.
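As an illustration of how the features in Table 3.1 might be computed from the raw text of a comment, the sketch below derives most of them in Python. It is not the project's extractor: the function name extract_features, the separator pattern (five or more repeats of one character, seen at least twice) and the definition of a special character (any non-alphanumeric, non-space character, measured over the comment itself) are assumptions made for the example. The code feature is omitted because the heuristic used to recognise commented-out code is not described.

```python
import re

# Assumed separator pattern: five or more repeats of the same character.
SEPARATOR_RUN = re.compile(r"([/*=\-#])\1{4,}")


def extract_features(comment, is_first=False, inside_method=False):
    """Compute a subset of the Table 3.1 features for one comment string."""
    text = comment.lower()
    non_space = [c for c in comment if not c.isspace()]
    special = [c for c in non_space if not c.isalnum()]
    return {
        "copyright": "license" in text or "copyright" in text,
        "header": "author" in text,
        # "multiple times" is read here as at least two separator runs
        "section": len(SEPARATOR_RUN.findall(comment)) >= 2,
        "length": len(comment.split()),
        "task": any(word in text for word in ("todo", "hack", "fixme")),
        "specialchars": 100.0 * len(special) / len(non_space) if non_space else 0.0,
        "insidemethod": inside_method,
        # opening parentheses that are never closed inside the comment
        "parenthesis": max(comment.count("(") - comment.count(")"), 0),
        "first": is_first,
    }
```

The position-dependent features (insidemethod, first) cannot be read off the comment text, so the sketch takes them as arguments supplied by whatever code walks the source file.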

4. Implementation
This chapter deals with the implementation details of the project.

4.1 PROGRAMMING LANGUAGES AND TOOLS
Programming languages used:
a. Frontend: Ruby, HTML, CSS, JavaScript
b. Backend: C++, Python, Ruby
Tools used:
a. cppcheck: For static code analysis
b. Weka: For machine learning
c. Codemirror: An open source editor
Framework used:
The frontend uses the Ruby on Rails web application framework.

4.2 COMPONENTS
4.2.1 IMPLEMENTATION OF INTERFACE AND CODE SHELL
The following interfaces were implemented using Ruby on Rails:
4.2.1.1 INTERFACES FOR THE USER
a. The Devise Ruby gem was used for implementing login/signup. The interface of the login
page is shown in Figure 4.1.

Figure 4.1: SCREENSHOT OF LOGIN PAGE
b. The interface of the signup page is shown in Figure 4.2.

Figure 4.2: SCREENSHOT OF SIGNUP PAGE
c. View upcoming and ongoing contests
The interface of the home page is shown in Figure 4.3.

Figure 4.3: SCREENSHOT OF HOME PAGE
d. View problems of an ongoing contest
The interface of the contest page is shown in Figure 4.4.

Figure 4.4: SCREENSHOT OF CONTEST PAGE
e. Submit solution
The editor portion of the interface uses Codemirror [32], a versatile text editor
implemented in JavaScript for the browser. The interface of the submit page is shown in
Figure 4.5.

Figure 4.5: SCREENSHOT OF SUBMIT PAGE
f. View submissions
The interface of the submissions page is shown in Figure 4.6.

Figure 4.6: SCREENSHOT OF SUBMISSIONS PAGE
g. View profile
The interface of the profile page is shown in Figure 4.7.

Figure 4.7: SCREENSHOT OF VIEW PROFILE
h. Edit profile
The interface of the edit profile page is shown in Figure 4.8.

Figure 4.8: SCREENSHOT OF EDIT PROFILE
i. Request admin rights
The interface of the admin request page is shown in Figure 4.9.

Figure 4.9: SCREENSHOT OF REQUEST ADMIN RIGHTS PAGE
j. Check static code quality of a code
The interface for checking code quality is shown in Figure 4.10.

Figure 4.10: SCREENSHOT OF STATIC CODE QUALITY CHECK OPTION
4.2.1.2 INTERFACES FOR THE ADMIN
a. Login page
Same as for the user.
b. Signup page
Same as for the user.
c. Add new contest
The interface of the new contest page is shown in Figure 4.11.

Figure 4.11: SCREENSHOT OF CREATE CONTEST PAGE
d. Edit contest details
The interface of the edit contest page is shown in Figure 4.12.

Figure 4.12: SCREENSHOT OF EDIT CONTEST PAGE
e. View upcoming and ongoing contests
Same as for the user, but with an additional delete option.
f. View problems of ongoing contest
Same as for the user, but with an additional delete option.
g. Add/Edit problem
The interface for adding a problem is the same as the one for editing a problem, shown in
Figure 4.13.

Figure 4.13: SCREENSHOT OF EDIT PROBLEM PAGE
h. View profile
Same as for the user.
i. Edit profile
Same as for the user.
j. Grant admin rights
The interface of the page for creating an admin is shown in Figure 4.14.

Figure 4.14: SCREENSHOT OF ADDING ADMIN

4.2.2 INTEGRATION OF CLIENT AND SERVER
4.2.2.1 USING AWS S3 BUCKETS
Amazon Simple Storage Service (S3) helps in the integration between the client and the server.

Figure 4.15: SCREENSHOT OF LIST OF S3 BUCKETS USED
The S3 buckets created (as shown in Figure 4.15) and their purposes are as follows:
a. coderunneraccepted: The codes which get accepted after evaluation are uploaded
to this bucket for plagiarism detection.
b. coderunnersubmissions: As soon as a user submits a solution, it is uploaded to this
bucket with the following filename format:
username#contestcode#problemcode#timestamp#time_limit#memory_limit#.cpp
c. coderunnerusers: This bucket contains the response file for each user, with the
name in the following format:
username_response.html
d. coderunnerdetails: It contains the contest and problem data files in the following
formats:
TEST DATA: problemcode#contestcode#data#.zip, which contains an Input and an Output
folder.
CONTEST DETAILS: contestcode#metadata#.txt
e. cppchecksubmissions: This bucket contains the codes submitted for static code
analysis in the following format:
username#timestamp#.cpp
f. cppcheckresponses: This bucket contains the responses generated after static
code analysis in the following format:
username_response.html
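Because the submission file name carries all the metadata the judge needs, it can be decoded with a plain split on the # separator. The sketch below shows one way to do this for the coderunnersubmissions format; the names Submission and parse_submission_key are hypothetical, every field is kept as a string because the units of the limits are not stated, and it is assumed that user names never contain a # character.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    """Fields encoded in a coderunnersubmissions file name."""
    username: str
    contest_code: str
    problem_code: str
    timestamp: str
    time_limit: str
    memory_limit: str
    extension: str


def parse_submission_key(key):
    """Split 'user#contest#problem#timestamp#time#memory#.cpp' into its fields."""
    parts = key.split("#")
    if len(parts) != 7:
        raise ValueError("unexpected submission file name: " + key)
    return Submission(*parts)
```

For example, parse_submission_key("alice#C1#P2#1433116800#2#64#.cpp").problem_code evaluates to "P2".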

4.2.2.2 USING PYTHON SCRIPTS AT THE BACKEND

Figure 4.16: INTERACTION OF PYTHON SCRIPTS IN BACKEND
The Python scripts used in the backend are described below. The interactions between
the scripts are shown in Figure 4.16.
InitiateJudge.py
Starts the other auxiliary scripts, which include extractor.py, download.py and monitor.py.
extractor.py
Initialises the contest. Downloads the test data of problems from the coderunnerdetails bucket
and creates the folders needed for the codechecker to function properly. It also resets the
response pages of users in the coderunnerusers bucket.
download.py
It downloads the user submissions from the coderunnersubmissions bucket and saves them to a
local submissions folder from where monitor.py can access them.
monitor.py
This script is the heart of the codechecker. It fetches submissions from the local submissions
folder and sends them to the codechecker, where they get evaluated. It receives a verdict from
the codechecker, converts it to HTML format, appends it to the user response file and uploads it
to coderunnerresponses. It processes submissions in parallel and uses a mutex to
lock the codechecker, which is the common resource for all the threads.
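The locking scheme described for monitor.py can be sketched with Python's standard threading primitives. This is an illustration under assumptions, not the project's script: evaluate, process_all and the run_checker callback are hypothetical names, and the HTML line format is invented for the example.

```python
import html
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock guards the codechecker, the resource shared by all worker threads.
checker_lock = threading.Lock()


def evaluate(submission, run_checker):
    """Run the checker on one submission and format the verdict as HTML."""
    with checker_lock:
        # Only one thread at a time may invoke the checker.
        verdict = run_checker(submission)
    return "<p>{}: {}</p>".format(html.escape(submission), html.escape(verdict))


def process_all(submissions, run_checker, workers=4):
    """Handle submissions on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: evaluate(s, run_checker), submissions))
```

With a single lock around the checker, the threads overlap only on the work outside the critical section (fetching files, formatting and uploading responses), which is consistent with the description of the checker as the common resource.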

4.2.3 IMPLEMENTATION OF CODECHECKER
The phases involved in the functioning of the codechecker (as shown in Figure 4.17) are:
Phase 1: Checking for compilation errors during compilation
Phase 2: Running the source code with hidden test cases
Phase 3: Ensuring the execution completes within time and memory limits
Phase 4: Checking for runtime errors
Phase 5: Comparing generated output with actual output
Phase 6: Deleting user-generated files
Figure 4.17: ORDER OF STEPS IN CODECHECKER

PURPOSE            COMMAND
To compile         system("g++ filename.cpp -o filename")
To ensure limits   system("ulimit -t time_limit -m memory_limit file_to_be_executed")
To compare files   system("diff generated_output expected_output")
Table 4.1: COMMANDS USED IN CODECHECKER
PHASE 1: Compilation of user code and checking for compilation errors
The program uploaded by the user is first checked for compiler errors, then for correctness
with the help of predefined test cases. The standard GCC compiler is used in our application. If
the source code compiles successfully, it is executed. Otherwise the user is provided with
information about the compilation error and the execution is aborted. A separate compilation
error file is generated for every source code.
PHASE 2: Running source code with hidden test cases
After successful compilation, the source code is executed using predefined test cases.
These test cases are hidden from the user. They are used to test the correctness of the
code. Designing test cases is a tedious task, and it is impractical to design them manually. As a
consequence, test scripts are written to automate this process.
PHASE 3: Ensuring time and memory limits
Time and memory limits are specific to every problem, and the user code must finish
execution within those limits for successful termination. These limits are enforced by using the
ulimit command with the '-t' and '-m' flags for time and memory limits respectively.
PHASE 4: Checking for runtime errors
Runtime errors come up during execution of the program. Different conditions can cause
runtime errors. The most common causes are accessing illegal memory, an illegal instruction,
exceeding the file size limit etc. When a runtime error comes up, the program raises a signal.
This signal is handled by a signal handler which terminates the execution and displays the
runtime error to the user.
PHASE 5: Comparison of output
After successful execution of the user code, the generated outputs for the different test cases are
compared to the expected outputs for the respective test cases. If the generated outputs
exactly match the expected outputs, the user code is considered to be correct; otherwise it is
considered wrong.
PHASE 6: Deleting user-generated files
During compilation and execution, multiple user-specific files are generated. These include
compilation_error_files, generated_output, log_files etc. These files need to be deleted after
completion of execution.
The system() call invokes the command corresponding to each phase, as listed in Table 4.1.
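The phases above can be mirrored in a short Python sketch. It is not the project's C++ codechecker: instead of system() with g++, ulimit and diff, it uses Python's subprocess and resource modules, and the function names compile_cpp, run_with_limits and verdict are hypothetical. The address-space limit RLIMIT_AS is used as a stand-in for the memory limit of ulimit -m, and treating SIGKILL as a time-limit violation is an assumption of this example.

```python
import resource
import signal
import subprocess


def compile_cpp(source_path, exe_path):
    """Phase 1: compile with g++; returns (ok, compiler_messages)."""
    proc = subprocess.run(["g++", source_path, "-o", exe_path],
                          capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr


def run_with_limits(cmd, stdin_text, cpu_seconds, memory_bytes=None):
    """Phases 2-3: run cmd on one test input under CPU and memory limits."""
    def set_limits():
        # The soft limit delivers SIGXCPU; the hard limit, one second later, kills.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        if memory_bytes is not None:
            resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True,
                          text=True, preexec_fn=set_limits)
    return proc.returncode, proc.stdout


def verdict(returncode, generated, expected):
    """Phases 4-5: map the exit status and the output to a judge status."""
    if returncode in (-signal.SIGXCPU, -signal.SIGKILL):
        return "TLE"
    if returncode != 0:
        return "RE"  # killed by another signal, or a non-zero exit code
    return "ACC" if generated == expected else "WA"
```

A negative return code from subprocess.run means the child was terminated by the signal with that number, which is what lets verdict distinguish a time-limit kill from other runtime errors. The comparison is an exact string match, in line with Phase 5.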
Working of Codechecker:
The codechecker has been divided into multiple files. This is done in order to make the
application scalable. Currently there is support for C++ only, but the codechecker can easily be
extended to include support for other languages like Python and Java.
Invoker.cpp:
This file is invoked by the server end to start checking the code. This file then executes
Codechecker.cpp.
Codechecker.cpp:
This file is generic. It contains the code to identify the extension of the user code file. It
also contains the code which is common to all the languages. The operations provided by
this file include comparison of output, deleting user-generated files and displaying the result.
This file then executes the language-specific file Cplusplus.cpp.
Cplusplus.cpp:
This is the language-specific file. To add support for other languages, files specific to those
languages (like Cplusplus.cpp) have to be added. This file executes the user code with the
predefined test cases and enforces the time and memory limits.

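Figure 4.18: ORDER OF EXECUTION IN CODECHECKER
The order of execution of these files is shown in Figure 4.18.
Execution of User Code
The user code is executed with various test cases. Each test case is handled by a separate
process.
Code Snippet:

Table 4.2: CODE SNIPPET USED FOR EXECUTING CODE

    // Requires <unistd.h>, <sys/wait.h>, <cstdio> and <cstdlib>.
    pid_t pids[MAX];
    int status;
    for (int i = 0; i < files_count; i++)
    {
        pids[i] = fork();    // one child process per test case
        if (pids[i] < 0)
        {
            perror("Process could not be created");
            abort();
        }
        else if (pids[i] == 0)
        {
            // Child: run the user executable on test case i.
            if (execute_file(i, problem_name, argv[4], executable_name,
                             contest_name, executable_name) == 1)
                abort();
            exit(20);
        }
        else
        {
            // Parent: wait for the child before starting the next test case.
            waitpid(-1, &status, 0);
        }
    }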

4.2.4 IMPLEMENTATION OF PLAGIARISM DETECTION

Figure 4.19: SOURCE CODE FILES USED FOR PLAGIARISM DETECTION
We modified the open source project [33] to implement plagiarism detection. The following
are the source code files used (also depicted in Figure 4.19):
filter_confset.rb
Configuration set used by the filter. A configuration set is a keyword set plus the comment
syntax specific to the language. It uses cpp.conf, which contains the keywords specific to C++
and how comments are specified in C++.
noise_filter.rb
This file filters the source file. It removes whitespace, capitalization, punctuation and
identifiers. Keywords are always the same, so they need to be filtered out.
rollhasher.rb
A rolling hash function similar to the one used in the Rabin-Karp algorithm.
winnower_confset.rb
Reads the conf file to get the parameters for the winnower. It uses winnower.conf, which
contains the values of k, t and q: k is the size of a fingerprint, t is the window size from which
at least one fingerprint is chosen, and q is the prime number used for hashing.
robust_winnower.rb
Implementation of robust winnowing. Generates fingerprints from the filtered text.
detect_plagiarism.rb
Filters the text from the two files and generates their fingerprints. Finds the common
fingerprints and prints the percentage similarity on the basis of the common fingerprints.
plagiarism_detector.py
It runs detect_plagiarism.rb on all pairs of accepted solutions and prints those pairs which have
a percentage similarity over a threshold value (currently set to 50%).
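The pipeline formed by these files (filter, rolling hash, winnowing, comparison) can be sketched compactly in Python. This is a simplified illustration and not a port of the Ruby code: the filter only lowercases and strips whitespace, ties between equal minima are not treated as in robust winnowing, the similarity is computed as the share of common fingerprints among all fingerprints of both files (an assumed formula), and the default values of k, t and q are placeholders rather than the contents of winnower.conf.

```python
def kgram_hashes(text, k, q, base=256):
    """Hash every k-character substring with a rolling polynomial hash mod q."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, q)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % q
    hashes = [h]
    for i in range(k, len(text)):
        # Drop the oldest character, shift, and append the new one.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % q
        hashes.append(h)
    return hashes


def winnow(hashes, t):
    """Keep the minimum hash of every window of t consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < t:
        return {min(hashes)}
    return {min(hashes[i:i + t]) for i in range(len(hashes) - t + 1)}


def similarity(code_a, code_b, k=5, t=4, q=1000003):
    """Percentage of fingerprints shared by two source texts."""
    def fingerprints(code):
        filtered = "".join(code.lower().split())  # simplified noise filter
        return winnow(kgram_hashes(filtered, k, q), t)
    a, b = fingerprints(code_a), fingerprints(code_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)
```

A pair of accepted solutions would then be reported when similarity(a, b) exceeds the 50% threshold mentioned above.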

4.2.5 INTEGRATION WITH CPPCHECK

Figure 4.20: SOURCE CODE FILES USED FOR STATIC CODE ANALYSIS USING
CPPCHECK
The following source code files (shown in Figure 4.20) were implemented and used for static
code analysis:
InitiateCppcheck.py
Starts the other auxiliary scripts, which include cppdownload.py and cppmonitor.py.
cppdownload.py
It downloads the user submissions from the cppchecksubmissions bucket and saves them to a
local cppsubmissions folder from where cppmonitor.py can access them.
cppmonitor.py
This script fetches submissions from the local cppsubmissions folder and sends them to the
cppcheck tool, where their static analysis is done. It receives a verdict from cppcheck, converts
it to HTML format, adds it to the user response file and uploads it to cppcheckresponses. It
processes submissions in parallel and uses a mutex to lock cppcheck, which is the common
resource for all the threads.

4.2.6 COMMENT CLASSIFIER
4.2.6.1 FINDING DATASET
We picked up the C++ source code datasets from the LLVM [31] and Boost [30] libraries.
4.2.6.2 CREATING .ARFF
We picked up the comments from the above datasets, extracted features from those
comments and created our own training data (.arff file) to be used by Weka. A portion of the
dataset is shown in Table 4.3.
Table 4.3: PORTION OF THE TRAINING DATASET
14,'exceptiondemo_370',false,false,false,34,false,13,false,false,false,1,method
15,'exceptiondemo_478',false,false,false,5,false,7,false,true,false,2,inline
16,'exceptiondemo_496',false,false,false,4,false,8,false,true,false,2,inline

4.2.6.3 GOOD VS BAD COMMENTING
After classification, we can judge the quality of commenting from the results of the
classification. A code is said to have good-quality source code comments if:
1. A header comment is present.
2. There are at least as many method comments as there are methods.
3. There are at least 2 to 10 section comments, to show modularity in the code.
4. The number of task and code comments is as small as possible.
5. 0.5 * method comments < number of inline comments < 4 * method comments.
A large number of inline comments would indicate redundancy, and a small number
of inline comments would indicate poor documentation.
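The five criteria above can be applied mechanically once the classifier has produced a count per comment category. The sketch below is one possible reading of them: the function name commenting_issues and the dictionary keys are hypothetical, rule 3 is interpreted as a minimum of two section comments, and rule 4 is turned into an explicit threshold (default 0) because the text gives no number.

```python
def commenting_issues(counts, num_methods, min_sections=2, max_task_and_code=0):
    """Return the list of violated criteria; an empty list means good commenting."""
    issues = []
    method = counts.get("method", 0)
    inline = counts.get("inline", 0)
    if counts.get("header", 0) < 1:
        issues.append("no header comment")
    if method < num_methods:
        issues.append("fewer method comments than methods")
    if counts.get("section", 0) < min_sections:
        issues.append("too few section comments")
    if counts.get("task", 0) + counts.get("code", 0) > max_task_and_code:
        issues.append("too many task or commented-out code comments")
    # Rule 5: 0.5 * method comments < inline comments < 4 * method comments
    if not (0.5 * method < inline < 4 * method):
        issues.append("inline comments out of proportion to method comments")
    return issues
```

Returning the violated criteria rather than a single boolean makes it possible to tell the user which aspect of the commenting needs attention.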

4.2.7 USING GIT
We maintained a Git repository for working collaboratively:

Figure 4.21: GITHUB REPOSITORY USED FOR THE PROJECT
The codebase is available at https://github.com/tussharsingh13/coderunner (shown in
Figure 4.21).

5. Observations and Results

5.1 RESULTS OF CODECHECKER
The following kinds of responses are given by the codechecker:
1. When the code passes all test cases, it gives the accepted (ACC) status, as
shown in Figure 5.1:

Figure 5.1: WHEN CODE GETS ACCEPTED
2. When the code fails some test cases, it gives the wrong answer (WA) status,
as shown in Figure 5.2:

Figure 5.2: WHEN CODE GIVES WRONG ANSWER
3. When the code fails during compilation, it gives the compilation error (CE)
status, as shown in Figure 5.3:

Figure 5.3: WHEN CODE GIVES COMPILATION ERROR
4. When the code exceeds the time limit on some case, it gives the time limit
exceeded (TLE) status, as shown in Figure 5.4:

Figure 5.4: WHEN CODE EXCEEDS TIME LIMIT
5. When the code gives a runtime error, it gives the runtime error (RE) status, as
shown in Figure 5.5:

Figure 5.5: WHEN CODE GIVES RUNTIME ERROR

5.2 PLAGIARISM DETECTION RESULTS
The MOSS-based plagiarism detector gave the results shown in Figure 5.6:

Figure 5.6: RESULTS AFTER PLAGIARISM CHECK

5.3 CPPCHECK RESULTS
The online judge gives an output as shown in Figure 5.7 after cppcheck has
analysed the code.

Figure 5.7: RESULTS AFTER CPPCHECK ANALYSES THE CODE

5.4 COMPARISON WITH DIFFERENT ALGORITHMS
Table 5.1: COMPARISON OF DIFFERENT MACHINE LEARNING ALGORITHMS (ACCURACY IN %)

TEST MODE          J48       NaiveBayes   Logistic      ZeroR     IBk
                   (Trees)   (Bayes)      (Functions)   (Rules)   (Lazy)
Percentage split   93.94     84.85        87.88         27.27     90.91
Cross-validation   88.66     85.57        84.54         24.74     91.75
Training set       96.91     90.72        100           24.74     100

J48 gives the best results, as summarised in Table 5.1. The others either
overfit the dataset or have lower accuracy.

6. Conclusion and Future Work
This chapter summarises the conclusion and then discusses the limitations and future
scope of the project.

6.1 CONCLUSION
We were able to implement an online judge which allows contest
management, code evaluation, static code quality analysis and plagiarism detection.
We were also able to classify comments with decent accuracy.

6.2 LIMITATIONS
The following are the limitations of the project:
1. False positives for section comments increase if the training dataset includes
comments present inside a struct/method/class.
2. When the number of download/upload requests is high, i.e. around 1000 per
second, the AWS S3 buckets are unable to handle the load.
3. The online judge lacks security.
4. For algorithms with a conventional implementation, like Floyd-Warshall, the
number of false positives during plagiarism detection increases.
5. The online judge is currently implemented for C++ only.
6. MOSS has the limitation that it can be defeated by a change in the structure of the
code if there are several ways of writing the same logic.
7. J48 is not an online algorithm; it needs retraining every time the data is
changed.

6.3 FUTURE SCOPE
1. Another category of comments, reference comments, can be added. These are
used to refer to existing code and libraries.
2. Automatic test case generation can be added using libraries like NetworkX for
graph-based problems.
3. Support for other languages like C, Python, Java etc. can be provided.
4. The interface can be made more user-friendly and advanced.
5. The judge currently provides a platform for contests only. It can act as a
platform for practice as well, i.e. it can maintain a set of problems for practice.
6. The Elastic Compute Cloud (EC2) service can be used in place of S3 so that
computations happen directly on the cloud and the time spent on
download/upload at the server end is saved. This will make the judge more
efficient. In EC2, the number of computing instances scales up as per the load,
thus maximizing efficiency.
7. Security features like chroot and iptables can be added to the online judge to
make it secure.
8. Advanced sandboxing techniques like seccomp can be used to provide
a high level of security.
9. The usefulness and coherence of the comments can be analysed after
classification by using various techniques, such as parsing the function name
and the comment and then evaluating the percentage of common words
between the two, to assess the usefulness of the comment and its coherence
with the code.
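The word-overlap idea in the last item can be made concrete with a few lines of Python. This is only a sketch of one possible measure, not something implemented in the project: the names split_identifier and coherence are hypothetical, and the score is defined here as the percentage of function-name words that also occur in the comment.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def coherence(function_name, comment):
    """Percentage of the function-name words that also appear in the comment."""
    name_words = set(split_identifier(function_name))
    if not name_words:
        return 0.0
    comment_words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    return 100.0 * len(name_words & comment_words) / len(name_words)
```

A score near 100 means the comment mostly repeats the function name, while a score near 0 may mean the comment is unrelated to the code; both extremes would be worth flagging in a later usefulness analysis.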
REFERENCES

[1] Shaun Bebbington (2014). "What is coding". Retrieved 2014-03-03.
[2] Shaun Bebbington (2014). "What is programming". Retrieved 2014-03-03.
[3] Lauren Orsini (2013). Why Programming Is The Core Skill Of The 21st Century,
Available at: http://readwrite.com/2013/05/31/programmingcoreskill21stcentury
[4] Penny Grubb and Armstrong Takang. Software Maintenance: Concepts and
Practice, 2003, pp. 7, 120-121.
[5] Nations, Daniel. "Web Applications". About.com. Retrieved 20 January 2014.
[6] Kaushik MV (2013). Learn to code by Competitive Programming, Available at:
http://blog.hackerearth.com/2013/09/competitiveprogramminggettingstarted_11.html
[7] Does this competitive programming really helps in industry, Available at:
http://discuss.codechef.com/questions/46837/doesthiscompetitiveprogrammingreallyhelpsinindustry
[8] About Ruby, Available at: https://www.rubylang.org/en/about/
[9] What is Rails, Available at: https://www.rubylang.org/en/about/
[10] Ruby on Rails: What It is and Why We use it for Web Applications, Available at:
http://www.bitzesty.com/blog/2014/7/rubyonrailswhatitisandwhyweuseitforwebapplications
[11] Richard Delaney (2013), Python Scripts as a Replacement for bash scripts,
Available at:
http://www.linuxjournal.com/content/pythonscriptsreplacementbashutilityscripts
[12] Amazon EC2, Available at: Amazon Elastic Compute Cloud (Amazon EC2),
Cloud Computing Servers. Aws.amazon.com (2014-07-01). Retrieved on
2014-07-01.
[13] Amazon S3, Available at: http://aws.amazon.com/s3/
[14] Amazon EC2, Available at: http://aws.amazon.com/ec2/
[15] J Paul Gibson, Software Reuse and Plagiarism: A Code of Practice, in
ITiCSE '09, July 6-9, 2009, Paris, France.
[16] Saul Schleimer, et al., Winnowing: Local Algorithms for Document
Fingerprinting, SIGMOD 2003, June 9-12, 2003, San Diego, CA.
[17] Overview of Plagiarism Detection Software, Available at:
http://www.cshe.unimelb.edu.au/assessinglearning/03/plagsoftsumm1.html
[18] P. Emanuelsson, U. Nilsson, A Comparative Study of Industrial Static Analysis
Tools, Electronic Notes in Theoretical Computer Science, Vol. 217, pp. 5-21, 2008.
[19] H. R. Nielson, F. Nielson, and C. Hankin, Principles of Program Analysis,
Springer-Verlag New York, 1999.
[20] S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann
Publishers, 1997.
[21] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley,
1999.
[22] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, D.
Monniaux, X. Rival. A Static Analyzer for Large Safety-Critical Software, Proc. ACM
Conf. on Programming Language Design and Implementation (PLDI 2003), San
Diego, California, June 7-14, pp. 196-207, 2003.
[23] Elias Pentilla, Improve C++ Software Quality with Static Code Analysis,
Master's Thesis, Degree Programme in Computer Science and Engineering, School
of Science, Aalto University, 2014.
[24] Weka 3: Data Mining Software in Java, Available at:
http://www.cs.waikato.ac.nz/ml/weka/
[25] Daniela Steidl, et al., Quality Analysis of Source Code Comments, Program
Comprehension (ICPC), 2013 IEEE 21st International Conference on, San
Francisco, CA, 20-21 May, pp. 83-92, 2013.
[26] Zhen Ming Jiang and Ahmed E. Hassan. Examining the Evolution of Code
Comments in PostgreSQL. In Proceedings of the 2006 International Workshop on
Mining Software Repositories, MSR '06, pages 179-180. ACM, 2006.
[27] Carl S. Hartzman and Charles F. Austin. Maintenance productivity: Observations
based on an experience in a large system environment. In Proceedings of the 1993
Conference of the Centre for Advanced Studies on Collaborative Research: Software
Engineering, Volume 1, CASCON '93, pages 138-170, 1993.
[28] Manuel J. Barranco García and Juan Carlos Granja Alvarez. Maintainability as
a Key Factor in Maintenance Productivity: A Case Study. In Proceedings of the 1996
International Conference on Software Maintenance, ICSM '96, pages 87-93, 1996.
[29] Paul Oman and Jack Hagemeister. Metrics for Assessing a Software System's
Maintainability. In Proceedings of the 1992 Conference on Software Maintenance,
pages 337-344. IEEE, 1992.
[30] Boost library dataset, Available at:
http://softlayersng.dl.sourceforge.net/project/boost/boost/1.49.0/boost_1_49_0.tar.gz
[31] LLVM library dataset, Available at:
http://llvm.org/releases/3.6.0/libcxx3.6.0.src.tar.xz
[32] Codemirror, Available at: http://codemirror.net/
[33] timfel/moss.rb, Available at: https://github.com/timfel/moss.rb
