
R Reference Card for Data Mining Performance Evaluation kmeansCBI() interface function for kmeans (fpc)

Yanchang Zhao, RDataMining.com, May 20, 2016 performance() provide various measures for evaluating performance of pre- cluster.optimal() search for the optimal k-clustering of the dataset
yanchang@rdatamining.com diction and classification models (ROCR) (bayesclust)
PRcurve() precision-recall curves (DMwR) clara() Clustering Large Applications (cluster)
- See the latest version at http://www.RDataMining.com CRchart() cumulative recall charts (DMwR) fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (cluster)
- The package names are in parentheses. roc() build a ROC curve (pROC) kcca() k-centroids clustering (flexclust)
- Recommended packages and functions are shown in bold. auc() compute the area under the ROC curve (pROC) ccfkms() clustering with Conjugate Convex Functions (cba)
- Click a package in this PDF file to find it on CRAN. ROC() draw a ROC curve (DiagnosisMed) apcluster() affinity propagation clustering for a given similarity matrix (ap-
Packages cluster)
Association Rules and Sequential Patterns apclusterK() affinity propagation clustering to get K clusters (apcluster)
party recursive partitioning
cclust() Convex Clustering, incl. k-means and two other clustering algorithms
rpart recursive partitioning and regression trees
Functions (cclust)
randomForest classification and regression based on a forest of trees using ran-
apriori() mine associations with the APRIORI algorithm, a level-wise, KMeansSparseCluster() sparse k-means clustering (sparcl)
dom inputs
breadth-first algorithm which counts transactions to find frequent item- tclust(x,k,alpha,...) trimmed k-means with which a proportion alpha of
ROCR visualize the performance of scoring classifiers
sets (arules) observations may be trimmed (tclust)
caret classification and regression models
eclat() mine frequent itemsets with the Eclat algorithm, which employs e1071 functions for latent class analysis, short time Fourier transform, fuzzy
equivalence classes, depth-first search and set intersection instead of clustering, support vector machines, shortest path computation, bagged cluster- Hierarchical Clustering
counting (arules) ing, naive Bayes classifier, . . . a hierarchical decomposition of data in either bottom-up (agglomerative) or top-
cspade() mine frequent sequential patterns with the cSPADE algorithm (aru- rpartOrdinal ordinal classification trees, deriving a classification tree when the down (divisive) way
lesSequences) response to be predicted is ordinal hclust() hierarchical cluster analysis on a set of dissimilarities
seqefsub() search for frequent subsequences (TraMineR) rpart.plot plots rpart models birch() the BIRCH algorithm that clusters very large data with a CF-tree (birch)
Packages pROC display and analyze ROC curves pvclust() hierarchical clustering with p-values via multi-scale bootstrap re-
arules mine frequent itemsets, maximal frequent itemsets, closed frequent item- nnet feed-forward neural networks and multinomial log-linear models sampling (pvclust)
sets and association rules. It includes two algorithms, Apriori and Eclat. RSNNS neural networks in R using the Stuttgart Neural Network Simulator agnes() agglomerative hierarchical clustering (cluster)
arulesViz visualizing association rules (SNNS) diana() divisive hierarchical clustering (cluster)
arulesSequences add-on for arules to handle and mine frequent sequences neuralnet training of neural networks using backpropagation, resilient backprop- mona() divisive hierarchical clustering of a dataset with binary variables only
TraMineR mining, describing and visualizing sequences of states or events agation with or without weight backtracking (cluster)
rockCluster() cluster a data matrix using the Rock algorithm (cba)
Classification & Prediction Regression proximus() cluster the rows of a logical matrix using the Proximus algorithm
Decision Trees Functions (cba)
ctree() conditional inference trees, recursive partitioning for continuous, cen- isopam() Isopam clustering algorithm (isopam)
lm() linear regression
sored, ordered, nominal and multivariate response variables in a condi- flashClust() optimal hierarchical clustering (flashClust)
glm() generalized linear regression
tional inference framework (party) fastcluster() fast hierarchical clustering (fastcluster)
gbm() generalized boosted regression models (gbm)
rpart() recursive partitioning and regression trees (rpart) cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchical clus-
predict() predict with models
mob() model-based recursive partitioning, yielding a tree with fitted models as- tering dendrograms (dynamicTreeCut)
residuals() residuals, the difference between observed values and fitted val-
sociated with each terminal node (party) HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)
ues
Random Forest nls() non-linear regression
gls() fit a linear model using generalized least squares (nlme) Model based Clustering
cforest() random forest and bagging ensemble (party)
gnls() fit a nonlinear model using generalized least squares (nlme) Mclust() model-based clustering (mclust)
randomForest() random forest (randomForest)
Packages HDDC() a model-based method for high dimensional data clustering (HDclassif )
importance() variable importance (randomForest)
fixmahal() Mahalanobis Fixed Point Clustering (fpc)
varimp() variable importance (party) nlme linear and nonlinear mixed effects models
fixreg() Regression Fixed Point Clustering (fpc)
Neural Networks gbm generalized boosted regression models
mergenormals() clustering by merging Gaussian mixture components (fpc)
nnet() fit single-hidden-layer neural network (nnet) Clustering Density based Clustering
mlp(), dlvq(), rbf(), rbfDDA(), elman(), jordan(), som(),
Partitioning based Clustering generate clusters by connecting dense regions
art1(), art2(), artmap(), assoz()
partition the data into k groups first and then try to improve the quality of clus- dbscan(data,eps,MinPts,...) generate a density based clustering of
various types of neural networks (RSNNS)
tering by moving objects from one group to another arbitrary shapes, with neighborhood radius set as eps and density thresh-
neuralnet training of neural networks (neuralnet)
kmeans() perform k-means clustering on a data matrix old as MinPts (fpc)
Support Vector Machine (SVM) pdfCluster() clustering via kernel density estimation (pdfCluster)
kmeansruns() call kmeans for the k-means clustering method and includes
svm() train a support vector machine for regression, classification or density-
estimation of the number of clusters and finding an optimal solution from
estimation (e1071) Other Clustering Techniques
several starting points (fpc)
ksvm() support vector machines (kernlab)
pam() the Partitioning Around Medoids (PAM) clustering method (cluster) mixer() random graph clustering (mixer)
Bayes Classifiers pamk() the Partitioning Around Medoids (PAM) clustering method with esti- nncluster() fast clustering with restarted minimum spanning tree (nnclust)
naiveBayes() naive Bayes classifier (e1071) mation of number of clusters (fpc) orclus() ORCLUS subspace clustering (orclus)
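As a rough illustration of the partitioning and hierarchical approaches listed above, the sketch below uses only base R functions named in this card (kmeans() and hclust()), plus dist() and cutree() from the stats package. The iris dataset, the choice of k = 3, the random seed and the "average" linkage are assumptions made for the example and are not recommendations from the card.

```r
# Partitioning: split the four numeric iris measurements into k = 3 groups.
set.seed(42)                       # k-means starts from random centres
x <- scale(iris[, 1:4])            # standardise so no variable dominates
km <- kmeans(x, centers = 3, nstart = 10)
table(km$cluster, iris$Species)    # compare clusters with the known species

# Hierarchical (agglomerative): build a dendrogram bottom-up from distances.
hc <- hclust(dist(x), method = "average")
groups <- cutree(hc, k = 3)        # cut the dendrogram into 3 clusters
table(groups)
```

Calling plot(hc) draws the dendrogram. The functions silhouette() (cluster) and cluster.stats() (fpc) listed under Cluster Validation can then be used to compare the two solutions.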

Plotting Clustering Solutions Packages removeNumbers(), removePunctuation(), removeWords() re-
plotcluster() visualisation of a clustering or grouping in data (fpc) Rlof a parallel implementation of the LOF algorithm move numbers, punctuation marks, or a set of words from a text docu-
bannerplot() a horizontal barplot visualizing a hierarchical clustering (cluster) extremevalues detect extreme values in one-dimensional data ment (tm)
mvoutlier multivariate outlier detection based on robust methods removeSparseTerms() remove sparse terms from a term-document matrix (tm)
Cluster Validation
silhouette() compute or extract silhouette information (cluster)
outliers some tests commonly used for identifying outliers Frequent Terms and Association
cluster.stats() compute several cluster validity statistics from a clustering Time Series Analysis findAssocs() find associations in a term-document matrix (tm)
and a dissimilarity matrix (fpc) findFreqTerms() find frequent terms in a term-document matrix (tm)
clValid() calculate validation measures for a given set of clustering algorithms Construction & Plot termFreq() generate a term frequency vector from a text document (tm)
and number of clusters (clValid) ts() create time-series objects Topic Modelling
clustIndex() calculate the values of several clustering indexes, which can be plot.ts() plot time-series objects LDA() fit a LDA (latent Dirichlet allocation) model (topicmodels)
independently used to determine the number of clusters existing in a data smoothts() time series smoothing (ast ) CTM() fit a CTM (correlated topics model) model (topicmodels)
set (cclust) sfilter() remove seasonal fluctuation using moving average (ast ) terms() extract the most likely terms for each topic (topicmodels)
NbClust() provide 30 indices for cluster validation and determining the number Decomposition topics() extract the most likely topics for each document (topicmodels)
of clusters (NbClust) decomp() time series decomposition by square-root filter (timsac) Sentiment Analysis
Packages decompose() classical seasonal decomposition by moving averages polarity() polarity score (sentiment analysis) (qdap)
cluster cluster analysis stl() seasonal decomposition of time series by loess
Text Categorization
fpc various methods for clustering and cluster validation tsr() time series decomposition (ast)
textcat() n-gram based text categorization (textcat)
mclust model-based clustering and normal mixture modeling ardec() time series autoregressive decomposition (ArDec)
birch clustering very large datasets using the BIRCH algorithm Forecasting Text Visualization
pvclust hierarchical clustering with p-values wordcloud() plot a word cloud (wordcloud)
arima() fit an ARIMA model to a univariate time series
apcluster Affinity Propagation Clustering comparison.cloud() plot a cloud comparing the frequencies of words
predict.Arima() forecast from models fitted by arima
cclust Convex Clustering methods, including k-means algorithm, On-line Up- across documents (wordcloud)
auto.arima() fit best ARIMA model to univariate time series (forecast)
date algorithm and Neural Gas algorithm and calculation of indexes for finding commonality.cloud() plot a cloud of words shared across documents
forecast.stl(), forecast.ets(), forecast.Arima()
the number of clusters in a data set (wordcloud)
forecast time series using stl, ets and arima models (forecast)
cba Clustering for Business Analytics, including clustering techniques such as Packages
Proximus and Rock Correlation and Covariance
tm a framework for text mining applications
bclust Bayesian clustering using spike-and-slab hierarchical model, suitable for acf() autocovariance or autocorrelation of a time series topicmodels fit topic models with LDA and CTM
clustering high-dimensional data ccf() cross-correlation or cross-covariance of two univariate series wordcloud various word clouds
biclust algorithms to find bi-clusters in two-dimensional data Packages lda fit topic models with LDA
clue cluster ensembles forecast displaying and analysing univariate time series forecasts wordnet an interface to the WordNet
clues clustering method based on local shrinking hts analysing and forecasting hierarchical and grouped time series RTextTools automatic text classification via supervised learning
clValid validation of clustering results TSclust time series clustering utilities qdap transcript analysis, text mining and natural language processing
clv cluster validation techniques, contains popular internal and external cluster dtw Dynamic Time Warping (DTW) sentiment140 sentiment text analysis using free sentiment140 service
validation methods for outputs produced by package cluster timsac time series analysis and control program tm.plugin.dc a plug-in for package tm to support distributed text mining
bayesclust tests/searches for significant clusters in genetic data ast time series analysis tm.plugin.mail a plug-in for package tm to handle mail
clustsig significant cluster analysis, tests to see which (if any) clusters are statis- ArDec time series autoregressive-based decomposition textir a suite of tools for inference about text documents and associated sentiment
tically different dse tools for multivariate, linear, time-invariant, time series models tau utilities for text analysis
clusterSim search for optimal clustering procedure for a data set textcat n-gram based text categorization
clusterGeneration random cluster generation Text Mining Rwordseg Chinese word segmentation using Ansj
gcExplorer graphical cluster explorer Importing Text
hybridHclust hybrid hierarchical clustering via mutual clusters Social Network Analysis and Graph Mining
readPDF() extract text and metadata from a PDF document (tm)
Modalclust hierarchical modal Clustering Functions
iCluster integrative clustering of multiple genomic data types Text Cleaning and Preparation
graph(), graph.edgelist(), graph.adjacency(),
EMCC evolutionary Monte Carlo (EMC) methods for clustering Corpus() build a corpus, which is a collection of text documents (tm) graph.incidence() create graph objects respectively from edges,
rEMM extensible Markov Model (EMM) for data stream clustering tm_map() transform text documents, e.g., stemming, stopword removal (tm) an edge list, an adjacency matrix and an incidence matrix (igraph)
tm_filter() filtering out documents (tm)
Outlier Detection TermDocumentMatrix(), DocumentTermMatrix() construct a
plot(), tkplot(), rglplot() static, interactive and 3D plotting of
graphs (igraph)
Functions term-document matrix or a document-term matrix (tm) gplot(), gplot3d() plot graphs (sna)
boxplot.stats()$out list data points lying beyond the extremes of the Dictionary() construct a dictionary from a character vector or a term- vcount(), ecount() number of vertices/edges (igraph)
whiskers document matrix (tm) V(), E() vertex/edge sequence of igraph (igraph)
lofactor() calculate local outlier factors using the LOF algorithm (DMwR stemDocument() stem words in a text document (tm) is.directed() whether the graph is directed (igraph)
or dprep) stemCompletion() complete stemmed words (tm) are.connected() check whether two nodes are connected (igraph)
lof() a parallel implementation of the LOF algorithm (Rlof ) SnowballStemmer() Snowball word stemmers (Snowball) degree(), betweenness(), closeness(), transitivity(), evcent()
stopwords(language) return stopwords in different languages (tm) various centrality measures (igraph, sna)

edge_density() density of a graph (igraph) ColorMap() plot levels of a variable in a colour-coded map (RgoogleMaps) parcoord() parallel coordinate plot (MASS)
add.edges(), add.vertices(), delete.edges(), delete.vertices() PlotOnStaticMap() overlay plot on background image of map tile cparcoord() enhanced parallel coordinate plot (gclus)
add and delete edges and vertices (igraph) (RgoogleMaps) parallelplot() parallel coordinates plot (lattice)
neighborhood() neighborhood of graph vertices (igraph, sna) TextOnStaticMap() plot text on map (RgoogleMaps) densityplot() kernel density plot (lattice)
get.adjlist() adjacency lists for edges or vertices (igraph) Packages contour(), filled.contour() contour plot
nei(), adj(), from(), to() vertex/edge sequence indexing (igraph) levelplot(), contourplot() level plots and contour plots (lattice)
plotGoogleMaps plot spatial data as HTML map mashup over Google Maps
cliques(), largest.cliques(), maximal.cliques(), clique.number() smoothScatter() scatterplots with smoothed densities color representation;
RgoogleMaps overlay on Google map tiles in R
find cliques, i.e., complete subgraphs (igraph) capable of visualizing large datasets
ggmap Spatial visualization with Google Maps and OpenStreetMap
clusters(), no.clusters() maximal connected components of a graph and sunflowerplot() a sunflower scatter plot
plotKML visualization of spatial and spatio-temporal objects in Google Earth
the number of them (igraph) assocplot() association plot
SGCS Spatial Graph based Clustering Summaries for spatial point patterns
fastgreedy.community(), spinglass.community() community detection mosaicplot() mosaic plot
spdep spatial dependence: weighting schemes, statistics and models
(igraph) matplot() plot the columns of one matrix against the columns of another
cohesive.blocks() calculate cohesive blocks (igraph) Statistics fourfoldplot() a fourfold display of a 2 x 2 x k contingency table
induced.subgraph() create a subgraph of a graph (igraph) Summarization persp() perspective plots of surfaces over the x-y plane
mst() minimum spanning tree (igraph) cloud(), wireframe() 3d scatter plots and surfaces (lattice)
components() calculate the maximal connected components (igraph) summary() summarize data interaction.plot() two-way interaction plot
shortest_paths() the shortest paths between vertices (igraph) describe() concise statistical description of data (Hmisc) iplot(), ihist(), ibar(), ipcp() interactive scatter plot, histogram, bar
%->%, %<-%, %--% edge sequence indexing (igraph) boxplot.stats() box plot statistics plot, and parallel coordinates plot (iplots)
get.edgelist() return an edge list in a two-column matrix (igraph) Analysis of Variance pdf(), postscript(), win.metafile(), jpeg(), bmp(),
read.graph(), write.graph() read and write graphs from and to files aov() fit an analysis of variance model png(), tiff() save graphs into files of various formats
of various formats (igraph) anova() compute analysis of variance (or deviance) tables for one or more fitted gvisAnnotatedTimeLine(), gvisAreaChart(),
Packages model objects gvisBarChart(), gvisBubbleChart(),
Statistical Tests gvisCandlestickChart(), gvisColumnChart(),
igraph network analysis and visualization
gvisComboChart(), gvisGauge(), gvisGeoChart(),
sna social network analysis chisq.test() chi-squared contingency table tests and goodness-of-fit tests
gvisGeoMap(), gvisIntensityMap(),
d3Network, networkD3 creating D3 JavaScript network, tree, dendrogram, and ks.test() Kolmogorov-Smirnov tests
gvisLineChart(), gvisMap(), gvisMerge(),
Sankey graphs from R t.test() Student's t-test
gvisMotionChart(), gvisOrgChart(),
RNeo4j interact with a Neo4j database through R prop.test() test of equal or given proportions
gvisPieChart(), gvisScatterChart(),
statnet a set of tools for the representation, visualization, analysis and simulation binom.test() exact binomial test
gvisSteppedAreaChart(), gvisTable(),
of network data Mixed Effects Models gvisTreeMap() various interactive charts produced with the Google
egonet ego-centric measures in social network analysis
lme() fit a linear mixed-effects model (nlme) Visualisation API (googleVis)
snort social network-analysis on relational tables
nlme() fit a nonlinear mixed-effects model (nlme) gvisMerge() merge two googleVis charts into one (googleVis)
network tools to create and modify network objects
bipartite visualising bipartite networks and calculating some (ecological) indices Principal Components and Factor Analysis Packages
blockmodeling generalized and classical blockmodeling of valued networks princomp() principal components analysis ggplot2 an implementation of the Grammar of Graphics
diagram visualising simple graphs (networks), plotting flow diagrams prcomp() principal components analysis ggvis interactive grammar of graphics
NetCluster clustering for networks Other Functions googleVis an interface between R and the Google Visualisation API to create
NetData network data for McFarland's SNA R labs var(), cov(), cor() variance, covariance, and correlation interactive charts
NetIndices estimating network indices, including trophic structure of foodwebs density() compute kernel density estimates d3Network, networkD3 creating D3 JavaScript network, tree, dendrogram, and
in R cmdscale() Multidimensional Scaling (MDS) Sankey graphs from R
NetworkAnalysis statistical inference on populations of weighted or unweighted rCharts interactive javascript visualizations from R
networks Packages lattice a powerful high-level data visualization system, with an emphasis on mul-
tnet analysis of weighted, two-mode, and longitudinal networks nlme linear and nonlinear mixed effects models tivariate data
Spatial Data Analysis Graphics vcd visualizing categorical data
iplots interactive graphics
Functions Functions
plot() generic function for plotting
Data Manipulation
geocode() geocodes a location using Google Maps (ggmap)
plotGoogleMaps() create a plot of spatial data on Google Maps (plot- barplot(), pie(), hist() bar chart, pie chart and histogram Functions
GoogleMaps) boxplot() box-and-whisker plot transform() transform a data frame
qmap() quick map plot (ggmap) stripchart() one dimensional scatter plot scale() scaling and centering of matrix-like objects
get_map() queries the Google Maps, OpenStreetMap, or Stamen Maps server dotchart() Cleveland dot plot t() matrix transpose
for a map at a certain location (ggmap) qqnorm(), qqplot(), qqline() QQ (quantile-quantile) plot aperm() array transpose
gvisGeoChart(), gvisGeoMap(), gvisIntensityMap(), coplot() conditioning plot sample() sampling
gvisMap() Google geo charts and maps (googleVis) splom() conditional scatter plot matrices (lattice) table(), tabulate(), xtabs() cross tabulation
GetMap() download a static map from the Google server (RgoogleMaps) pairs() a matrix of scatterplots stack(), unstack() stacking vectors
cpairs() enhanced scatterplot matrix (gclus)

split(), unsplit() divide data into groups and reassemble xlsx read, write, format Excel 2007 and Excel 97/2000/XP/2003 files big.matrix() create a standard big.matrix, which is constrained to available
reshape() reshape a data frame between wide and long format xlsReadWrite read and write Excel files RAM (bigmemory)
merge() merge two data frames; similar to database join operations WriteXLS create Excel 2003 (XLS) files from data frames read.big.matrix() create a big.matrix by reading from an ASCII file (big-
aggregate() compute summary statistics of data subsets SPARQL Use SPARQL to pose SELECT or UPDATE queries to an end-point memory)
by() apply a function to a data frame split by factors write.big.matrix() write a big.matrix to a file (bigmemory)
melt(), cast() melt and then cast data into the reshaped or aggregated
Web Data Access filebacked.big.matrix() create a file-backed big.matrix, which may ex-
form you want (reshape) Functions ceed available RAM by using hard drive space (bigmemory)
complete.cases() find complete cases, i.e., cases without missing values download.file() download a file from the Internet mwhich() expanded which-like functionality (bigmemory)
na.fail, na.omit, na.exclude, na.pass handle missing values xmlParse(), htmlParse() parse an XML or HTML file (XML) Packages
Packages userTimeline(), homeTimeline(), mentions(), ff memory-efficient storage of large data on disk and fast access functions
dplyr a fast, consistent tool for working with data frame like objects retweetsOfMe() retrieve various timelines within the Twitter uni- ffbase basic statistical functions for package ff
reshape flexibly restructure and aggregate data using melt and cast verse (twitteR) filehash a simple key-value database for handling large data
reshape2 flexibly reshape data: a reboot of the reshape package searchTwitter() a search of Twitter based on a supplied search string (twit- g.data create and maintain delayed-data packages
tidyr easily tidy data with spread and gather functions; an evolution of reshape2 teR) BufferedMatrix a matrix data storage object held in temporary files
data.table extension of data.frame for fast indexing, ordered joins, assignment, getUser(), lookupUsers() get information of Twitter users (twitteR) biglm regression for data too large to fit in memory
and grouping and list columns getFollowers(), getFollowerIDs(), getFriends(), bigmemory manage massive matrices with shared memory and memory-mapped
gdata various tools for data manipulation getFriendIDs() get a list of followers/friends or their IDs of a files
lubridate functions to work with dates and times Twitter user (twitteR) biganalytics extend the bigmemory package with various analytics
stringr string operations twListToDF() convert twitteR lists to data frames (twitteR) bigtabulate table-, tapply-, and split-like functionality for matrix and
Packages big.matrix objects
Data Access twitteR an interface to the Twitter web API
Functions RCurl general network (HTTP/FTP/. . . ) client interface for R
Parallel Computing
save(), load() save and load R data objects XML reading and creating XML and HTML documents Functions
read.csv(), write.csv() import from and export to .CSV files httr tools for working with URLs and HTTP; a simplified wrapper built on top sfInit(), sfStop() initialize and stop the cluster (snowfall)
read.table(), write.table(), scan(), write() read and of RCurl sfLapply(), sfSapply(), sfApply() parallel versions of
write data MapReduce, Hadoop and Spark lapply(), sapply(), apply() (snowfall)
read.xlsx(), write.xlsx() read and write Excel files (xlsx) foreach(...) %dopar% looping in parallel (foreach)
read.fwf() read fixed width format files Functions registerDoSEQ(), registerDoSNOW(), registerDoMC() register respec-
write.matrix() write a matrix or data frame (MASS) mapreduce() define and execute a MapReduce job (rmr2 ) tively the sequential, SNOW and multicore parallel backend with the
readLines(), writeLines() read/write text lines from/to a connection, keyval() create a key-value object (rmr2 ) foreach package (foreach, doSNOW, doMC)
such as a text file from.dfs(), to.dfs() read/write R objects from/to file system (rmr2 ) Packages
sqlQuery() submit an SQL query to an ODBC database (RODBC) hb.get(), hb.scan(), hb.get.data.frame() read HBase tables (rhbase )
sqlFetch() read a table from an ODBC database (RODBC) snowfall usability wrapper around snow for easier development of parallel R
hb.insert(), hb.insert.data.frame() write to HBase tables (rhbase )
sqlSave(), sqlUpdate() write or update a table in an ODBC database programs
hb.delete() delete from HBase tables (rhbase )
(RODBC) snow simple parallel computing in R
Packages multicore parallel processing of R code on machines with multiple cores or
sqlColumns() enquire about the column structure of tables (RODBC)
rmr2 perform data analysis with R via MapReduce on a Hadoop cluster CPUs
sqlTables() list tables on an ODBC connection (RODBC)
rhdfs connect to the Hadoop Distributed File System (HDFS) snowFT extension of snow supporting fault tolerant and reproducible applica-
odbcConnect(), odbcClose(), odbcCloseAll() open/close con-
rhbase connect to the NoSQL HBase database tions, and easy-to-use parallel programming
nections to ODBC databases (RODBC)
Rhipe R and Hadoop Integrated Processing Environment Rmpi interface (Wrapper) to MPI (Message-Passing Interface)
dbSendQuery execute an SQL statement on a given database connection (DBI)
SparkR a light-weight frontend to use Apache Spark from R rpvm R interface to PVM (Parallel Virtual Machine)
dbConnect(), dbDisconnect() create/close a connection to a DBMS
RHive distributed computing via HIVE query nws provide coordination and parallel execution facilities
(DBI)
Segue Parallel R in the cloud using Amazon's Elastic Map Reduce (EMR) engine foreach foreach looping construct for R
Packages HadoopStreaming Utilities for using R scripts in Hadoop streaming doMC foreach parallel adaptor for the multicore package
RODBC ODBC database access hive distributed computing via the MapReduce paradigm doSNOW foreach parallel adaptor for the snow package
foreign read and write data in other formats, such as Minitab, S, SAS, SPSS, rHadoopClient Hadoop client interface for R doMPI foreach parallel adaptor for the Rmpi package
Stata, Systat, . . . doParallel foreach parallel adaptor for the parallel package
sqldf perform SQL selects on R data frames Large Data doRNG generic reproducible parallel backend for foreach Loops
DBI a database interface (DBI) between R and relational DBMS Functions GridR execute functions on remote hosts, clusters or grids
RMySQL interface to the MySQL database as.ffdf() coerce a dataframe to an ffdf (ff ) fork R functions for handling multiple processes
RJDBC access to databases through the JDBC interface
RSQLite SQLite interface for R
read.table.ffdf(), read.csv.ffdf() read data from a flat file to an ffdf Interface to Weka
object (ff )
ROracle Oracle database interface (DBI) driver Package RWeka is an R interface to Weka, and enables the use of the following Weka
write.table.ffdf(), write.csv.ffdf() write an ffdf object to a flat file
RpgSQL DBI/RJDBC interface to PostgreSQL database functions in R.
(ff )
RODM interface to Oracle Data Mining ffdfappend() append a dataframe or an ffdf to an existing ffdf (ff )

Association rules:
Apriori(), Tertius()
Regression and classification:
LinearRegression(), Logistic(), SMO()
Lazy classifiers:
IBk(), LBR()
Meta classifiers:
AdaBoostM1(), Bagging(), LogitBoost(), MultiBoostAB(), Stacking(), CostSensitiveClassifier()
Rule classifiers:
JRip(), M5Rules(), OneR(), PART()
Regression and classification trees:
J48(), LMT(), M5P(), DecisionStump()
Clustering:
Cobweb(), FarthestFirst(), SimpleKMeans(), XMeans(), DBScan()
Filters:
Normalize(), Discretize()
Word stemmers:
IteratedLovinsStemmer(), LovinsStemmer()
Tokenizers:
AlphabeticTokenizer(), NGramTokenizer(), WordTokenizer()

Interface to Other Programming Languages
Functions
.jcall() call a Java method (rJava)
.jnew() create a new Java object (rJava)
.jinit() initialize the Java Virtual Machine (JVM) (rJava)
.jaddClassPath() adds directories or JAR files to the class path (rJava)
Packages
rJava low-level R to Java interface
rPython call Python from R

Generating Documents and Reports
Functions
Sweave() mixing text and R/S code for automatic report generation
xtable() export tables to LaTeX or HTML (xtable)
Packages
knitr a general-purpose package for dynamic report generation in R
xtable export tables to LaTeX or HTML
R2HTML making HTML reports
R2PPT generating Microsoft PowerPoint presentations

Building GUIs and Web Applications
shiny web application framework for R
svDialogs dialog boxes
gWidgets a toolkit-independent API for building interactive GUIs

R Editors/GUIs
RStudio a free integrated development environment (IDE) for R
Tinn-R a free GUI for R language and environment
rattle graphical user interface for data mining in R
Rpad workbook-style, web-based interface to R
RPMG graphical user interface (GUI) for interactive R analysis sessions
Red-R an open source visual programming GUI interface for R
R AnalyticFlow software that enables data analysis by drawing analysis flowcharts
latticist a graphical user interface for exploratory visualisation

Other R Reference Cards
R Reference Card, by Tom Short
http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/R-refcard.pdf or
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
R Reference Card, by Jonathan Baron
http://cran.r-project.org/doc/contrib/refcard.pdf
R Functions for Regression Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf
R Functions for Time Series Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf

RDataMining Books
R and Data Mining: Examples and Case Studies
introduces the use of R for data mining with examples and case studies.
http://www.rdatamining.com/books/rdm
Data Mining Applications with R
presents 15 real-world applications of data mining with R.
http://www.rdatamining.com/books/dmar

RDataMining Website, Group, Twitter & Package
RDataMining Website
http://www.rdatamining.com
http://www2.rdatamining.com
RDataMining Group on LinkedIn (20,000+ members)
http://group.rdatamining.com or
https://www.linkedin.com/groups/4066593
RDataMining on Twitter (2,500+ followers)
@RDataMining
RDataMining Project on R-Forge
http://www.rdatamining.com/package
http://package.rdatamining.com

Comments & Feedback
If you have any comments, or would like to suggest any relevant R packages/functions, please feel free to email me <yanchang@rdatamining.com>. Thanks.
If you have any questions on using R for data mining, please post them to the RDataMining Group on LinkedIn at http://group.rdatamining.com.
