
How to Run PAM analysis using R in combination with Coffalyser.NET program


To classify a sample as BRCA1-like or non-BRCA1-like, a classifier written in the statistical programming
language R can be used. This classifier was developed using Prediction Analysis for Microarrays (PAM)
(Tibshirani R et al. 2002, PNAS, 99:6567-72).
PAM is an approach to cancer class prediction from gene expression profiling, based on an enhancement of
the simple nearest prototype (centroid) classifier. The class prototypes are shrunken, which yields a
classifier that is often more accurate than competing methods. The method of nearest shrunken centroids
identifies subsets of genes that best characterize each class. The technique is general and can be used in
many other classification problems. More information about this method can be found at:
http://statweb.stanford.edu/~tibs/SAM/
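
The shrinkage idea can be illustrated with a minimal R sketch. It shows only the soft-thresholding step that moves per-gene centroid differences toward zero. The function name soft_threshold and the example numbers are illustrative assumptions, and the full PAM method additionally standardizes the differences before shrinking, which is omitted here.

```r
# Soft thresholding: reduce each centroid difference by `delta` in absolute
# value and set it to zero if it would cross zero. (Hypothetical helper name.)
soft_threshold <- function(d, delta) {
  sign(d) * pmax(abs(d) - delta, 0)
}

# Illustrative (made-up) differences between a class centroid and the
# overall centroid for four probes.
d <- c(-2.0, -0.3, 0.1, 1.5)

# Probes with small differences are shrunk to zero and drop out of the
# classifier; only the strongly deviating probes remain.
shrunken <- soft_threshold(d, delta = 0.5)
print(shrunken)
```

Probes whose shrunken difference is zero for every class no longer influence the classification, which is how the method selects a subset of informative probes.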

Contents
How to Run PAM analysis using R in combination with Coffalyser.NET program
Coffalyser.Net - Support
Contact us
Using R scripts for calling your BRCA1ness classification with P376
Step 1: Collect all relevant data / programs
Step 2: Windows regional settings
Step 3: Install Coffalyser.Net and install R
Step 4: Install pamr package in R
Create your training data file
Formats of export files
Step 5. Analyze your training data set
Step 6. Open the experiment explorer and export data in R format
Step 7. Add the classification to the txt training data
Calling your unknown data
Step 8. Analyze your test data
Step 9. Call your data in R
Output file

Coffalyser.Net - Support

Coffalyser.net Home Wordpress


http://coffalyser.wordpress.com/

YouTube (flash instruction videos)


http://www.youtube.com/user/Coffalyser

Registration page, click on login on the left side:


http://www.mlpa.com

Wiki (our old home for support material)


http://wiki.coffalyser.net

Publication with regard to analysis methods (open book)


http://www.intechopen.com/books/modern-approaches-to-quality-control/analysis-of-mlpa-data-using-novelsoftware-coffalyser-net-by-mrc-holland

MRC-Holland Main
http://www.mlpa.com

Download R for your operating system at:


http://cran.r-project.org/bin/windows/base/
NOTE: we only tested version 2.15.1

Contact us

MRC-Holland provides free support to all Coffalyser.Net users.


For general MLPA related questions you can send an email to info@mlpa.com
For Coffalyser.Net related questions you can send an email to support@coffalyser.net

Using R scripts for calling your BRCA1ness classification with P376


Step 1: Collect all relevant data / programs
You need to have:
The latest version of Coffalyser.Net, v.140425.1321 (www.mlpa.com)
The R program (versions 3.1.1, 3.0.3 and 2.15.1 have been tested at MRC-Holland)
(http://cran.r-project.org/bin/windows/base/old/)
The training data in ABIF format (if not yet provided with this manual, send an email to info@mlpa.com)
The unknown data, in ABIF format, that you plan to classify as BRCA1-like or non-BRCA1-like

Step 2: Windows regional settings


Please note that in our findings the method did not work unless your regional settings have a dot as the
decimal separator and a comma as the thousands separator. You can adjust these settings under:
Start Menu > Control Panel > Clock, Language and Region > Region and Language > Formats tab >
Additional Settings > Customize Format
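
One plausible reason for this requirement (an assumption; the cause is not stated here) is that R expects a dot as the decimal mark when it parses numbers from text. The sketch below uses made-up ratio values to show how comma-formatted numbers fail to parse.

```r
# The same two ratios written with a decimal comma and with a decimal dot.
comma_values <- c("0,85", "1,2")
dot_values   <- c("0.85", "1.2")

# as.numeric() only understands the dot, so comma-formatted values become NA
# (R also emits a coercion warning, suppressed here for readability).
parsed_comma <- suppressWarnings(as.numeric(comma_values))
parsed_dot   <- as.numeric(dot_values)

print(parsed_comma)
print(parsed_dot)
```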
Step 3: Install Coffalyser.Net and install R
Install Coffalyser.Net using the installation manual provided with the setup files. Also install one of the
R versions for Windows mentioned in step 1, following the on-screen instructions.
Step 4: Install pamr package in R
You will need to install the R-package for PAM training and calling. From the menu bar click on Packages
and then select Install package(s).

Next you will need to select a mirror. Select the closest mirror to your location. Now scroll through the list of
packages and select pamr and click on OK.
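
As an alternative to the menu, the package can usually be installed from the R console. The following is a minimal sketch, assuming pamr is available from the CRAN mirror you choose; the helper name ensure_package and the mirror URL are illustrative assumptions.

```r
# Hypothetical helper: returns TRUE when the package can be loaded,
# installing it from a CRAN mirror first if it is missing.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (needs internet access the first time):
# ensure_package("pamr")
# library(pamr)
```

The usage lines are commented out so that the block can be pasted without immediately triggering a download.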

Create your training data file


Before you can classify your unknown data you will first need to make a training data set using a set of
samples of which the type is already known. MRC-Holland can provide a set of samples in ABIF format that
includes reference samples and test samples. Within the selection of test samples, there are sporadic tumors
and BRCA1ness tumors. These samples were analyzed using the P376 lot B2-0911 MLPA mix and the
fragment products were separated on an ABI-3130 XL genetic analyzer with a GS-500 LIZ size marker.
This training set can also be used when test samples are analysed with P376 lot B3-0414.
Formats of export files
Coffalyser.Net has a special export function that will export files to a format that can directly be accepted by
R. If you do not wish to use the export function, then please consult the manual of R and PAM in order to
create input files in the correct format.
Please note: Both data types (training and Unknown data) need to be normalised in the same way. Our
recommendation is to use Coffalyser.Net to normalise your data. However, if you are using the global
normalisation method described by the NKI (see detailed instructions Lips E. et al. 2011 Breast Cancer Res.
13(5):R107), you need to normalise your unknown data set in the very same way.
Mosaicism: In case your tumor samples contain normal cells then better results may be obtained by
changing the arbitrary borders to 0.85-1.2.

Step 5. Analyze your training data set


Open Coffalyser.Net and analyze the training data set provided by MRC-Holland according to the analysis
manual that can be found on the Coffalyser.Net home page. Use the reference samples and No DNA samples
as provided below.
Reference samples:
1. P376-B2-0911-NEW MB-NKI-REF1-CHE-1
2. P376-B2-0911-NEW MB-NKI-REF2-CHE-1
3. P376-B2-0911-NEW MB-NKI-REF3-CHE-2
4. P376-B2-0911-NEW MB-NKI-REF4-CHE-3
5. P376-B2-0911-NEW MB-NKI-REF5-CHE-3
6. P376-B2-0911-NEW MB-NKI-REF6-CHE-4
7. P376-B2-0911-NEW MB-NKI-REF7-CHE-5
8. P376-B2-0911-NEW MB-NKI-REF8-CHE-6
9. P376-B2-0911-NEW MB-NKI-REF9-CHE-7
10. P376-B2-0911-NEW MB-NKI-REF10-CHE-7
11. P376-B2-0911-NEW MB-NKI-REF11-CHE-8
12. P376-B2-0911-NEW MB-NKI-REF12-CHE-9
13. P376-B2-0911-NEW MB-NKI-REF13-CHE-10
14. P376-B2-0911-NEW MB-NKI-REF14-CHE-10
15. P376-B2-0911-NEW MB-NKI-REF15-CHE-11
16. P376-B2-0911-NEW MB-NKI-REF15-CHE-11-2

No DNA control samples:


1. P376-B2-0911-NEW MB-NKI-NODNA-CHE-5
2. P376-B2-0911-NEW MB-NKI-NODNA-CHE-10
Note: during the analysis we recommend including only samples that have a 100% score on the FMRS, that
is, only samples that show 4 bars for the FMRS score in the comparative analysis. Also note that if you
change analysis settings, you must apply these changes consistently to both your test samples and the
training data set!

Step 6. Open the experiment explorer and export data in R format


Open the experiment results from the experiment analysis form.

In the Comparative Analysis Experiment Explorer use the key combination of: Ctrl + Shift + Alt + R, this will
allow you to save the grid data to a specific txt file format that may be used for R.

Do not make the R-script file yet, you will need to use this option for your test data later.

Step 7. Add the classification to the txt training data


Open the training set data in Excel. In the first row you will see the sample names. You can make the names
easier to recognize by replacing P376-B2-0911-NEW MB-NKI- with nothing in the entire row. You can do
this by selecting the entire row and using the key combination Ctrl-F to open Find and Replace.

Now select all the columns that contain the reference samples (marked with REF) and remove these
columns from the worksheet. Also remove the columns with the sample names N20120-CHE-10,
N16986-CHE-10, B1022-CHE-6 and C020-CHE-7.

Now add the classification for each sample in row 2, directly underneath the sample name. You can
find the classification of all the samples of the training set in the table below. Be sure to use the exact
classification names for all samples: if you accidentally add even a single extra character, that sample
will be treated as belonging to a new group.

Table 1: Classification of all samples in the training data


Sporadic_Like    BRCA1_Like
2058             2131
2124             2165
2134             2224
2151             2254
2169             2312
2175             2355
2182             B1007
2188             B1035
2195             B1045
2204             B1049
2216             B1058
2227             B1061
2232             B1064
2234             B1065
2276             C119
2278             C121
2295             T4147
2298             T6701
2350
C035
C036
C044
C048
C065
C068
C127
C128
C129
C130
Now save the changes you made to the grid, be sure that you KEEP the format: Text (Tab delimited)!
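
The renaming, filtering and labelling in this step can also be sketched in R, for example as a cross-check on the Excel edits. The sample names below are hypothetical stand-ins (the exact names in your export may differ), and only two entries of Table 1 are included for brevity.

```r
# Hypothetical sample names as they might appear in row 1 of the export.
prefix <- "P376-B2-0911-NEW MB-NKI-"
sample_names <- paste0(prefix, c("REF1-CHE-1", "2058-CHE-2", "2131-CHE-3"))

# 1. Strip the common prefix to get shorter, recognizable names.
short_names <- sub(prefix, "", sample_names, fixed = TRUE)

# 2. Drop the reference samples (names containing REF).
is_reference <- grepl("REF", short_names, fixed = TRUE)
kept_names <- short_names[!is_reference]

# 3. Look up the class of each remaining sample (excerpt of Table 1).
table1 <- c("2058" = "Sporadic_Like", "2131" = "BRCA1_Like")
sample_ids <- sub("-.*$", "", kept_names)  # text before the first hyphen
class_row <- unname(table1[sample_ids])

# Stop early if a sample has no class or a mistyped class name, because a
# misspelled label would be treated as a separate group.
stopifnot(!anyNA(class_row),
          all(class_row %in% c("Sporadic_Like", "BRCA1_Like")))

print(rbind(kept_names, class_row))
```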

Calling your unknown data


Step 8. Analyze your test data
Now analyze your unknown test data and open the Comparative Analysis Experiment Explorer. While on
the tab with the overview, use the keyboard combination "Ctrl + Alt + Shift + R". This will generate a txt
file suitable for importing into R. Save the data in the same folder as your test data!

When asked to create an R-script, answer Yes.

Now you will be asked to select the file that you want to use as training data. Select the training data txt file
where you have just added the classification and click on Open.

Please note: the R code needed to train the classifier and make the calls is now copied to your clipboard,
so that you do not need to type the code that points the program to all the relevant file locations. To
use this option, open the R program directly afterwards and paste the content of your clipboard into the
R console as explained in the next step. The R code is also saved in a txt file at the same location.


Step 9. Call your data in R


Open RGui and paste the content of the clipboard in the R Console. Depending on the locations of the files
your R code will look something like this:
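# Fetch the Bioconductor installer script, install pamr and load it
source("http://bioconductor.org/biocLite.R")
biocLite("pamr")
library(pamr)

# Read the training data (52 columns in this example file) and train the classifier
pamrB1excel <- pamr.from.excel('C:/PAM/p376 0911 trainingset.txt', 52, sample.labels=TRUE, batch.labels=FALSE)
pamr_b1_vs_spor.train <- pamr.train(pamrB1excel)

# Read the unknown (test) data (27 columns in this example file)
pamrB1exceltest <- pamr.from.excel('C:/PAM/p376 0911 GM.txt', 27, sample.labels=TRUE, batch.labels=FALSE)

# Predict the class of each test sample and tabulate the result
test_predict <- pamr.predict(pamr_b1_vs_spor.train, pamrB1exceltest$x, threshold=0)
table(pamrB1exceltest$y, test_predict)

# Predict again, this time returning the posterior probability per class
test_predict <- pamr.predict(pamr_b1_vs_spor.train, pamrB1exceltest$x, threshold=0, type="posterior")
test_predict
data.frame(SampleID=pamrB1exceltest$samplelabels, test_predict)

# Write the calls to a tab-delimited output file
write.table(data.frame(sample=pamrB1exceltest$samplelabels, test_predict), sep="\t", row.names=FALSE, file='C:/PAM/OUTPUT FILE.txt')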
---------------------------------------------------------------------------------------------
If you want to type in the R code yourself, you will need to replace the file locations with the correct
information. Please note: depending on the version that you used for installation, the PAM code may work
directly. It is also possible that you receive an error message indicating a missing package.
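
When typing the code by hand, the second argument of pamr.from.excel (52 and 27 in the example) also has to match your own file; it is understood here to be the number of columns in the tab-delimited file. The sketch below counts the columns of a small made-up file. The file content and variable names are illustrative assumptions, not the actual Coffalyser.Net export.

```r
# Write a tiny tab-delimited stand-in file (made-up content) to a temp path.
toy_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tS1\tS2\tS3",
             "p1\tprobe1\t1.02\t0.98\t1.10"),
           toy_file)

# Count the tab-separated fields in the first line of the file.
first_line <- readLines(toy_file, n = 1)
n_cols <- length(strsplit(first_line, "\t", fixed = TRUE)[[1]])
print(n_cols)

# For a real export, replace toy_file with the path to your own txt file and
# pass the resulting count as the second argument of pamr.from.excel.
```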


If you receive error messages such as:

Error: could not find function "pamr.from.excel"
pamr_b1_vs_spor.train <- pamr.train(pamrB1excel)
Error: could not find function "pamr.train"
Error: could not find function "pamr.from.excel"

then the pamr package for R is probably missing; please see step 4 for how to install the pamr package.
Output file
Your output will look something like this. Please first use an experiment with samples that are known and
classified to validate that the method works! This output will also be available as a file in the same
folder as all the other data. This file will be called OUTPUT FILE.txt. The calls in this file are shown as
P-values (the posterior class probabilities requested with type="posterior" in the R code). The cut-off
value to classify a sample as BRCA1-like should be set at 0.5. Below this score, a sample should be
classified as non-BRCA1-like (Lips E. et al. 2011 Breast Cancer Res. 13(5):R107).
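
Applying the 0.5 cut-off can be sketched in R as follows. The data frame stands in for the content of the output file. The sample names and probabilities are made up, and the column names BRCA1_Like and Sporadic_Like are an assumption based on the class names used in the training data.

```r
# Made-up stand-in for the output table: one row per sample, one
# probability column per class.
calls <- data.frame(
  sample        = c("S1", "S2", "S3"),
  BRCA1_Like    = c(0.91, 0.12, 0.50),
  Sporadic_Like = c(0.09, 0.88, 0.50)
)

# Classify as BRCA1-like at or above the 0.5 cut-off, otherwise as
# non-BRCA1-like (scores below 0.5).
calls$call <- ifelse(calls$BRCA1_Like >= 0.5, "BRCA1-like", "non-BRCA1-like")

print(calls)
```

For a real run, the table could be loaded from the output file with read.delim before applying the same rule.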
