Tuesday, 10 September 2013

r sum several columns by another column


I have a data frame with 39 columns (and upward of 100,000 rows) whose last ten
columns look like this (the rest of the columns do not concern my
question):
           H3K27me3_gross_bin H3K4me3_gross_bin H3K4me1_gross_bin UtoP UtoM UPU UPP UPM UMU UMP UMM
cg00000029                  3                 3                 6    1    1   0   0   0   0   0   0
cg00000321                  6                 1                 5    1    0   0   1   0   0   0   0
cg00000363                  6                 1                 1    1    0   1   0   0   0   0   0
cg00000622                  1                 2                 1    0    0   0   0   0   0   0   0
cg00000714                  2                 5                 6    1    0   0   0   0   0   0   0
cg00000734                  2                 6                 2    0    0   0   0   0   0   0   0
I want to create a matrix that will:
a) count the number of rows in which the value of column UPU, UPP or UPM is
1, grouped by each of the first three columns (H3K27me3_gross_bin,
H3K4me3_gross_bin, H3K4me1_gross_bin);
b) sum those counts across the columns UPU, UPP and UPM for each value of
the first three columns.
I came up with this incredibly cumbersome way of doing this:
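UtoPFrac <- seq(6)       # placeholder first column, dropped after the loop
UtoPTotEvents <- seq(6)  # not used below
for (j in 1:3) {
  y <- df[, 28 + j]      # one of the three *_gross_bin columns (29 to 31)
  keep <- !is.na(y)      # ignore rows with a missing bin value
  for (i in 1:3) {
    # count the rows equal to 1 in UPU/UPP/UPM (columns 34 to 36) per bin value
    UtoPFrac <- cbind(UtoPFrac,
                      tapply(df[keep, 33 + i], y[keep],
                             function(x) length(which(x == 1))))
  }
}
UtoPFrac <- UtoPFrac[, 2:10]
# total events per bin value, one column per *_gross_bin column
UtoPEvents <- cbind(rowSums(UtoPFrac[, 1:3]), rowSums(UtoPFrac[, 4:6]), rowSums(UtoPFrac[, 7:9]))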
I am certain there is a more elegant way of doing this, probably by using
aggregate() or ddply(), but I was unable to get either working. I would
appreciate any help doing this more efficiently.
Thanks in advance
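
For illustration, here is a minimal sketch of what the aggregate() route mentioned above might look like. It is an assumption about one possible approach, not a tested answer for the full data set. The toy data frame only reproduces the six rows shown above, and the helper names (bin_cols, event_cols, count_ones, counts_by_bin, totals_by_bin) are made up for this example. It selects columns by name rather than by position, and it returns one table per bin column instead of a single 6 x 9 matrix.

```r
# Toy data: the six rows shown in the question, restricted to the columns used here.
df <- data.frame(
  H3K27me3_gross_bin = c(3, 6, 6, 1, 2, 2),
  H3K4me3_gross_bin  = c(3, 1, 1, 2, 5, 6),
  H3K4me1_gross_bin  = c(6, 5, 1, 1, 6, 2),
  UPU = c(0, 0, 1, 0, 0, 0),
  UPP = c(0, 1, 0, 0, 0, 0),
  UPM = c(0, 0, 0, 0, 0, 0),
  row.names = c("cg00000029", "cg00000321", "cg00000363",
                "cg00000622", "cg00000714", "cg00000734")
)

# Hypothetical helper names, introduced only for this sketch.
bin_cols   <- c("H3K27me3_gross_bin", "H3K4me3_gross_bin", "H3K4me1_gross_bin")
event_cols <- c("UPU", "UPP", "UPM")

# Number of entries equal to 1, ignoring missing values.
count_ones <- function(x) sum(x == 1, na.rm = TRUE)

# (a) For each bin column: one row per bin value, with the count of 1s
#     in UPU, UPP and UPM. aggregate() omits rows whose grouping value is NA.
counts_by_bin <- lapply(bin_cols, function(b) {
  aggregate(df[event_cols], by = list(bin = df[[b]]), FUN = count_ones)
})
names(counts_by_bin) <- bin_cols

# (b) For each bin column: total number of events per bin value,
#     i.e. the row sums of the counts from (a).
totals_by_bin <- lapply(counts_by_bin, function(tab) {
  data.frame(bin = tab$bin, total = rowSums(tab[event_cols]))
})

print(counts_by_bin$H3K27me3_gross_bin)
print(totals_by_bin$H3K27me3_gross_bin)
```

Unlike the loop above, this only produces rows for bin values that actually occur in the data, so the three tables can have different numbers of rows. If a fixed 6-row layout is needed, the bin columns could be converted to factors with levels 1 to 6 first, though how aggregate() treats empty levels would need checking.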
