Here are a few notes to get you started in R.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Help
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Type help(XXX) to get help on command XXX
Type help.search("XXX") to get help on topic XXX (note the quotes)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I/O
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Read data using "scan"; convert to a matrix with "matrix"

    m <- matrix(scan("mfile"), ncol = 5, byrow = TRUE)    - convert data to a matrix

To save and restore in R format (save needs a file name; load takes that file name)

    save(x, file = "x.RData")
    load("x.RData")

To write in readable format. write() emits values in column order, 5 per line by default, so transpose and set ncolumns to keep the matrix layout

    write(t(as.matrix(zz)), file = "junk", ncolumns = ncol(zz))

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
REGRESSION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I use glm() for regression (the "family" parameter specifies the type of
regression: logistic ("binomial"), linear ("gaussian"), etc).
To do standard linear regression, you can use "lm".

For step-wise I use "step", where you can specify the direction
("forward", "backward" or "both"), the scope, i.e. upper and lower
"model boundaries", and the penalty "k" (for BIC it is log(N)).

To find out what one knows about the result "myresult", which is a regression object:

    attributes(myresult)
    names(myresult)          - to see what fields it has
    myresult$df.residual     - $ extracts the specified field

WARNING: stepwise regression seems to give errors when a zero row is tried

%% basic regression - example (executable R code)

    nrows <- 100
    ncols <- 1000
    x <- matrix(rnorm(nrows*ncols), ncol = ncols)
    y <- apply(x[,1:3], 1, mean) + 0.05 * rnorm(nrows)
    traindata <- data.frame(x, y)
    glmtest <- glm(y ~ 1, family = gaussian(link = "identity"), data = traindata)

    summary(step(glmtest, direction = "forward", scope = list(lower = ~ 1, upper = ~ 1+X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15), k = log(nrows)))

    summary(step(glmtest, direction = "forward", scope = list(lower = ~ 1, upper = ~ 1+
        X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19
        +X20+X21+X22+X23+X24+X25+X26+X27+X28+X29+X30+X31+X32+X33+X34+X35+X36
        +X37+X38+X39+X40+X41+X42+X43+X44+X45+X46+X47+X48+X49+X50+X51+X52+X53
        +X54+X55+X56+X57+X58+X59+X60+X61+X62+X63+X64+X65+X66+X67+X68+X69+X70
        +X71+X72+X73+X74+X75+X76+X77+X78+X79+X80+X81+X82+X83+X84+X85+X86+X87
        +X88+X89+X90+X91+X92+X93+X94+X95+X96+X97+X98+X99+X100), k = log(ncols)))

The line breaks above are safe only because they fall inside the open
parentheses of the call; a formula broken before a leading "+" outside
any parentheses would end early, so otherwise keep it on a single line.

    k = log(nrows) gives 7 features in the model
    k = log(ncols) gives 3 features (correctly)
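
Typing out the scope formula X1+...+X100 by hand is error-prone. A minimal sketch of building it programmatically with reformulate() from the stats package, and of listing which features the forward step-wise search kept. The names n_candidates, upper_scope, fit and selected are introduced here for illustration, and the seed is arbitrary (it is only there so that reruns match); the feature counts quoted above came from an unseeded run and need not reproduce exactly.

```r
set.seed(1)                       # arbitrary seed (assumption), for repeatable runs
nrows <- 100
ncols <- 1000
x <- matrix(rnorm(nrows * ncols), ncol = ncols)
y <- apply(x[, 1:3], 1, mean) + 0.05 * rnorm(nrows)
traindata <- data.frame(x, y)     # columns are named X1, X2, ..., X1000, y

# Build the one-sided formula ~ X1 + X2 + ... + X100 without typing it out.
n_candidates <- 100               # hypothetical name: number of candidate features
upper_scope <- reformulate(paste0("X", 1:n_candidates))

# Start from the intercept-only model, as in the example above.
glmtest <- glm(y ~ 1, family = gaussian(link = "identity"), data = traindata)

# Forward search with the heavier penalty; trace = 0 suppresses per-step output.
fit <- step(glmtest, direction = "forward",
            scope = list(lower = ~ 1, upper = upper_scope),
            k = log(ncols), trace = 0)

# Names of the selected features (everything except the intercept).
selected <- setdiff(names(coef(fit)), "(Intercept)")
print(selected)
print(fit$df.residual)
```

Since y is built from the first three columns, X1, X2 and X3 should appear in selected; whether any extra noise features slip in depends on the random draw and on the penalty k.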