I'm writing an app to help facilitate some research, and part of this involves doing some statistical calculations. Right now, the researchers are using a program called SPSS. Part of the output that they care about looks like this:

They're really only concerned about the `F` and `Sig.` values. My problem is that I have no background in statistics, and I can't figure out what the tests are called, or how to calculate them.

I thought the `F` value might be the result of the F-test, but after following the steps given on Wikipedia, I got a result that was different from what `SPSS` gives.

This website might help you out a bit more. Also this one.

I'm working from a fairly rusty memory of a statistics course, but here goes nothing:

When you're doing analysis of variance (ANOVA), you calculate the F statistic as the ratio of the mean square "between the groups" to the mean square "within the groups". The second link above seems pretty good for this calculation.
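As a rough illustration of that ratio, here is a minimal sketch in plain Python for the simplest case only: a one-way ANOVA with a single response variable. The function name `one_way_anova_f` and the sample data are made up for the example, and this is not necessarily the calculation behind the SPSS output in the question.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one inner list per group.
    Hypothetical helper for illustration; univariate response only.
    """
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Sum of squares within groups: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    ms_between = ss_between / df_between  # mean square between
    ms_within = ss_within / df_within     # mean square within
    return ms_between / ms_within, df_between, df_within


# Made-up data: three groups of three observations each.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_value, df1, df2)  # 3.0 2 6
```

The two degrees-of-freedom values are returned alongside F because they are needed to turn F into a significance value.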

This makes the F statistic a measure of how much explanatory power your model has, because the "between the groups" variance is what the model explains and the "within the groups" variance is random error. A high F implies a highly significant model.

As in many statistical operations, you back-determine Sig. using the F statistic. Here's where your Wikipedia information comes in slightly handy. What you want to do is - using the degrees of freedom given to you by SPSS - find the proper P value at which an F table will give you the F statistic you calculated. The P value where this happens [F(table) = F(calculated)] is the significance.
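The table lookup described above is the same as evaluating the upper tail of the F distribution at the calculated F. The sketch below does that in plain Python, using the standard identity between the F distribution and the regularized incomplete beta function, evaluated with a continued fraction. The names `f_sf`, `_betainc` and `_betacf` are my own, and a statistics library would normally be the better choice; treat this as an illustration, not a vetted implementation.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence, then odd step.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use whichever form of the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F >= f) for an F distribution with (df1, df2) d.o.f."""
    if f <= 0.0:
        return 1.0
    return _betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Example: F = 3.0 with 2 and 6 degrees of freedom.
print(f_sf(3.0, 2, 6))
```

For the example call, the result is approximately 0.125, which is what an F table with 2 and 6 degrees of freedom would lead you to for F = 3.0.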

Conceptually, a lower significance value shows a very strong ability to reject the null hypothesis (which for these purposes means to determine your model has explanatory power).

Sorry to any math folks if any of this is wrong. I'll be checking back to make edits!!!

Good luck to you. Stats is fun, just maybe not this part. =)

I assume from your question that your research colleagues want to automate the process by which certain statistical analyses are performed (i.e., they want to batch process data sets). You have two options:

1) SPSS is now scriptable through Python (as of version 15) - go to spss.com and search for python. You can write Python scripts to automate data analyses and extract key values from pivot tables, and then process the answers any way you like. This has the virtue of allowing an exact comparison between the results from your Python script and the hand-run analyses your collaborators do in SPSS. Thus you won't really have to know any statistics to do this work (which is a key advantage).

2) You could do this in R, a free statistics environment, which could probably be scripted. This has the disadvantage that you will have to learn statistics to ensure that you are doing it correctly.

Statistics is hard :-). After a year of reading and re-reading books and papers, I can only say with confidence that I understand the very basics of it.

You might wish to investigate ready-made libraries for whichever programming language you are using, because there are many gotchas in math in general and statistics in particular (rounding errors being an obvious example).

As an example, you could take a look at the R project, which is both an interactive environment and a library that you can use from your C++ code, distributed under the GPL (i.e., if you are using it only internally and publishing only the results, you don't need to open your code).

In short: don't do this by hand, link/use existing software. And sain_grocen's answer is incorrect. :(

These are all tests for significance of parameter estimates that are typically used in multivariate-response multiple regressions. These would not be simple things to do outside of a statistical programming environment. I would suggest either getting the output from a pre-existing statistical program, or using one that you can link to from your own code.

I'm afraid that the first answer (sain_grocen's) will lead you down the wrong path. His explanation is likely of a special case of what you are actually dealing with. The ANOVA explained in his links is for a univariate response in a balanced design. Those aren't the F statistics you are seeing. The names in your output (Pillai's Trace, Hotelling's Trace,...) are some of the available multivariate versions. They have F distributions under certain assumptions. I can't explain a textbook's worth of material here; I would advise you to start by looking at "Applied Multivariate Statistical Analysis" by Johnson and Wichern.
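For orientation only, these are the usual textbook definitions of the multivariate statistics named above, together with the two others that commonly appear alongside them in MANOVA output. The notation is my own assumption, not taken from the SPSS output: $H$ is the hypothesis (between-groups) sums-of-squares-and-cross-products matrix, $E$ is the error (within-groups) matrix, and $\lambda_i$ are the eigenvalues of $HE^{-1}$. Software differs in some conventions (for Roy's largest root in particular), so check the documentation before comparing numbers.

```latex
\begin{aligned}
\text{Pillai's trace:}     \quad & V = \operatorname{tr}\!\left[ H (H + E)^{-1} \right] = \sum_i \frac{\lambda_i}{1 + \lambda_i} \\
\text{Hotelling's trace:}  \quad & U = \operatorname{tr}\!\left( H E^{-1} \right) = \sum_i \lambda_i \\
\text{Wilks' lambda:}      \quad & \Lambda = \frac{\det(E)}{\det(H + E)} = \prod_i \frac{1}{1 + \lambda_i} \\
\text{Roy's largest root:} \quad & \theta = \lambda_{\max}
\end{aligned}
```

Each of these is then converted to an exact or approximate F value with its own degrees of freedom, which is where the `F` and `Sig.` columns come from.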

Can you explain more why SPSS itself isn't a fine solution to the problem? Is it that it generates pivot tables as output that are hard to manipulate? Is it the cost of the program?

F-statistics can arise from any number of particular tests. The F is just a distribution (loosely: a description of the "frequencies" of groups of values), like a Normal (Gaussian), or Uniform. In general they arise from ratios of variances. Opinion: many statisticians (myself included) find F-based tests to be unstable (jargon: non-robust).

The particular output statistics (Pillai's trace, etc.) suggest that the original analysis is a MANOVA example, which, as other posters describe, is a complicated procedure that is hard to get right.

I'm also guessing, based on the MANOVA and the use of SPSS, that this is a psychology or sociology project... if not, please enlighten. It might be that other, simpler models would be easier to understand and more repeatable. Consult your local university statistical consulting group, if you have one.

Good luck!

Here's an explanation of MANOVA output, from a very good site on statistics and on SPSS:

Output with explanation: http://faculty.chass.ncsu.edu/garson/PA765/manospss.htm

How and why to do MANOVA or multivariate GLM: (same path as above, but terminating in '/manova.htm')

Writing software from scratch to calculate these outputs would be both lengthy and difficult; there are lots of numerical problems to deal with and matrix inversions to do.

As Henry said, use Python scripts, or R. I'd suggest working with somebody who knows SPSS if scripting. In addition, SPSS itself is capable of exporting the output tables to files using something called OMS. A script within SPSS can do this.

Find out who in your research group knows SPSS and work with them.