
Background

I am trying to extract data from scientific publications. Sometimes an experiment tests two factors, e.g. in a standard two-way factorial design analyzed with ANOVA.

Call these factors $A$ and $B$. If factor $A$ has two levels and $B$ has three, there are six total treatments.

If the effects of $A$ and $B$ are significant but there is no interaction, only the 'main' effects might be presented, i.e. five results: one for each level of $A$ and one for each level of $B$, each averaged across all levels of the other factor.

Here is an example from Ma 2001 Table 2, in which $A$ would be the row spacing and $B$ would be the nitrogen rate.

[Image: Table 2 from Ma 2001]

Thus,

$$7577 = \frac{X_{A_{20},B_{0}} + X_{A_{20},B_{112}} + X_{A_{20},B_{224}}} {3}$$

$$9186 = \frac{X_{A_{80},B_{0}} + X_{A_{80},B_{112}} + X_{A_{80},B_{224}}} {3}$$

$$3706 = \frac{X_{A_{20},B_{0}} + X_{A_{80},B_{0}}} {2}$$

$$9402 = \frac{X_{A_{20},B_{112}} + X_{A_{80},B_{112}}} {2}$$

$$12038 = \frac{X_{A_{20},B_{224}} + X_{A_{80},B_{224}}} {2}$$

Question

Is it possible to calculate the means of each of the six treatments $X_{A,B}$, for $A\in\{20,80\}$ and $B\in\{0,112,224\}$, from these results?

  • 0
    These are actually only four independent equations, because thrice the sum of the first two must equal twice the sum of the last three (it's just the total of all six variables). 2011-01-25
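
The counting argument in this comment can be checked mechanically. The following is a minimal sketch using only the Python standard library; the helper `rank` and the ordering of the unknowns are illustrative choices, not something taken from the question. It computes the rank of the five equations (multiplied through by 3 or 2 so that each row is a plain sum) and compares the two grand totals.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        # Look for a non-zero pivot in this column, at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Assumed unknown order:
# X(20,0), X(20,112), X(20,224), X(80,0), X(80,112), X(80,224)
coefficients = [
    [1, 1, 1, 0, 0, 0],  # sum over nitrogen at spacing 20 = 3 * 7577
    [0, 0, 0, 1, 1, 1],  # sum over nitrogen at spacing 80 = 3 * 9186
    [1, 0, 0, 1, 0, 0],  # sum over spacing at nitrogen 0   = 2 * 3706
    [0, 1, 0, 0, 1, 0],  # sum over spacing at nitrogen 112 = 2 * 9402
    [0, 0, 1, 0, 0, 1],  # sum over spacing at nitrogen 224 = 2 * 12038
]
totals = [3 * 7577, 3 * 9186, 2 * 3706, 2 * 9402, 2 * 12038]

coefficient_rank = rank(coefficients)
augmented_rank = rank([row + [t] for row, t in zip(coefficients, totals)])
row_spacing_total = totals[0] + totals[1]
nitrogen_total = totals[2] + totals[3] + totals[4]

print(coefficient_rank, augmented_rank, row_spacing_total, nitrogen_total)
```

The coefficient matrix has rank 4, so the six unknowns keep two degrees of freedom. The two grand totals as printed differ by 3 (50289 versus 50292), presumably because the published means are rounded, so the five equations are only approximately consistent with each other.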

3 Answers

3

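You can do it if you make some assumption that reduces the number of free unknowns to the number of independent equations you have. You are saying you have an array

$$\begin{array} {ccc} & 20 & 80 \\ 0 & a & b \\ 112 & c & d \\ 224 & e & f \end{array}$$

where $a$ through $f$ are what you want to solve for. If the effects are independent and additive, you would expect $b-a=d-c=f-e$, $e-c=f-d$, and $c-a=d-b$. Only two of these relations are independent (the first chain implies the other two), so they leave four free values: a baseline, one row-spacing effect, and two nitrogen effects. The published means determine those, and you can check the result for consistency. Without some relation of this kind the system is underdetermined: the five means give only four independent equations, so you will get a two-dimensional continuum of solutions.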
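
As a minimal sketch of what the additive assumption would give (it is an assumption, not something the published table confirms), each cell can be taken as its nitrogen mean shifted by the deviation of its row-spacing mean from the average of the two row-spacing means. The variable names below are illustrative; in the notation of the array, `cells[(20, 0)]` is $a$, `cells[(80, 0)]` is $b$, and so on.

```python
# Published marginal means (Ma 2001, Table 2, as quoted in the question).
row_spacing_means = {20: 7577, 80: 9186}
nitrogen_means = {0: 3706, 112: 9402, 224: 12038}

# Additive (no-interaction) assumption: every cell equals its nitrogen mean
# plus the deviation of its row-spacing mean from the centre of the two
# row-spacing means.
row_spacing_centre = sum(row_spacing_means.values()) / len(row_spacing_means)

cells = {
    (spacing, nitrogen): n_mean + (s_mean - row_spacing_centre)
    for spacing, s_mean in row_spacing_means.items()
    for nitrogen, n_mean in nitrogen_means.items()
}

for (spacing, nitrogen), value in sorted(cells.items()):
    print(f"spacing {spacing}, nitrogen {nitrogen}: {value:.1f}")
```

By construction the three nitrogen means and the difference between the two row-spacing means are reproduced exactly. The individual row-spacing means come out 0.5 away from the printed values (7577.5 and 9186.5), which reflects the small mismatch between the printed margins rather than a failure of the method.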

It sounds like you believe 3706 is some sort of weighted average of $a$ and $b$ and similarly for the other entries. Is that right?

  • 0
    Yes, I have added the equations above for clarification. 2011-01-24
  • 0
    @David: then you need one more equation and you'll be fine. Without it, no. 2011-01-24
  • 0
    What kind of assumptions might I make? Would it help if I assumed some statistical model? 2011-01-25
  • 0
    @David: either that the effects are independent, or that the nitrogen response is linear, so the effect of 224 is twice that of 112. I described the independent case above; for linear nitrogen you would expect $e-c=c-a$ and $f-d=d-b$. 2011-01-25
  • 0
    Thanks for helping me work through this. 2011-01-26
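
Before adopting the linear-nitrogen assumption from the comments, it may be worth checking it against the published nitrogen means. Averaging $e-c=c-a$ and $f-d=d-b$ over the two row spacings implies that the three nitrogen means are equally spaced. The sketch below (variable names are illustrative) compares the two steps.

```python
# Published nitrogen means (averaged over row spacing), as quoted in the question.
nitrogen_means = {0: 3706, 112: 9402, 224: 12038}

# Under a response that is linear in nitrogen, both steps would be equal.
step_low = nitrogen_means[112] - nitrogen_means[0]
step_high = nitrogen_means[224] - nitrogen_means[112]
linear_in_nitrogen = step_low == step_high

print(step_low, step_high, linear_in_nitrogen)
```

The steps are 5696 and 2636, so with these particular means the linear assumption cannot hold exactly, and it could only be imposed as an approximation.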
2

The general rule is that $n$ independent equations allow you to solve for $n$ unknowns. So I don't think you'll be able to recover each of the original 6 data points. The best you can do is produce a set of constraints that, given enough additional information (some of the 6 data points, or relationships between them that aren't redundant with what you already have), would allow you to find the rest.

  • 0
    Thanks for the rule. hmpfff. Perhaps I can reduce the unknowns if I know the relations among them? I have written out the equations above. 2011-01-24
  • 0
    That is the general rule, but there are exceptions. For example, I can have 2 unknowns and 1 equation: $x^2+y^2=0$. Given that $x$ and $y$ are real, obviously $x=y=0$. 2011-01-24
  • 0
    @picakhu: of course, but I declined to discuss the exceptions here because as far as I can remember, special cases like that don't occur for linear equations like the ones in this question. 2011-01-24
1

[Update note] I saw the related question at mathoverflow. There it seemed we were dealing with a frequency table, so I repeated that scheme here. But now I see the question is focused on means in a two-factor ANOVA. I'll see whether those two concepts can be interchanged here; for instance, a 1:1 correspondence should only be possible if the coefficients under treatments (means?) are based on the same number of observations. Possibly it is better to delete this answer later. [end note]

Here is a solution. I computed the "expected frequencies" based on your values, where I compensated for the $\ldots/3$ and $\ldots/2$ divisions. Also I corrected 9186 to 9187 to make the totals consistent.
$$\begin{array}{r|rrr|rl}
 & B{:}\ 0 & 112 & 224 & \text{(all)} & \\ \hline
A{:}\ 20 & 3350.08 & 8499.04 & 10881.88 & 22731 & /3 = 7577 \\
80 & 4061.92 & 10304.96 & 13194.12 & 27561 & /3 = 9187 \\ \hline
\text{(all)} & 7412 & 18804 & 24076 & 50292 & \\
 & /2 = 3706 & /2 = 9402 & /2 = 12038 & &
\end{array}$$
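
The table entries can be reproduced with the usual expected-frequency formula, row total times column total divided by grand total. The sketch below uses illustrative variable names and the corrected value 9187. This construction amounts to assuming the two factors combine multiplicatively, which is a different assumption from an additive no-interaction model.

```python
# Marginal totals recovered from the published means (means times cell counts).
row_totals = {20: 3 * 7577, 80: 3 * 9187}  # 9187 is the corrected value used above
col_totals = {0: 2 * 3706, 112: 2 * 9402, 224: 2 * 12038}

grand_total = sum(row_totals.values())
# With the corrected value, both sets of margins add up to the same total.
assert grand_total == sum(col_totals.values())

# Expected-frequency formula: row total * column total / grand total.
expected = {
    (spacing, nitrogen): r_total * c_total / grand_total
    for spacing, r_total in row_totals.items()
    for nitrogen, c_total in col_totals.items()
}

for (spacing, nitrogen), value in sorted(expected.items()):
    print(f"spacing {spacing}, nitrogen {nitrogen}: {value:.2f}")
```

Each row and column of `expected` sums back to the corresponding marginal total, so all five means are matched, but only for the margins with 9187 substituted for 9186.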

  • 0
    Thanks for the answer. I got close (enough) using iterative proportional fitting, but I don't quite understand whether this is a robust solution, or what assumptions are required to arrive at this solution rather than others. (The question on MO has been closed as 'too localized'.) 2011-01-25
  • 0
    @David: The last example in Wikipedia just reproduces the "expected frequencies" method. However, from the given 2x2 table example I cannot tell whether this is **always** the case (then the solution would be simple, but I doubt it). Possibly any meaningful assumption for the measures in the treatment groups should explicitly assume a chi-square of zero for the frequencies. 2011-01-25