$\begingroup$

First-time poster in the math section (I have a few posts in the stats section), and I am looking for clarification on a question about variables. Basically, I enjoy sports and like to put a mathematical answer to a problem where I can (I don't have a great background in the subject), so that I can form an informed opinion on an event.

My current interest is to derive linear weights for goals scored in the English Premiership through regression analysis, and then to apply those weights to projections based on current data using Monte Carlo-style analysis.

I have pulled three complete years of data from ESPN (to keep the data source consistent) and broken them down. I am treating goals scored as the dependent variable and all other collated data as independent variables (e.g. shots, shots on goal, etc.). After breaking down the data, I found that I get my best $R^2$ value (in excess of 0.9) when I add a shooting percentage variable, defined as goals scored / shots on goal × 100 (giving, say, 36 instead of 0.36).

My question is: can I treat shooting % as an independent variable, given that it is essentially a function of goals scored and shots on goal? Ideally I would like to treat it as independent, as it gives me a cleaner result from the data I can obtain and I do feel it reflects the accuracy of the person scoring, but my gut feeling is that it depends on the other two variables. I would be grateful for an opinion on this so that I can go away and try to obtain more data if my logic isn't sound.
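To make the concern concrete, here is a minimal sketch in Python on purely synthetic data (not the ESPN figures; the sample size, shot counts and conversion rates are all made-up assumptions). It fits an ordinary least-squares model for goals with and without a shooting % predictor, and checks that goals can be rebuilt exactly from shooting % and shots on goal, which is the circularity I am worried about.

```python
import numpy as np

# Synthetic team-season data. Every number here is an assumption
# chosen for illustration, not taken from the real ESPN data.
rng = np.random.default_rng(0)
n = 60                                    # e.g. 3 seasons x 20 teams
shots = rng.integers(350, 650, size=n)    # total shots
sog = rng.binomial(shots, 0.45)           # shots on goal (<= shots)
goals = rng.binomial(sog, 0.30)           # goals scored (<= shots on goal)

# Shooting % as defined in the question: it is built from the
# dependent variable (goals) itself.
shooting_pct = goals / sog * 100


def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot


r2_without = r_squared(np.column_stack([shots, sog]), goals)
r2_with = r_squared(np.column_stack([shots, sog, shooting_pct]), goals)

print(f"R^2 without shooting %: {r2_without:.3f}")
print(f"R^2 with shooting %:    {r2_with:.3f}")

# Goals can be recovered exactly from the two "predictors".
rebuilt = shooting_pct * sog / 100
print("goals recoverable from shooting % and shots on goal:",
      np.allclose(rebuilt, goals))
```

If the second fit looks much better than the first, that would suggest the high $R^2$ comes largely from goals appearing on both sides of the regression rather than from real predictive power.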

Many thanks,

  • 1
    It does not really make sense to ask whether one particular variable is independent or not. Independence is a relation _between two or more variables_, so you need to specify what you want your variable to be independent _from_. 2012-07-29
  • 0
    Also, most of the variables you sketch are probably not independent from each other. I presume that "shots on goal" is always less than or equal to "shots"? Such a relation implies a dependency between the two variables. 2012-07-29
  • 0
    Yes, that would be correct, and I didn't think of that. For example, with Wayne Rooney of Manchester United over the last couple of seasons in which he has played in excess of 30 games, only 40-50% of his total shots have been on target, and of the ones on target only 38-44% have resulted in goals. Of the type of data that was easily available, I couldn't see any other way of trying to derive linear weights for goals scored (but I had concerns about the variables being independent). I take it the variables do have to be independent of the dependent variable to derive linear weights? 2012-07-29
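The dependency between "shots" and "shots on goal" raised in the comments can be illustrated with a small sketch. It uses synthetic data only, and the shot counts and the 45% on-target rate are assumptions picked to roughly match the Rooney figures quoted above. Because shots on goal is a subset of shots, the two columns tend to be strongly correlated.

```python
import numpy as np

# Synthetic data; all values are illustrative assumptions.
rng = np.random.default_rng(1)
n = 60
total_shots = rng.integers(350, 650, size=n)
on_target = rng.binomial(total_shots, 0.45)   # a subset of total shots

# The constraint from the comment: shots on goal <= shots.
print("constraint holds for every row:", bool(np.all(on_target <= total_shots)))

# Pearson correlation between the two candidate predictors.
corr = np.corrcoef(total_shots, on_target)[0, 1]
print(f"correlation between shots and shots on goal: {corr:.3f}")
```

A strong correlation between predictors does not by itself stop a regression from being fitted, but it can make the individual weights unstable and hard to interpret.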

2 Answers