First-time poster in the math section (I have made a few posts in the stats section), and I am looking for clarification on a question about one of my variables. I enjoy sports, and where I can (I don't have a strong mathematical background), I like to put a mathematical answer to a problem so that I can form an informed opinion on an event.
My current interest is in applying linear weights, via regression analysis, to goals scored in the English Premiership, and then using those weights to make projections from current data with Monte Carlo style analysis.
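A minimal sketch of that two-step pipeline in plain Python, under loudly stated assumptions: the match rows in `MATCHES` are made-up placeholders (not ESPN data), the "linear weights" are ordinary least-squares coefficients with an intercept, and the Monte Carlo step assumes a team's goals in one match are Poisson-distributed with the regression prediction as the mean. The helper names and the projected 14 shots / 5 on goal are hypothetical.

```python
import math
import random

# Hypothetical per-match rows: (shots, shots_on_goal, goals). Replace with real data.
MATCHES = [
    (12, 4, 1), (15, 6, 2), (8, 2, 0), (20, 9, 3),
    (10, 3, 1), (14, 5, 2), (6, 1, 0), (18, 7, 2),
]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y.

    Each row of X must already contain a leading 1.0 for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

# Step 1: linear weights, goals ~ b0 + b1*shots + b2*shots_on_goal
X = [[1.0, s, sog] for s, sog, _ in MATCHES]
y = [float(g) for _, _, g in MATCHES]
weights = ols(X, y)

# Step 2: Monte Carlo projection for a hypothetical match with 14 shots, 5 on goal.
# The prediction is clamped to stay positive before being used as a Poisson mean.
lam = max(weights[0] + weights[1] * 14 + weights[2] * 5, 0.01)
rng = random.Random(42)  # fixed seed so runs are reproducible
sims = [poisson(lam, rng) for _ in range(20000)]
sim_mean = sum(sims) / len(sims)
p_two_plus = sum(g >= 2 for g in sims) / len(sims)

print(f"weights={weights}")
print(f"expected goals={lam:.2f}, simulated mean={sim_mean:.2f}, P(2+ goals)={p_two_plus:.2f}")
```

Using real data only means replacing `MATCHES`; the Poisson assumption is one common choice for goal counts, not the only reasonable one.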
I have pulled three complete seasons of data from ESPN (to keep the data source consistent) and broken them down. I am treating goals scored as the dependent variable and was treating all the other statistics I collected (e.g. shots, shots on goal) as independent variables. After working through the data, I found that I got my best R² value (above 0.9) when I added a shooting percentage variable, i.e. goals scored / shots on goal × 100, giving, say, 36 rather than 0.36.
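Written out with symbols introduced here for illustration (G for goals scored, S for shots on goal, p for the shooting percentage as defined above), the definition can be rearranged so that goals are an exact function of the other two variables:

```latex
p = 100\,\frac{G}{S}
\quad\Longrightarrow\quad
G = \frac{S\,p}{100}
\quad\Longrightarrow\quad
\log G = \log S + \log p - \log 100
```

In the logarithmic form there is no error term at all: once S and p are known, G is fully determined.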
My question is: can I treat shooting % as an independent variable, given that it is essentially a function of goals scored and shots on goal? Ideally I would like to, since it gives me a cleaner result from the data I can obtain, and I do feel it reflects the accuracy of the scorer. My gut feeling, though, is that it depends on the other two variables. I would be grateful for an opinion on this, so that I can go away and try to obtain more data if my logic isn't sound.
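A small plain-Python check of that gut feeling, sketched on made-up team-season totals (the `TEAMS` numbers and the helper names are hypothetical, not ESPN data). It compares the linear R² with and without shooting %, then regresses log goals on log shots on goal and log shooting %. Because shooting % is computed from goals, the log-log fit should be perfect, with coefficients of about 1 on both logs and an intercept of about -log(100).

```python
import math

# Hypothetical team-season totals: (shots_on_goal, goals). Replace with real data.
TEAMS = [(150, 55), (180, 54), (120, 42), (200, 62),
         (130, 52), (160, 45), (110, 40), (190, 70)]

def ols(X, y):
    """Least-squares coefficients via the normal equations; X rows include a leading 1.0."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):  # elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def r_squared(X, y, b):
    """R^2 = 1 - SSR/SST for fitted coefficients b."""
    mean_y = sum(y) / len(y)
    ssr = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ssr / sst

sog = [float(s) for s, _ in TEAMS]
goals = [float(g) for _, g in TEAMS]
pct = [100.0 * g / s for s, g in zip(sog, goals)]  # shooting % as defined in the question

# Linear models: goals on shots on goal, then with shooting % added
X_without = [[1.0, s] for s in sog]
X_with = [[1.0, s, p] for s, p in zip(sog, pct)]
r2_without = r_squared(X_without, goals, ols(X_without, goals))
r2_with = r_squared(X_with, goals, ols(X_with, goals))

# Log-log model: log(goals) = log(sog) + log(pct) - log(100) holds by construction
X_log = [[1.0, math.log(s), math.log(p)] for s, p in zip(sog, pct)]
y_log = [math.log(g) for g in goals]
b_log = ols(X_log, y_log)
r2_log = r_squared(X_log, y_log, b_log)

print(f"R^2 without pct: {r2_without:.3f}, with pct: {r2_with:.3f}, log-log: {r2_log:.6f}")
print(f"log-log coefficients: {b_log}")
```

The near-perfect fit here comes from the arithmetic of the definition rather than from anything learned about the teams, which is the sense in which shooting % is not independent of goals scored.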
Many thanks,