
This is my first question here.

As I'm not a mathematician, I thought I'd ask here for advice on how to approach something I'm working on as a hobby project.

A bit of context

Let's say there is a collection of items, each with a description of its features and a price. Imagine a list of cars and prices. Every car has a list of features, e.g. engine size, color, horsepower, model, year, etc. For each make, something like this:

Ford:
    V8, green, manual, 200hp, 2007, $200
    V6, red, automatic, 140hp, 2010, $300
    V6, blue, manual, 140hp, 2005, $100
    ...

Going even further, the list of cars with prices is published at some time interval, which means we have access to historical price data. A given list might not always include exactly the same cars.

Problem

I would like to understand how to model prices for any car based on this base information, most importantly cars not in the initial list.

Ford, V6, red, automatic, 130hp, 2009

The above car is almost the same as one in the list, differing only slightly in horsepower and year. What is needed to price it?

What I'm looking for is something practical and simple, but I would also like to hear about more complex approaches to modeling something like this.

What I've tried

Here is what I've been experimenting with so far:

1) Using historical data to look up car X. If it is not found, there is no price. This is of course very limited: it is only usable in combination with some time decay that adjusts prices for known cars over time.

2) Using a feature weighting scheme together with a priced sample car. Basically, there is a base price, and each feature alters it by some factor. Any car's price is derived from this.

The first proved insufficient, and the second was not always correct, though I might not have had the best approach to choosing the weights. It also seems heavy on maintenance of the weights, which is why I thought there might be a way to use the historical data statistically to obtain the weights, or to get something else entirely. I just don't know where to start.

Any suggestions on how a problem like this could be approached? All ideas are more than welcome.

Thanks a lot in advance and looking forward to reading your suggestions!

UPDATE:

Thanks a lot for the useful suggestions so far!

I just want to add that, as I'm a beginner, it would be good to also consider what it takes to set things up. What about the following aspects:

  • integrating it into a software project I have, either by using existing libraries or by writing the algorithm myself.
  • fast recalculation when new historical data comes in.
  • 0
    @Harry Stern: thanks for the suggestion. I was just hoping that, as the data set is rather large, there is something that can provide this without manual checking. Ideally, I'm looking for something I can feed prices into and then query for the price of any kind of car. 2011-01-12

4 Answers

3

You are probably interested in multiple regression. It seems that Excel can also help you with this. Quoting from the second link: "For example, suppose you want to project the appropriate price for a house in your area based on square footage, number of bathrooms, lot size, and age. Using a multiple regression formula, you can estimate a price, based on a database of information gathered from existing houses."
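For anyone who would rather see this in code than in a spreadsheet, here is a minimal sketch of multiple regression by ordinary least squares, using only the Python standard library. Several things in it are assumptions made for illustration: the first three rows are the Ford examples from the question, but the other four rows are made up (and chosen to lie exactly on a straight-line formula, so the fit is exact here, which real data will not be); color is left out to keep the example small; and the function names `encode`, `solve`, `fit` and `predict` are hypothetical. In a real project a numerical library would normally replace the hand-written solver.

```python
# Known cars: the first three rows are the Ford examples from the question;
# the remaining four are MADE-UP rows added only so the fit is well determined.
CARS = [
    # (engine, transmission, hp, year, price)
    ("V8", "manual", 200, 2007, 200),
    ("V6", "automatic", 140, 2010, 300),
    ("V6", "manual", 140, 2005, 100),
    ("V8", "automatic", 220, 2009, 380),  # hypothetical
    ("V6", "manual", 120, 2008, 120),     # hypothetical
    ("V8", "manual", 180, 2004, 160),     # hypothetical
    ("V6", "automatic", 150, 2006, 265),  # hypothetical
]


def encode(engine, transmission, hp, year):
    """Turn one car into a numeric row: intercept, two 0/1 dummies, hp, year."""
    return [
        1.0,                                          # intercept (base price)
        1.0 if engine == "V8" else 0.0,               # V6 is the baseline
        1.0 if transmission == "automatic" else 0.0,  # manual is the baseline
        float(hp),
        float(year - 2000),                           # shifted to keep numbers small
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(cars):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    rows = [encode(*car[:4]) for car in cars]
    prices = [float(car[4]) for car in cars]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * p for r, p in zip(rows, prices)) for i in range(n)]
    return solve(xtx, xty)


def predict(weights, engine, transmission, hp, year):
    """Estimated price: weighted sum of the encoded features."""
    row = encode(engine, transmission, hp, year)
    return sum(w * x for w, x in zip(weights, row))


weights = fit(CARS)
# The car from the question that is not in the list.
estimate = predict(weights, "V6", "automatic", 130, 2009)
print(round(estimate, 2))
```

The fitted weights play the role of the hand-maintained feature weights described in the question, except that they come out of the data. Because `xtx` and `xty` are plain sums over rows, a new batch of prices can be added to the stored sums and the small system re-solved, without reprocessing the old data.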

  • 0
    @Shai Covo: thanks for the links. Now that I have a clearer picture, I realize that several suggestions are pretty much the same, which makes it difficult to choose whose answer to accept. 2011-01-13
3

This is a typical example of "hedonic pricing", used in applied empirical economics: this is a paper with a good review of the theory (JPE papers are renowned), and this is a closely related application to the contribution of individual camcorder characteristics to price.

  • 1
    @murrekatt: Least squares estimation is a method of performing regression analysis, i.e. a method of fitting a model to data on different characteristics and prices. Most of the information in your data will be captured by the model, which you can then use to "price" something that is not included in your data sample. The whole methodology is called "hedonic pricing", and it's a term used by applied economists (see the link I gave you earlier). Regression by least-squares estimation is the statistical inference method that you use to build your hedonic pricing model. 2011-01-12
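
To make the comment above concrete, here is one way a hedonic model is often written down. The log-linear form and all the symbols are assumptions chosen for illustration, not something taken from the linked papers: $p_i$ is the price of car $i$, $x_{ik}$ is its $k$-th characteristic (a number such as horsepower, or a 0/1 indicator such as "automatic"), and $\beta_0, \dots, \beta_K$ are the unknown weights.

$$\log p_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i$$

Least squares picks the weights that minimize the squared errors over the $n$ cars in the sample:

$$\hat\beta = \arg\min_{\beta} \sum_{i=1}^{n} \Big( \log p_i - \beta_0 - \sum_{k=1}^{K} \beta_k x_{ik} \Big)^2$$

A car that is not in the sample, with characteristics $x_1, \dots, x_K$, is then priced as

$$\hat p = \exp\Big( \hat\beta_0 + \sum_{k=1}^{K} \hat\beta_k x_k \Big) = e^{\hat\beta_0} \prod_{k=1}^{K} e^{\hat\beta_k x_k}$$

Under this assumed form, $e^{\hat\beta_0}$ acts as a base price and each characteristic multiplies it by a factor, which is the "base price altered by feature factors" scheme from the question, with the factors estimated from the historical data rather than maintained by hand. A plain linear model in $p_i$ instead of $\log p_i$ is an equally legitimate choice; which fits better depends on the data.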
1

I'm not sure, it's just an idea, but it seems that you are talking about interpolating a function F(price, factor1, ..., factorN) = 0 given a set of knots. Knowing the price for a certain set of configurations defines tuples (price, factor1, ..., factorN), which can be used to build an interpolation grid.

As for a concrete interpolation method, you can try several and see which fits best.
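
As one possible starting point, here is a minimal sketch of inverse-distance-weighted interpolation over the three Ford cars from the question, using only the Python standard library. This is just one of many interpolation schemes, picked because it works on scattered points. The distance function is an assumption: a mismatched categorical feature counts as 1, and numeric differences are divided by hand-picked scales (50 hp, 5 years) that are purely hypothetical. The names `distance` and `idw_price` are also made up for this example.

```python
import math

# The three Ford cars from the question, as (features, price) pairs.
KNOWN = [
    ({"engine": "V8", "color": "green", "gearbox": "manual",
      "hp": 200, "year": 2007}, 200.0),
    ({"engine": "V6", "color": "red", "gearbox": "automatic",
      "hp": 140, "year": 2010}, 300.0),
    ({"engine": "V6", "color": "blue", "gearbox": "manual",
      "hp": 140, "year": 2005}, 100.0),
]

# Hypothetical scales: None marks a categorical feature, a number is the
# difference in a numeric feature that counts as "one unit" of distance.
SCALES = {"engine": None, "color": None, "gearbox": None,
          "hp": 50.0, "year": 5.0}


def distance(a, b, scales):
    """Euclidean-style distance mixing categorical and numeric features."""
    total = 0.0
    for key, scale in scales.items():
        if scale is None:
            total += 0.0 if a[key] == b[key] else 1.0   # mismatch adds 1
        else:
            total += ((a[key] - b[key]) / scale) ** 2
    return math.sqrt(total)


def idw_price(query, known, scales, power=2.0):
    """Average the known prices, weighting each by 1 / distance**power."""
    num = 0.0
    den = 0.0
    for car, price in known:
        d = distance(query, car, scales)
        if d == 0.0:
            return price          # exact match: return the known price
        w = 1.0 / d ** power
        num += w * price
        den += w
    return num / den


# The car from the question that is not in the list.
query = {"engine": "V6", "color": "red", "gearbox": "automatic",
         "hp": 130, "year": 2009}
estimate = idw_price(query, KNOWN, SCALES)
print(round(estimate, 2))
```

The estimate is pulled strongly toward the red V6 automatic because that car is by far the nearest knot. Note that a weighted average like this can never leave the range of the known prices, so it cannot extrapolate, and the result depends heavily on the chosen scales.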

It's just an idea; mathematicians, correct me if I'm wrong.

  • 0
    I think I understand. Practically, how would I go about this? Could you please give a simple example and tell me how to start, because I've never done this before. What about tools for this kind of thing? 2011-01-12
0

If you want an explicit formula that relates all the parameters to your (potential) price, this is the best tool I can think of:
http://ccsl.mae.cornell.edu/eureqa

You are not restricted to a predetermined form for the resulting equation, and it is easy, fast, stable, and free!