Interpolate product attributes

R Problem Overview


I have a set of data from a set of discrete choice tasks which included two alternatives with three attributes (brand, price, performance). From this data, I have taken 1000 draws from the posterior distribution which I'll then use to calculate utility and eventually preference share for each individual and each draw.

Price and performance were tested at discrete levels (-.2, 0, .2) and (-.25, 0, .25) respectively. I need to be able to interpolate utility between the attribute levels tested. Let's assume for now that linear interpolation is statistically reasonable. In other words, what is the most efficient way to interpolate the utility for price if I wanted to test a scenario with price 10% lower? I have not been able to think of a slick or efficient way to do the interpolation. I've resorted to an mapply()-style approach using the mdply() function from plyr.

Here's some data and my current approach:

library(plyr)
#draws from posterior, 2 respondents, 2 draws each
draw <- list(structure(c(-2.403, -2.295, 3.198, 1.378, 0.159, 1.531, 
1.567, -1.716, -4.244, 0.819, -1.121, -0.622, 1.519, 1.731, -1.779, 
2.84), .Dim = c(2L, 8L), .Dimnames = list(NULL, c("brand_1", 
"brand_2", "price_1", "price_2", "price_3", "perf_1", "perf_2", 
"perf_3"))), structure(c(-4.794, -2.147, -1.912, 0.241, 0.084, 
0.31, 0.093, -0.249, 0.054, -0.042, 0.248, -0.737, -1.775, 1.803, 
0.73, -0.505), .Dim = c(2L, 8L), .Dimnames = list(NULL, c("brand_1", 
"brand_2", "price_1", "price_2", "price_3", "perf_1", "perf_2", 
"perf_3")))) 

#define attributes for each brand: brand constant, price, performance
b1 <- c(1, .15, .25)
b2 <- c(2, .1, .2)

#Create data.frame out of attribute lists. Will use mdply to go through each row
interpolateCombos <- data.frame(xout = c(b1,b2), 
                                atts = rep(c("Brand", "Price", "Performance"), 2),
                                i = rep(1:2, each = 3),
                                stringsAsFactors = FALSE)

#Find point along line. Tried approx(), but too slow

findInt <- function(x1,x2,y1,y2,reqx) {
  range <- x2 - x1
  diff <- reqx - x1
  out <- y1 + ((y2 - y1)/range) * diff
  return(out)
}


calcInterpolate <- function(xout, atts, i){
  if (atts == "Brand") {
    breaks <- 1:2
    cols <- 1:2
  } else if (atts == "Price"){
    breaks <- c(-.2, 0, .2)
    cols <- 3:5
  } else {
    breaks <- c(-.25, 0, .25)
    cols <- 6:8
  }

  utils <- draw[[i]][, cols]

  if (atts == "Brand" || xout %in% breaks){ #Brand can't be interpolated; nothing to interpolate if level matches a break
    out <- data.frame(out = utils[, match(xout, breaks)])
  } else { #Must interpolate between the two breaks that bracket xout
    mi <- max(which(breaks <= xout)) #nearest break at or below xout
    ma <- min(which(breaks >= xout)) #nearest break at or above xout
    out <- data.frame(out = findInt(breaks[mi], breaks[ma], utils[, mi], utils[, ma], xout))
  }
  out$draw <- 1:nrow(utils)
  return(out)
}
out <- mdply(interpolateCombos, calcInterpolate)
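One possible way to avoid the per-row mdply() call is to express each product as a vector of weights over the eight utility columns, so that interpolating every draw reduces to a single matrix product. The following is only a minimal sketch under stated assumptions: interp_weights() and product_weights() are hypothetical helper names, the breaks are the ones given in the question, and the column order is assumed to be brand_1, brand_2, price_1..3, perf_1..3.

```r
# Hypothetical helper: linear-interpolation weights over the tested levels.
# Returns a vector with at most two non-zero entries that sum to 1.
interp_weights <- function(xout, breaks) {
  stopifnot(xout >= min(breaks), xout <= max(breaks))
  w <- numeric(length(breaks))
  # Index of the lower bracketing level; all.inside keeps it in 1..(n - 1)
  lo <- findInterval(xout, breaks, all.inside = TRUE)
  frac <- (xout - breaks[lo]) / (breaks[lo + 1] - breaks[lo])
  w[lo] <- 1 - frac
  w[lo + 1] <- frac
  w
}

# Hypothetical helper: weights for one product across all 8 utility columns
# (brand is one-hot because it cannot be interpolated).
product_weights <- function(brand, price, perf) {
  c(as.numeric(seq_len(2) == brand),
    interp_weights(price, c(-.2, 0, .2)),
    interp_weights(perf, c(-.25, 0, .25)))
}

# Draws for the first respondent, copied from the question (2 draws x 8 columns)
x <- matrix(c(-2.403, -2.295, 3.198, 1.378, 0.159, 1.531, 1.567, -1.716,
              -4.244, 0.819, -1.121, -0.622, 1.519, 1.731, -1.779, 2.84),
            nrow = 2)

# Utility of product b1 (brand 1, price .15, performance .25) for every draw
u_b1 <- drop(x %*% product_weights(1, .15, .25))
```

Because the weights do not depend on the draws, they only need to be computed once per product and can be reused for every respondent.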

For context, here's how I'd calculate preference shares without interpolating attribute levels (code below). Note the brands are now defined in terms of their column reference. p1 & p2 are the product definitions, u1 & u2 are the utilities, and s1 & s2 are the preference shares for each draw.

Any nudge in the right direction would be appreciated. My real case has 10 products with 8 attributes each. At 10k draws, my 8 GB of RAM are giving out, and I can't get out of this rabbit hole I've dug myself into.

p1 <- c(1,2,1)
p2 <- c(2,1,2)


FUN <- function(x, p1, p2) {
  bases <- c(0,2,5) #column offsets of the brand, price and performance blocks
  
  u1 <- rowSums(x[, bases + p1])
  u2 <- rowSums(x[, bases + p2])
  sumExp <- exp(u1) + exp(u2)
  s1 <- exp(u1) / sumExp
  s2 <- exp(u2) / sumExp
  return(cbind(s1,s2))
}
lapply(draw, FUN, p1 = p1, p2 = p2)

[[1]]
                s1        s2
[1,] 0.00107646039 0.9989235
[2,] 0.00009391749 0.9999061

[[2]]
              s1        s2
[1,] 0.299432858 0.7005671
[2,] 0.004123175 0.9958768
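The share calculation above can also be written for any number of products as one matrix product followed by a row-wise softmax. This is a sketch rather than a definitive implementation: share_from_weights() and the weight matrix W are hypothetical names, and W is assumed to hold one column per product with a weight for each of the 8 utility columns. With one-hot weights it should reproduce the shares shown above for the first respondent; fractional weights in a column would stand for a linearly interpolated level.

```r
# Hypothetical helper: preference shares for all products and all draws.
# x: draws x 8 utility matrix; W: 8 x products weight matrix.
share_from_weights <- function(x, W) {
  U <- x %*% W               # draws x products matrix of utilities
  U <- U - apply(U, 1, max)  # subtract each row's max so exp() cannot overflow
  E <- exp(U)
  E / rowSums(E)             # each row sums to 1
}

# Draws for the first respondent, copied from the question
x <- matrix(c(-2.403, -2.295, 3.198, 1.378, 0.159, 1.531, 1.567, -1.716,
              -4.244, 0.819, -1.121, -0.622, 1.519, 1.731, -1.779, 2.84),
            nrow = 2)

# One-hot weights equivalent to p1 = c(1,2,1) and p2 = c(2,1,2)
W <- matrix(0, nrow = 8, ncol = 2)
W[c(1, 4, 6), 1] <- 1  # brand_1, price_2, perf_1
W[c(2, 3, 7), 2] <- 1  # brand_2, price_1, perf_2

s <- share_from_weights(x, W)
```

Keeping the result as a plain numeric matrix per respondent, instead of one data.frame row per draw, is one way the memory footprint might be reduced.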

R Solutions


Solution 1 - R

A somewhat unconventional way to get what you desire is to build a global ranking of all your products using your 10k draws.

Use each draw as a source of binary contests between the 10 products, and sum the results of these contests over all draws.

This will give you a final "leader-board" for your 10 products. From this you have relative utility across all consumers, or you can assign an absolute value based on the number of wins (and optionally, the "strength" of the alternative in each contest) for each product.
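The contest idea can be sketched as follows. The utilities here are made-up toy numbers (4 draws, 3 products named A, B and C), and counting a "win" as having the strictly higher utility in a draw is an assumption about what a contest means.

```r
# Toy utilities (made-up numbers): 4 draws in rows, 3 products in columns
U <- matrix(c(1.0, 0.2, 0.5,
              0.3, 0.9, 0.1,
              2.0, 1.5, 0.7,
              0.4, 0.6, 0.8),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

# wins[k, j] = number of draws in which product k has higher utility than product j
wins <- sapply(seq_len(ncol(U)), function(j) colSums(U > U[, j]))
dimnames(wins) <- list(winner = colnames(U), loser = colnames(U))

# Total wins per product over all contests and draws, best first
leaderboard <- sort(rowSums(wins), decreasing = TRUE)
```

In this toy case the leader-board is A with 5 wins, B with 4 and C with 3.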

When you want to test a new product with a different attribute profile, find its sparsest representation as a weighted vector sum of the other sample products. You can then run the contest again with the win probabilities weighted by the contribution weights of the component attribute vectors.

The advantage of this is that simulating the contest is efficient, and the global ranking combined with representing new products as sparse vector sums of existing data allows much pondering and interpretation of the results, which is useful when you're considering strategies to beat the competition's product attributes.

To find a sparse (descriptive) representation of your new product y, solve Ax = y, where A is the matrix whose columns are the attribute vectors of your existing products and x is the vector of weights contributed by each existing product. You want to minimize the number of non-zero entries in x. See D. L. Donoho's article on the fast homotopy method (like a power iteration), which solves the l0-l1 minimization quickly to find sparse representations.
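As a rough illustration of looking for sparse weights, here is a sketch that uses plain iterative soft-thresholding (ISTA) on the l1-penalized least-squares problem. This is a swapped-in, simpler technique and not the homotopy method mentioned above. The function names, the penalty lambda = 0.05, the iteration count and the toy products are all assumptions made for the example, and the products are stored as columns of A.

```r
# Soft-thresholding operator used by ISTA
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical helper: minimize 0.5 * ||A x - y||^2 + lambda * ||x||_1
ista <- function(A, y, lambda = 0.05, iters = 500) {
  # Step size 1 / L, with L the largest eigenvalue of t(A) %*% A
  L <- max(eigen(crossprod(A), symmetric = TRUE, only.values = TRUE)$values)
  x <- numeric(ncol(A))
  for (k in seq_len(iters)) {
    grad <- drop(crossprod(A, A %*% x - y))  # gradient of the squared-error term
    x <- soft(x - grad / L, lambda / L)
  }
  x
}

# Toy data: 3 attributes (rows), 4 existing products (columns)
A <- cbind(p1 = c(1, 0, 0), p2 = c(0, 1, 0), p3 = c(0, 0, 1), p4 = c(1, 1, 0))
y <- c(1, 1, 0)  # new product; equals p4 and also p1 + p2

x <- ista(A, y)  # weight concentrates on p4, the sparser representation
```

The l1 penalty shrinks the surviving weight slightly below 1 (to about 0.975 here), so the non-zero pattern is the more useful part of the result.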

When you have this (or a weighted average of sparse representations) you can reason usefully about the performance of your new product based on the model set up by your existing preference draws.

The advantage of sparseness as a representation is that it allows you to reason usefully. In addition, the more features or products you have, the better, since a new product is then more likely to be sparsely representable by them. So you can scale to big matrices and get really useful results with a quick algorithm.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Chase | View Question on Stack Overflow
Solution 1 - R | Cris Stringfellow | View Answer on Stack Overflow