MaxDiffMixedLogit
- class pymc_marketing.customer_choice.maxdiff.MaxDiffMixedLogit(task_df, items, respondent_id='respondent_id', task_id='task_id', item_col='item_id', best_col='is_best', worst_col='is_worst', random_intercepts=True, reference_item=None, model_config=None, sampler_config=None, non_centered=True, item_attributes=None, utility_formula=None, random_attributes=None, full_covariance=False, lkj_eta=2.0)
Hierarchical MaxDiff (Best-Worst Scaling) model.
Estimates item-level utilities from best-worst choice data with optional per-respondent random intercepts. The likelihood is the Louviere sequential best-worst model:
\[\begin{split}P(\text{best}_t = b \mid \text{subset}_t) &= \operatorname{softmax}(U)_b \\ P(\text{worst}_t = w \mid \text{subset}_t, b) &= \operatorname{softmax}(-U_{\setminus b})_w\end{split}\]
implemented as two pm.Categorical observed distributions so that pm.sample_posterior_predictive yields best/worst draws directly.
- Parameters:
- task_df
pd.DataFrame Long-format MaxDiff data; see prepare_maxdiff_data().
- items
list[str] Full item pool. Defines the items coord.
- respondent_id
str, default "respondent_id" Column in task_df identifying respondents.
- task_id
str, default "task_id" Column identifying tasks (unique within respondent).
- item_col
str, default "item_id" Column naming the shown item (must be in items).
- best_col
str, default "is_best" 0/1 column flagging the best pick within each task.
- worst_col
str, default "is_worst" 0/1 column flagging the worst pick.
- random_intercepts
bool, default True When True, each respondent draws item-level deviations from the population item utilities (HB-MaxDiff). When False, only population utilities are estimated.
- reference_item
str, optional Item pinned to utility 0 for identification. Defaults to items[-1].
- model_config
dict, optional Priors for beta_item_ (population utilities) and sigma_item (per-item heterogeneity scale).
- sampler_config
dict, optional Arguments passed to pm.sample.
- non_centered
bool, default True Non-centered parameterisation for the respondent-level deviations.
- item_attributes
pd.DataFrame, optional One row per item, with the item name as the index and one column per attribute. When provided together with utility_formula, switches the model into part-worths mode: utilities become \(U_i = X_i^\top \beta_{\mathrm{feat}}\) where \(X\) is the patsy-expanded design matrix. Extrapolates naturally to new items via their attributes. Must cover every item in items.
- utility_formula
str, optional Patsy formula describing the attribute contribution to utility, e.g. "~ 0 + C(brand) + price + quality". Required iff item_attributes is given. Use a leading 0 + (no intercept) so the model is identified without a reference item.
- random_attributes
list[str], optional Names of patsy-expanded feature columns that should vary across respondents (respondent part-worths). Remaining features are treated as population-level fixed effects. Only meaningful in part-worths mode; ignored otherwise. Defaults to an empty list (pure fixed part-worths).
Note
Other customer-choice models in this package use Wilkinson pipe notation "~ covariate | random_covariate" to declare random coefficients. MaxDiff deliberately diverges: there is no per-alternative equation structure here (the same attributes describe every item), so the pipe formula is ambiguous. An explicit list is cleaner and less error-prone.
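The part-worths idea described under item_attributes and utility_formula, \(U_i = X_i^\top \beta_{\mathrm{feat}}\), can be illustrated with a small plain-Python sketch. Everything in it is an assumption made for illustration: the attribute values, the beta_feat numbers, and the helper names design_row and utility are hypothetical, and the hand-rolled indicator coding only imitates what patsy might produce for "~ 0 + C(brand) + price + quality". It is not the library's implementation.

```python
# Hypothetical attribute table: one entry per item (mirrors item_attributes).
item_attributes = {
    "apple": {"brand": "A", "price": 1.0, "quality": 3.0},
    "banana": {"brand": "B", "price": 0.5, "quality": 2.0},
    "cherry": {"brand": "A", "price": 2.0, "quality": 5.0},
}

# Hand-rolled stand-in for a no-intercept expansion of
# "~ 0 + C(brand) + price + quality": one indicator column per brand level,
# followed by the numeric columns. Column names only imitate patsy's style.
brand_levels = sorted({row["brand"] for row in item_attributes.values()})
columns = [f"C(brand)[{level}]" for level in brand_levels] + ["price", "quality"]


def design_row(attrs):
    """Return one row of the design matrix X for a single item."""
    indicators = [1.0 if attrs["brand"] == level else 0.0 for level in brand_levels]
    return indicators + [attrs["price"], attrs["quality"]]


# Made-up part-worths (beta_feat), in the same order as `columns`.
beta_feat = [0.2, -0.1, -0.8, 0.5]


def utility(attrs):
    """U_i = X_i^T beta_feat for one item."""
    return sum(x * b for x, b in zip(design_row(attrs), beta_feat))


utilities = {name: utility(attrs) for name, attrs in item_attributes.items()}

# A new item needs no parameter of its own: it is scored from its attributes.
new_item = {"brand": "B", "price": 1.5, "quality": 4.0}
utilities["date"] = utility(new_item)
```

Because utilities depend on items only through their attributes, the new item "date" receives a utility without refitting anything, which is the sense in which part-worths mode extrapolates to new items.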
Notes
Input format example:
respondent_id  task_id  item_id  is_best  is_worst
r1             1        apple    0        0
r1             1        banana   1        0
r1             1        cherry   0        1
r1             1        date     0        0
r1             2        apple    0        1
...
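To connect this layout to the sequential best-worst likelihood, the sketch below evaluates the probability of the first task shown (banana best, cherry worst) in plain Python. The utility values and the helper names softmax and best_worst_prob are hypothetical and chosen only for illustration; this is not how the model itself evaluates the likelihood.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def best_worst_prob(utilities, best, worst):
    """P(best) * P(worst | best) for one task.

    `utilities` maps each shown item to its utility U.
    """
    if best == worst:
        raise ValueError("best and worst must be different items")
    items = list(utilities)
    # Best pick: softmax(U) over all shown items.
    p_best = softmax([utilities[i] for i in items])[items.index(best)]
    # Worst pick: softmax(-U) over the items left after removing the best.
    remaining = [i for i in items if i != best]
    p_worst = softmax([-utilities[i] for i in remaining])[remaining.index(worst)]
    return p_best * p_worst


# Made-up utilities for the four items shown in task 1.
task_utilities = {"apple": 0.0, "banana": 1.0, "cherry": -1.0, "date": 0.0}
prob = best_worst_prob(task_utilities, best="banana", worst="cherry")

# Summing over every ordered (best, worst) pair of distinct items gives 1.
total = sum(
    best_worst_prob(task_utilities, b, w)
    for b in task_utilities
    for w in task_utilities
    if b != w
)
```

The check on total reflects that the two-stage factorisation defines a proper distribution over ordered pairs of distinct items.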
Each (respondent_id, task_id) group must contain exactly one row with is_best == 1 and one with is_worst == 1, and the two must differ. Each task must show at least two items. Subset sizes may vary across tasks; they are padded to K_max internally.
In the default (item-intercept) mode only item-utility contrasts against the reference item are identified; absolute levels are not. In part-worths mode reference_item and random_intercepts are ignored: identification comes from the no-intercept formula (~ 0 + ...) and respondent heterogeneity is controlled by random_attributes.
Posterior predictive limitations
The Louviere best-worst likelihood is sequential: worst is drawn from the remaining items after the best has been removed. In the PyMC graph this is implemented by masking the best position out of the worst-pick softmax using best_pos as a pm.Data node. sample_posterior_predictive() therefore produces a partially conditioned joint:
- best_pick is sampled correctly from softmax(U).
- worst_pick is sampled from softmax(-U \ {observed_best}), i.e. it is still conditioned on the observed best position, not on the freshly sampled best_pick.
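The difference between the two sampling schemes can be reproduced outside PyMC. The following is a minimal sketch in plain Python, assuming made-up utilities and a single task; the function names partially_conditioned_draw and generative_draw are hypothetical and do not correspond to library internals.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def worst_probs(utilities, excluded):
    """softmax(-U) with position `excluded` masked out (probability 0)."""
    weights = [0.0 if i == excluded else math.exp(-u) for i, u in enumerate(utilities)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, rng):
    """Draw one position according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def partially_conditioned_draw(utilities, observed_best, rng):
    """Best is sampled fresh, but worst is masked by the OBSERVED best."""
    best = sample_index(softmax(utilities), rng)
    worst = sample_index(worst_probs(utilities, observed_best), rng)
    return best, worst


def generative_draw(utilities, rng):
    """Best first, then worst conditioned on the SAMPLED best."""
    best = sample_index(softmax(utilities), rng)
    worst = sample_index(worst_probs(utilities, best), rng)
    return best, worst


rng = random.Random(0)
U = [0.0, 1.0, -1.0, 0.0]  # made-up utilities for four shown positions
observed_best = 1          # position recorded as best in the data

partial = [partially_conditioned_draw(U, observed_best, rng) for _ in range(2000)]
generative = [generative_draw(U, rng) for _ in range(2000)]

# Count draws where best and worst point at the same position.
clashes_partial = sum(b == w for b, w in partial)
clashes_generative = sum(b == w for b, w in generative)
```

In this sketch clashes_generative is zero by construction, while the partially conditioned scheme can return the same position for both picks whenever the sampled best differs from the observed one.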
Partial conditioning makes the joint (best_pick, worst_pick) draws incoherent for generative use: the two picks may designate the same position. sample_posterior_predictive() remains valid for in-sample posterior predictive checks, that is, verifying that the model's worst-pick distribution is consistent with the data, given that the best pick was what was actually recorded.
For any counterfactual or out-of-sample simulation, use predict_choices() (or apply_intervention()), which samples the joint (best, worst) generatively (best first, then worst conditioned on the sampled best), producing a coherent joint draw.
Methods
MaxDiffMixedLogit.__init__(task_df, items[, ...]) Initialize model configuration and sampler configuration for the model.
MaxDiffMixedLogit.apply_intervention(new_task_df) Simulate choices under a counterfactual task design.
Rehydrate init kwargs from serialised idata attrs.
Rebuild the PyMC model from a loaded InferenceData.
MaxDiffMixedLogit.build_model(**kwargs) Build the PyMC model using the cached task_df.
Serialise init kwargs so the model can be reloaded from idata.
MaxDiffMixedLogit.fit([task_df, ...]) Fit the model via NUTS and attach the result to self.idata.
MaxDiffMixedLogit.graphviz(**kwargs) Get the graphviz representation of the model.
Create the model configuration and sampler configuration from the InferenceData to keyword arguments.
MaxDiffMixedLogit.load(fname[, check]) Create a ModelBuilder instance from a file.
MaxDiffMixedLogit.load_from_idata(idata[, check]) Create a ModelBuilder instance from an InferenceData object.
MaxDiffMixedLogit.make_model(arrays[, observed]) Build the MaxDiff PyMC model.
MaxDiffMixedLogit.predict_choices(task_df[, ...]) Fully generative (best, worst) simulation under a new task design.
Run prepare_maxdiff_data() and cache its outputs on the model.
MaxDiffMixedLogit.sample([...]) Run prior predictive, fit, and posterior predictive in sequence.
Sample from the posterior predictive distribution.
Sample from the prior predictive distribution.
MaxDiffMixedLogit.save(fname, **kwargs) Save the model's inference data to a file.
Compute posterior share-of-preference after introducing new items.
MaxDiffMixedLogit.set_idata_attrs([idata]) Set attributes on an InferenceData object.
MaxDiffMixedLogit.table(**model_table_kwargs) Get the summary table of the model.
MaxDiffMixedLogit.transform_attributes(new_attrs) Apply the fitted patsy formula to a new attribute frame.
Attributes
default_model_config Default priors; returns only the priors used by the active mode.
default_sampler_config Default sampler configuration.
fit_result Get the posterior fit_result.
id Generate a unique hash value for the model.
output_var Primary observed variable name.
posterior
posterior_predictive
predictions
prior
prior_predictive
version
idata
sampler_config
model_config