generate_maxdiff_conjoint_data
- pymc_marketing.customer_choice.synthetic_data.generate_maxdiff_conjoint_data(n_respondents=150, n_items=12, item_attributes=None, utility_formula='~ 0 + C(brand) + price + quality', true_betas=None, n_tasks_per_resp=12, subset_size=4, random_attributes=None, sigma_respondent=0.4, items=None, random_seed=None)
Generate synthetic MaxDiff data with item-attribute utilities (part-worths).
Simulates a MaxDiff survey where each item has a fixed attribute profile and utilities are computed as \(U_i = X_i^\top \beta + \text{noise}\). Respondents optionally carry heterogeneous part-worths on a subset of features (analogous to the random-coefficients formulation in MaxDiffMixedLogit).
- Parameters:
- n_respondents
int, default 150 Number of respondents.
- n_items
int, default 12 Full item pool size. Ignored if item_attributes is provided.
- item_attributes
pd.DataFrame, optional One row per item, index = item name, columns = attributes. If None, attributes are auto-generated: a 3-level brand categorical and two continuous features, price ~ Uniform(0, 1) and quality ~ Normal(0, 1).
- utility_formula
str, default "~ 0 + C(brand) + price + quality" Patsy formula used to expand item_attributes into a design matrix.
- true_betas
dict[str, float], optional Ground-truth part-worths keyed by patsy-expanded feature name. Missing keys are drawn from Normal(0, 1).
- n_tasks_per_resp
int, default 12 Tasks per respondent.
- subset_size
int, default 4 Items shown per task.
- random_attributes
list[str], optional Feature names whose part-worths vary across respondents. Defaults to all features.
- sigma_respondent
float, default 0.4 Scale of per-respondent deviations on the random-feature subset.
- items
list[str], optional Item names. Defaults to ["item_0", ...] when item_attributes is None; otherwise taken from item_attributes.index.
- random_seed
np.random.Generator or int, optional Random state.
- Returns:
- task_df
pd.DataFrame Long-format data with columns respondent_id, task_id, item_id, is_best, is_worst.
- item_attributes
pd.DataFrame The attribute table, indexed by item name. Aligned with items.
- ground_truth
dict {"betas", "respondent_betas", "feature_names", "random_attributes", "sigma_respondent", "items", "X"}. betas is the population part-worth vector; respondent_betas holds the per-respondent part-worth matrix actually used to simulate picks.
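To make the long-format layout of task_df concrete, the following is a minimal standard-library sketch of how a single task could be simulated from \(U_i = X_i^\top \beta + \text{noise}\). It is not the library's implementation: the attribute values, the part-worths, the helper names (systematic_utility, simulate_task), the Gaussian noise, and the choice to pick best and worst from the same noisy draw are all assumptions made for illustration.

```python
import random

# Hypothetical item attributes and part-worths; not output of the library.
ITEM_ATTRIBUTES = {
    "item_0": {"price": 0.2, "quality": 1.0},
    "item_1": {"price": 0.8, "quality": -0.5},
    "item_2": {"price": 0.5, "quality": 0.3},
    "item_3": {"price": 0.1, "quality": -1.2},
    "item_4": {"price": 0.9, "quality": 0.7},
}
BETAS = {"price": -1.0, "quality": 0.8}


def systematic_utility(attrs, betas):
    """Deterministic part X_i^T beta of an item's utility."""
    return sum(betas[name] * value for name, value in attrs.items())


def simulate_task(rng, respondent_id, task_id, shown, betas, noise_sd=1.0):
    """Return long-format rows for one task: one row per shown item."""
    # Assumption: Gaussian noise, and best/worst read off the same noisy
    # utilities. The library's noise model may differ.
    utils = {
        item: systematic_utility(ITEM_ATTRIBUTES[item], betas)
        + rng.gauss(0.0, noise_sd)
        for item in shown
    }
    best = max(utils, key=utils.get)
    worst = min(utils, key=utils.get)
    return [
        {
            "respondent_id": respondent_id,
            "task_id": task_id,
            "item_id": item,
            "is_best": int(item == best),
            "is_worst": int(item == worst),
        }
        for item in shown
    ]


rng = random.Random(0)
shown = rng.sample(sorted(ITEM_ATTRIBUTES), k=4)  # items shown in this task
rows = simulate_task(rng, respondent_id=0, task_id=0, shown=shown, betas=BETAS)
for row in rows:
    print(row)
```

Each task contributes subset_size rows, exactly one flagged is_best and exactly one flagged is_worst, which is the shape described for task_df above.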
Notes
Real conjoint studies use balanced designs; this generator draws task subsets uniformly for simplicity, which is adequate for recovery tests.
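The difference between uniform and balanced task subsets can be illustrated with a short standard-library sketch. It uses the default sizes from the signature (12 items, 12 tasks, 4 items per task) for a single respondent. The sampling via random.sample is an assumption about what "uniformly" means here, not a copy of the generator's code.

```python
import random
from collections import Counter

# Defaults taken from the function signature; one respondent only.
n_items, n_tasks, subset_size = 12, 12, 4
items = [f"item_{i}" for i in range(n_items)]

rng = random.Random(42)
# Uniform draws: each task is a simple random sample without replacement.
tasks = [rng.sample(items, k=subset_size) for _ in range(n_tasks)]

# Count how often each item is shown across this respondent's tasks.
appearances = Counter(item for task in tasks for item in task)
counts = [appearances[item] for item in items]

# A balanced design would show every item equally often:
# 12 tasks * 4 slots / 12 items = 4 appearances per item.
balanced_count = n_tasks * subset_size // n_items

print("appearances per item:", counts)
print("balanced target:", balanced_count)
print("spread (max - min):", max(counts) - min(counts))
```

With uniform draws the per-item counts generally vary around the balanced target within a single respondent. Across many respondents the exposure tends to even out, which is consistent with the note that uniform sampling is adequate for recovery tests.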