prepare_maxdiff_data

pymc_marketing.customer_choice.maxdiff.prepare_maxdiff_data(task_df, items, respondent_id='respondent_id', task_id='task_id', item_col='item_id', best_col='is_best', worst_col='is_worst', reference_item=None)

Reshape long-format MaxDiff data into padded arrays for the likelihood.

Each row of task_df represents one shown item within one task. Tasks may show different numbers of items: ragged subsets are padded to K_max (the largest number of items shown in any task) with the reference item, and mask marks which positions are real.
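The padding idea can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the names `tasks`, `reference_idx`, `padded`, and `mask` are hypothetical, and the sketch only assumes that each task is a list of item indices and that padding slots are filled with the reference item's index.

```python
import numpy as np

# Hypothetical inputs: item indices shown in each task (ragged lengths).
tasks = [[0, 2, 3], [1, 3], [0, 1, 2, 3]]
reference_idx = 3  # assumed index of the reference item (e.g. items[-1])

# K_max: the largest number of items shown in any single task.
k_max = max(len(shown) for shown in tasks)

# Fill every slot with the reference index, then overwrite the real positions.
padded = np.full((len(tasks), k_max), reference_idx, dtype=int)
mask = np.zeros((len(tasks), k_max), dtype=bool)
for row, shown in enumerate(tasks):
    padded[row, : len(shown)] = shown
    mask[row, : len(shown)] = True  # True marks a real (non-padding) position
```

Downstream code can then use `mask` to exclude padded positions, for example by assigning them a very negative utility before a softmax, so that padding never receives probability mass.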

Parameters:
task_df : pd.DataFrame

Long-format data with one row per (respondent, task, item) triple. Must contain the five columns named by respondent_id, task_id, item_col, best_col, worst_col.

items : list[str]

Full item pool. Defines the items coord and the index mapping.

respondent_id, task_id, item_col, best_col, worst_col : str

Column names in task_df.

reference_item : str, optional

Item whose utility is pinned to 0 for identification. Defaults to items[-1].
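As an illustration of the expected input, the following sketch builds a small long-format frame using the default column names. The item names and data are invented, and it assumes that item_col holds item names drawn from items and that best_col and worst_col are 0/1 indicators; neither is confirmed here. The call itself is left commented out so the snippet runs with pandas alone.

```python
import pandas as pd

items = ["price", "quality", "service", "brand"]  # hypothetical item pool

# One row per (respondent, task, item); column names match the defaults.
task_df = pd.DataFrame(
    {
        "respondent_id": [1, 1, 1, 1, 1],
        "task_id": [0, 0, 0, 1, 1],
        "item_id": ["price", "quality", "brand", "service", "brand"],
        "is_best": [1, 0, 0, 0, 1],   # assumed 0/1 indicator
        "is_worst": [0, 0, 1, 1, 0],  # assumed 0/1 indicator
    }
)

# Confirm that the five required columns are present.
required = ["respondent_id", "task_id", "item_id", "is_best", "is_worst"]
missing = [col for col in required if col not in task_df.columns]

# With pymc-marketing installed, the call would follow the signature above:
# from pymc_marketing.customer_choice.maxdiff import prepare_maxdiff_data
# arrays = prepare_maxdiff_data(task_df, items, reference_item="brand")
```

Task 0 shows three items and task 1 shows two, so this frame also exercises the ragged case described above.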

Returns:
MaxDiffArrays

TypedDict of padded arrays and metadata.

Raises:
ValueError

If a task does not have exactly one best pick and exactly one worst pick, the same item is marked both best and worst within a task, an item repeats within a task, a task shows fewer than 2 items, or any item is outside the pool.
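To check a frame before calling the function, the conditions above can be approximated with plain pandas. The helper `check_tasks` below is hypothetical and is not part of pymc-marketing; it assumes the default column names and 0/1 indicators, and its error messages are illustrative only.

```python
import pandas as pd


def check_tasks(task_df, items):
    """Hypothetical pre-check mirroring the documented error conditions."""
    if not set(task_df["item_id"]).issubset(items):
        raise ValueError("an item is outside the pool")
    for key, task in task_df.groupby(["respondent_id", "task_id"]):
        if len(task) < 2:
            raise ValueError(f"task {key} shows fewer than 2 items")
        if task["item_id"].duplicated().any():
            raise ValueError(f"task {key} repeats an item")
        if task["is_best"].sum() != 1 or task["is_worst"].sum() != 1:
            raise ValueError(f"task {key} needs exactly one best and one worst")
        both = task["is_best"].astype(bool) & task["is_worst"].astype(bool)
        if both.any():
            raise ValueError(f"task {key} marks the same item best and worst")


good = pd.DataFrame(
    {
        "respondent_id": [1, 1, 1],
        "task_id": [0, 0, 0],
        "item_id": ["a", "b", "c"],
        "is_best": [1, 0, 0],
        "is_worst": [0, 0, 1],
    }
)
check_tasks(good, ["a", "b", "c"])  # passes silently

bad = good.assign(is_worst=[1, 0, 0])  # item "a" is now both best and worst
try:
    check_tasks(bad, ["a", "b", "c"])
except ValueError as err:
    message = str(err)
```

Running such a check first gives a per-task message, which may make malformed survey exports easier to locate than a single failure during data preparation.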