prepare_maxdiff_data

pymc_marketing.customer_choice.maxdiff.prepare_maxdiff_data(task_df, items, respondent_id='respondent_id', task_id='task_id', item_col='item_id', best_col='is_best', worst_col='is_worst', reference_item=None)

Reshape long-format MaxDiff data into padded arrays for the likelihood.

Each row of task_df represents one shown item within one task. Tasks may show different numbers of items: shorter tasks are padded to K_max, the largest number of items shown in any task, using the reference item, and mask marks which positions hold real items.
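The padding scheme described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the library's implementation: the variable names (tasks, reference_idx, shown) and the integer item indices are assumptions made for the example.

```python
import numpy as np

# Hypothetical input: item indices shown in each task (ragged lengths).
tasks = [[0, 2, 3], [1, 3], [0, 1, 2, 3]]
reference_idx = 3  # assumed index of the reference item in the pool

# K_max is the largest number of items shown in any single task.
k_max = max(len(t) for t in tasks)

# Fill every slot with the reference index, then overwrite the real positions.
shown = np.full((len(tasks), k_max), reference_idx, dtype=int)
mask = np.zeros((len(tasks), k_max), dtype=bool)
for row, item_idx in enumerate(tasks):
    shown[row, : len(item_idx)] = item_idx
    mask[row, : len(item_idx)] = True  # True marks a real (non-padded) slot

print(shown)
print(mask)
```

Padding with the reference item keeps every index valid for array lookups, while the mask lets downstream code ignore the padded slots.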

Parameters:

task_df (pd.DataFrame): Long-format data with one row per (respondent, task, item) triple. Must contain the five columns named by respondent_id, task_id, item_col, best_col, and worst_col.
items (list[str]): Full item pool. Defines the items coord and the index mapping.
respondent_id, task_id, item_col, best_col, worst_col (str): Column names in task_df.
reference_item (str, optional): Item whose utility is pinned to 0 for identification. Defaults to items[-1].
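As a rough illustration of the expected input, the snippet below builds a small long-format frame using the default column names from the signature. The item labels and respondent values are made up, and encoding the best and worst flags as 0/1 integers is an assumption (the documentation does not state the expected dtype). The call itself is left commented out so the snippet runs without pymc-marketing installed.

```python
import pandas as pd

# Hypothetical item pool; the last entry would be the default reference item.
items = ["price", "quality", "design", "brand"]

rows = [
    # respondent_id, task_id, item_id, is_best, is_worst
    ("r1", 1, "price", 1, 0),
    ("r1", 1, "quality", 0, 0),
    ("r1", 1, "design", 0, 1),
    ("r1", 2, "quality", 0, 1),  # second task shows only two items
    ("r1", 2, "brand", 1, 0),
]
task_df = pd.DataFrame(
    rows,
    columns=["respondent_id", "task_id", "item_id", "is_best", "is_worst"],
)

# With pymc-marketing installed, the call would look like:
# from pymc_marketing.customer_choice.maxdiff import prepare_maxdiff_data
# arrays = prepare_maxdiff_data(task_df, items)  # reference_item defaults to items[-1]

print(task_df)
```

If the data uses other column names, they can be passed through the respondent_id, task_id, item_col, best_col, and worst_col arguments instead of renaming the frame.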

Returns:

MaxDiffArrays: TypedDict of padded arrays and metadata.

Raises:

ValueError: If a task does not have exactly one best pick and exactly one worst pick, the best and worst picks in a task are the same item, an item repeats within a task, a task shows fewer than 2 items, or any item is outside the pool.
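To catch these problems before calling the function, the documented conditions can be checked up front with pandas. The helper below, check_maxdiff_tasks, is hypothetical and is not part of pymc-marketing; it assumes the default column names and 0/1 flags, and its error messages are illustrative rather than the library's own.

```python
import pandas as pd


def check_maxdiff_tasks(task_df, items):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    unknown = set(task_df["item_id"]) - set(items)
    if unknown:
        raise ValueError(f"items outside the pool: {sorted(unknown)}")
    for key, task in task_df.groupby(["respondent_id", "task_id"]):
        if len(task) < 2:
            raise ValueError(f"task {key} shows fewer than 2 items")
        if task["item_id"].duplicated().any():
            raise ValueError(f"task {key} repeats an item")
        if task["is_best"].sum() != 1 or task["is_worst"].sum() != 1:
            raise ValueError(f"task {key} needs exactly one best and one worst pick")
        both = task["is_best"].astype(bool) & task["is_worst"].astype(bool)
        if both.any():
            raise ValueError(f"task {key} has the same item as best and worst")


good = pd.DataFrame(
    {
        "respondent_id": ["r1", "r1", "r1"],
        "task_id": [1, 1, 1],
        "item_id": ["a", "b", "c"],
        "is_best": [1, 0, 0],
        "is_worst": [0, 0, 1],
    }
)
check_maxdiff_tasks(good, ["a", "b", "c"])  # valid data: no exception

# Mark item "a" as both best and worst to trigger one of the checks.
bad = good.assign(is_worst=[1, 0, 0])
try:
    check_maxdiff_tasks(bad, ["a", "b", "c"])
    message = None
except ValueError as err:
    message = str(err)
print(message)
```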