Module on Adaptive Experimentation

Published

November 15, 2024

Human-Centered Data Science gives us new ways to analyze and apply data. We can combine ideas from traditional experiments with machine learning methods to do an adaptive experiment.

In an adaptive experiment, we adjust our design (typically the allocation of participants to different conditions) based on what we have learned from interactions with participants so far. The goal is usually to balance exploration (testing various conditions to gather more information about their effectiveness) with exploitation (using the algorithm’s knowledge so far to allocate participants to the currently better-performing conditions).

One example of balancing exploration and exploitation is finding good restaurants in a new city. At first, you explore different options, trying various places to gather information about what’s good. Once you’ve identified a standout spot, you start visiting it more often, but still occasionally try new places in case there’s an even better one you haven’t discovered yet.

One strategy for balancing exploration and exploitation in practice is response-adaptive randomization, where the probabilities of assigning participants to each condition are adjusted based on accumulated evidence. As more data from participants is gathered over time, the allocation gradually shifts away from an even split (e.g., 50/50) toward favoring the conditions that perform better on the available evidence.

Question:

Could you suggest another example illustrating the idea of an adaptive experiment?

Thank you for your response! Another example is testing the effectiveness of two types of vaccines.

Question:

Select a true statement about randomization.

Response-adaptive randomization always starts from unequal probabilities of assignment for each condition.
Traditional (uniform-randomly assigned) experiments express our preference toward exploration compared to exploitation.
The goal of response-adaptive randomization is to achieve equal size of groups for each condition.
Randomization always assumes equal probabilities of assignment to conditions.

Traditional (uniform-randomly assigned) experiments express our preference toward exploration compared to exploitation:

Explanation of Options:

Option 1: False. Response-adaptive randomization typically begins with equal probabilities and adjusts them based on participant responses, rather than always starting with unequal probabilities.
Option 2: True. Traditional uniform randomization assigns participants equally across all conditions, reflecting a preference for exploration by ensuring that all conditions are equally tested without bias toward any particular option.
Option 3: False. The goal of response-adaptive randomization is to allocate more participants to better-performing conditions based on interim results, not necessarily to achieve equal group sizes.
Option 4: False. While traditional randomization assumes equal probabilities of assignment, response-adaptive randomization may use unequal probabilities based on ongoing results, so randomization does not always assume equal probabilities.

Let’s recap some definitions

Condition: A specific treatment, arm, or variation assigned to a participant during the experiment.

Reward: A measurable outcome or benefit that results from a participant’s interaction with one of the conditions. In the simplest case, the reward can be binary: a success or a failure, e.g., answering a test question correctly or not.

Response-adaptive randomization: A method used in experiments where the probability of assigning participants to different conditions changes over time based on observed results (e.g., rewards).

Adaptive experiment: A type of randomized trial (used, for example, in clinical trials and educational research) in which we adjust our design, typically the allocation to different conditions, based on what we learn from interactions with participants. In this module, we will focus on adaptive experiments that use statistical and machine learning methods to perform response-adaptive randomization.
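To make the definitions above concrete, here is a minimal sketch of response-adaptive randomization for binary rewards. It assumes one common approach, Thompson sampling with Beta posteriors, which this module does not name explicitly; the function name `run_adaptive_experiment`, the true success rates, and the number of participants are all hypothetical values chosen for illustration.

```python
import random

def run_adaptive_experiment(true_rates, n_participants, seed=0):
    """Simulate response-adaptive randomization (assumed: Beta-Bernoulli Thompson sampling)."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_participants):
        # Draw one plausible success rate per condition from its Beta posterior.
        # With no data yet this is Beta(1, 1), i.e. every rate is equally plausible.
        draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        # Assign the participant to the condition with the highest draw.
        chosen = draws.index(max(draws))
        # Observe a binary reward (success or failure) and update the counts.
        if rng.random() < true_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1
    return successes, failures

# Hypothetical true success rates for two conditions.
successes, failures = run_adaptive_experiment([0.5, 0.6], n_participants=400)
assigned = [s + f for s, f in zip(successes, failures)]
print("Participants per condition:", assigned)
```

Early on, both conditions are chosen about equally often; as evidence accumulates, the condition with more observed successes tends to receive more participants, while the other one is still tried occasionally.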

Question:

Developing your example of an adaptive experiment from the task above, suggest conditions for your example. Feel free to discuss alternative ideas.

Thank you for your response! Example conditions could include two types of treatments such as:

Vaccine A with a higher initial dosage.
Vaccine B with a standard dosage and booster after one month.

These conditions would allow for comparing the effectiveness and adaptively assigning participants based on observed outcomes.

Question:

Developing your example of an adaptive experiment from the task above, suggest a reward for your example. Feel free to discuss alternative ideas.

Thank you for your response! An example reward could be whether an individual participant survives after receiving the treatment (mortality). Alternatively, it could measure improvements in the participant’s health, such as reduced symptom severity or time to recovery.

Question:

Which of the following statements about adaptive experiments is correct?

Adaptive experiments are not usually randomized.
Adaptive experiments are not usually uniformly randomized.
Adaptive experiments ensure an even split of participants across all conditions throughout the experiment.
The primary goal of adaptive experiments is to avoid randomization entirely.

Adaptive experiments are not usually uniformly randomized:

Explanation of Options:

Option 1: False. Adaptive experiments often involve randomization, but the randomization is adjusted dynamically based on prior responses.
Option 2: True. Adaptive experiments typically deviate from uniform randomization, dynamically adjusting the allocation probabilities based on observed data to optimize outcomes.
Option 3: False. Adaptive experiments do not aim to ensure an even split of participants across conditions; instead, they aim to allocate participants based on performance or other criteria.
Option 4: False. The primary goal of adaptive experiments is to improve efficiency and outcomes, not to eliminate randomization entirely.

Question:

Which of the following statements about adaptive experiments is correct?

Response-adaptive randomization always requires observing rewards immediately.
Rewards in adaptive experiments are always binary, representing success or failure.
One condition in adaptive experiments should always be control.
To update the probabilities of assigning the participants to different conditions, we have to observe rewards.

To update the probabilities of assigning the participants to different conditions, we have to observe rewards:

Explanation of Options:

Option 1: False. While observing rewards is necessary for updating probabilities, the observation does not always need to be immediate; delayed rewards can also be incorporated into adaptive randomization.
Option 2: False. Rewards in adaptive experiments can take various forms, such as continuous values, ordinal scales, or other metrics, not just binary success/failure.
Option 3: False. Adaptive experiments do not necessarily require a control condition; they can involve multiple experimental conditions with no distinct control group.
Option 4: True. Observing rewards is essential for updating the allocation probabilities in adaptive experiments, as the reward feedback informs how conditions are adjusted dynamically.

EXAMPLE

Full Example of an Adaptive Experiment

Now that we have some understanding of adaptive experiments, let’s examine a real-world-inspired example. This example will focus on analyzing results and comparing outcomes between a traditional uniformly-randomly assigned Randomized Controlled Trial (RCT) and an adaptive experiment.

Learning designer Yi wants to encourage student self-reflection after course activities. Her goal is to help students think critically about their responses and learning processes.

Experiment Design

Learning designer Yi selects a specific decision point in the course: a particular activity where students are encouraged to reflect on their responses. She defines two conditions for her experiment:

Condition 1: No self-explanation prompt is shown to the students.
Condition 2: A self-explanation prompt is provided, asking: Can you explain why you chose your answer?

Yi defines the correctness of the student’s answer to a multiple-choice question (related to the activity) as the reward for the experiment. This allows her to measure the impact of reflection prompts on learning outcomes.

Comparing Traditional and Adaptive Experiments

Imagine that we ran both a traditional uniformly-randomly assigned RCT and an adaptive experiment in parallel, using the same design, to compare their results.

Results of the Traditional Experiment

In the traditional experiment, there is a statistically significant difference between the mean rewards (where the mean can be interpreted as the share of positive rewards for each condition). The results are as follows:

Condition 2 (self-explanation prompt): M2=0.608 (SEM=0.032)
Condition 1 (no prompt): M1=0.512 (SEM=0.032)
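As a rough check on the claim of statistical significance, the reported means and standard errors can be combined into a two-sample z statistic. This is only a sketch: it assumes independent groups and a normal approximation, and the module does not say which test was actually used.

```python
import math

# Reported values from the traditional experiment.
m1, sem1 = 0.512, 0.032  # Condition 1 (no prompt)
m2, sem2 = 0.608, 0.032  # Condition 2 (self-explanation prompt)

diff = m2 - m1
# Standard error of a difference between two independent means.
se_diff = math.sqrt(sem1**2 + sem2**2)
z = diff / se_diff
# Two-sided p-value under the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

Under these assumptions the z statistic is a little above 2 and the two-sided p-value falls below 0.05, which is consistent with the difference being described as statistically significant.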

Results of the Adaptive Experiment

In the adaptive experiment, the computed estimates were:

Condition 2 (self-explanation prompt): M2=0.599
Condition 1 (no prompt): M1=0.539

These results also favor the self-explanation prompt (Condition 2). However, as discussed earlier, the primary advantage of adaptive experiments lies in the allocation of participants to different conditions.

Allocation Differences

In the traditional experiment, as expected, participants were assigned almost equally to the two conditions. By contrast, in the adaptive experiment, the assignment proportions differed noticeably, reflecting the algorithm’s ability to prioritize better-performing conditions (in this case, Condition 2).

Only 176 students (40%) were assigned to Condition 1 (the worse-performing condition), while the algorithm allocated 262 students (60%) to Condition 2, which our analysis identified as the better option. Compared with an even 50/50 split, about 10 percentage points of the participants were shifted toward the better-performing condition. It illustrates how adaptive A/B comparisons can facilitate the rapid use of gathered evidence, which is especially relevant in real classroom settings.
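The allocation figures above can be reproduced with a few lines of arithmetic. The variable names are illustrative; the counts are the ones reported in the example.

```python
n_condition_1 = 176  # no prompt
n_condition_2 = 262  # self-explanation prompt

total = n_condition_1 + n_condition_2
share_1 = n_condition_1 / total
share_2 = n_condition_2 / total
# How many more students reached Condition 2 than under an even split.
extra_vs_even_split = n_condition_2 - total / 2

print(f"total = {total}")
print(f"Condition 1 share = {share_1:.1%}, Condition 2 share = {share_2:.1%}")
print(f"extra students in Condition 2 vs. a 50/50 split = {extra_vs_even_split:.0f}")
```

With 438 students in total, an even split would have placed 219 in each condition, so the adaptive allocation moved 43 additional students to the better-performing condition.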

Question:

What is the key benefit of adaptive experimentation compared to traditional Randomized Controlled Trials, as described in the example?

Adaptive experiments assign participants equally to each condition.
Adaptive experiments aim for more participants to be assigned to the condition with better outcomes as the experiment progresses.
Adaptive experiments are faster to conduct with fewer participants.
Adaptive experiments provide statistically significant results without needing a control group.

Adaptive experiments aim for more participants to be assigned to the condition with better outcomes as the experiment progresses:

Explanation of Options:

Option 1: Incorrect. Unlike traditional RCTs, adaptive experiments dynamically adjust participant allocation based on observed outcomes, rather than assigning participants equally.
Option 2: Correct. The key benefit of adaptive experimentation is the ability to allocate more participants to conditions with better outcomes, improving efficiency and relevance.
Option 3: Incorrect. Adaptive experiments may not always be faster or require fewer participants; their primary benefit lies in the adaptive allocation process.
Option 4: Incorrect. Adaptive allocation does not by itself guarantee statistically significant results. Whether or not a distinct control condition is included, the usual statistical requirements for establishing significance still apply.

DID I GET THIS?

Question:

How would you explain the idea of an adaptive experiment to a child in 150 words or less?

Thank you for your response!

Imagine you’re trying to figure out the best flavor of ice cream for your birthday party, but you don’t know which one everyone likes the most. Instead of giving everyone the same flavor, you start by letting a few people try chocolate and a few people try vanilla. After they taste it, you see which one people like better. If more people like chocolate, you give more chocolate to the next group of friends to try, but you still let some people try vanilla, just in case their opinion changes things.

Over time, you keep adjusting how much chocolate or vanilla you give out based on what people like. By the end, you’ll know which flavor is the best for your party. That’s how an Adaptive Experiment works—it learns as it goes and tries to do better each step!

Question:

If Condition 1 has reward statistics S = 2, F = 10 (S successes and F failures), and Condition 2 has S = 10, F = 2, on the next step we:

Are more likely to select Condition 2
Will always select Condition 1
Are equally likely to select any condition
Are more likely to select Condition 1
Will always select Condition 2

Are more likely to select Condition 2:

Explanation of Options:

Option 1: Correct. Condition 2 has a much higher success-to-failure ratio (S = 10, F = 2) than Condition 1 (S = 2, F = 10), making it more likely to be selected in the next step.
Option 2: Incorrect. While Condition 1 is less successful, adaptive experiments use probabilities for selection, and no condition is guaranteed to be selected on every step.
Option 3: Incorrect. Conditions are not equally likely to be selected, as the selection probabilities depend on observed reward statistics.
Option 4: Incorrect. Condition 1 has a lower reward ratio compared to Condition 2, so it is less likely to be selected.
Option 5: Incorrect. Condition 2 is more likely to be selected, but it is not guaranteed to be chosen every time.
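The reasoning in these explanations can be checked numerically. The sketch below assumes that selection works by drawing a success rate for each condition from a Beta(S + 1, F + 1) posterior and picking the highest draw (Thompson sampling); this selection rule and the helper `prob_select` are assumptions for illustration, not something the module specifies at this point.

```python
import random

def prob_select(stats, target, n_draws=100_000, seed=0):
    """Monte Carlo estimate of how often condition `target` has the highest posterior draw."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # One draw per condition from Beta(S + 1, F + 1), i.e. a uniform prior plus the data.
        draws = [rng.betavariate(s + 1, f + 1) for s, f in stats]
        if draws.index(max(draws)) == target:
            wins += 1
    return wins / n_draws

# (successes, failures) for Condition 1 and Condition 2.
stats = [(2, 10), (10, 2)]
p_condition_2 = prob_select(stats, target=1)
print(f"Estimated probability of selecting Condition 2: {p_condition_2:.4f}")
```

Under these assumptions the estimate is very close to 1, which matches "more likely to select Condition 2": the selection is strongly tilted but remains probabilistic rather than fixed by rule.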

The following table displays the results of the experiments:

Experiment	Condition	Successes	Failures
Experiment A	Condition 1	4	6
Experiment A	Condition 2	6	4
Experiment B	Condition 1	8	12
Experiment B	Condition 2	12	8

Question:

Compare the two experiments based on the table above. The success rate for Condition 1 is the same in both Experiment A and Experiment B (40%), and so is the success rate for Condition 2 (60%).

We know more about the effectiveness of every condition in Experiment A than in Experiment B
We know more about the effectiveness of every condition in Experiment B than in Experiment A
We know the same amount of information about the effectiveness of every condition in Experiments A and B

We know more about the effectiveness of every condition in Experiment B than in Experiment A:

Explanation of Options:

Option 1: Incorrect. Experiment A has fewer participants, so its success rates are estimated with less certainty than the identical rates in Experiment B.
Option 2: Correct. Experiment B has a larger sample size, reducing variability and providing more information about the effectiveness of each condition.
Option 3: Incorrect. Although the success rates are identical, the amount of information differs due to the sample sizes.
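One way to quantify "more information" is the standard error of each estimated success rate, which shrinks as the sample grows. The sketch below uses the usual formula for the standard error of a proportion; the helper name `standard_error` is illustrative.

```python
import math

def standard_error(successes, failures):
    """Standard error of an estimated success rate: sqrt(p * (1 - p) / n)."""
    n = successes + failures
    p = successes / n
    return math.sqrt(p * (1 - p) / n)

se_a = standard_error(4, 6)    # Experiment A, Condition 1 (n = 10)
se_b = standard_error(8, 12)   # Experiment B, Condition 1 (n = 20)

print(f"Experiment A: SE = {se_a:.3f}")
print(f"Experiment B: SE = {se_b:.3f}")
```

Both experiments estimate the same 40% success rate, but doubling the sample size reduces the standard error by a factor of the square root of 2, so Experiment B pins the rate down more tightly.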

Question:

We are more likely to select Condition 1 in Experiment A than in Experiment B
We are more likely to select Condition 1 in Experiment B than in Experiment A
We are equally likely to select Condition 1 in Experiment A and in Experiment B

We are more likely to select Condition 1 in Experiment A than in Experiment B:

Explanation of Options:

Option 1: Correct. Experiment A has a smaller sample size, resulting in posterior distributions with greater variability. This increases the likelihood of selecting Condition 1 compared to Experiment B, where the larger sample size reduces posterior variability and shifts the selection toward the condition with the higher observed success rate (Condition 2).
Option 2: Incorrect. In Experiment B, the posterior is more concentrated due to the larger sample size, which reduces the likelihood of selecting Condition 1.
Option 3: Incorrect. Despite identical success rates, the difference in posterior variability makes the likelihood of selecting Condition 1 different between the experiments.
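The effect of posterior variability on selection can be illustrated with a small simulation. As an assumption, the sketch treats selection as Thompson sampling with Beta(S + 1, F + 1) posteriors, which is one common way to implement the posterior-based selection described above; `prob_select_condition_1` is a hypothetical helper.

```python
import random

def prob_select_condition_1(cond_1, cond_2, n_draws=100_000, seed=0):
    """Estimate how often Condition 1's posterior draw beats Condition 2's."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        # Each argument is a (successes, failures) pair; the prior is uniform.
        draw_1 = rng.betavariate(cond_1[0] + 1, cond_1[1] + 1)
        draw_2 = rng.betavariate(cond_2[0] + 1, cond_2[1] + 1)
        if draw_1 > draw_2:
            wins += 1
    return wins / n_draws

p_a = prob_select_condition_1((4, 6), (6, 4))     # Experiment A
p_b = prob_select_condition_1((8, 12), (12, 8))   # Experiment B

print(f"P(select Condition 1) in Experiment A: {p_a:.3f}")
print(f"P(select Condition 1) in Experiment B: {p_b:.3f}")
```

Under these assumptions the estimate for Experiment A comes out higher than for Experiment B: the wider posteriors in the smaller experiment overlap more, so the weaker-looking condition wins a draw more often.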
