analytical-reasoning-with-gemini

You are an interviewer from Meta interviewing me for a senior data scientist role. Your focus is “Analytical reasoning” based on a case study that assesses my ability in research design, analytical design, storytelling through data, setting goals & success metrics, and data visualization. For each question, I would like to answer and for you to evaluate my answer. Also provide what the best senior data scientist answer would be.

Okay, welcome! I’m ready. I’ll be acting as your interviewer from Meta for this senior data scientist role, focusing on analytical reasoning through a case study.

The goal here is to understand your approach to research design, analytical execution, how you define success, tell a story with data, and visualize your findings.

Let’s dive into the case study.

Case Study:

Imagine Meta is exploring new ways to increase engagement within Instagram Stories. We’ve recently launched an A/B test for a new feature called “Interactive Polls in Stories”. This feature allows users to add polls directly into their stories, and viewers can tap on options to vote.

The A/B test has been running for two weeks.

  • Group A (Control): Users see and interact with stories as usual.
  • Group B (Treatment): Users see and interact with stories, and they have the ability to create and vote in “Interactive Polls” within stories.

We have access to standard Instagram Stories metrics (views, taps forward/back, exits), as well as new metrics specific to the poll feature (poll views, poll votes, poll creation rate). We also track broader platform metrics like time spent on the platform and content creation rates.


Question 1 (Goal Setting & Success Metrics):

Based on this case study, what would be the primary goals for launching this “Interactive Polls” feature? And what key metrics would you define before the experiment starts to measure its success in this A/B test? Please articulate both product goals and the specific metrics you’d track.

Take your time to think through this, and let me know your answer when you’re ready.

The main goal of any feature within Instagram should be to provide better and deeper ways for users to connect with each other. So, aligning with the overall business goal, this feature should enable deeper and better connections between users. We need to quantitatively define what this means, and it needs to be measurable. The key metrics can be engagement-related and user-related: one for behavioral changes that would signify it is a better way to connect, such as engagements like click, like or comment per user, and another at the overall user level, like app opens per day or a DAU metric, to track the stickiness of the feature.

Thanks for your answer! That’s a solid starting point. I appreciate you connecting the feature goal to the broader Instagram mission of facilitating connections, and correctly identifying the need to quantify this with measurable metrics. Thinking about engagement and overall user stickiness (DAU) is definitely relevant.

Let’s refine this a bit from a senior data scientist perspective.

Evaluation of your answer:

  • Strengths:
    • Good alignment with the platform’s core value proposition (“deeper connections”).
    • Recognizes the need for quantitative, measurable goals.
    • Identifies relevant categories of metrics (engagement, user-level).
    • Thinking about per-user metrics is appropriate for managing experiment variance and understanding individual behavior change.
  • Areas for Senior-Level Detail:
    • Specificity of Metrics: While “click, like, or comment” are general engagement signals, they aren’t the primary interaction for a poll in a story. The key interactions here are voting and creating polls. Metrics should be tied directly to the mechanism of the new feature.
    • Two Sides of the Feature: The feature involves both creators (who make polls) and viewers (who vote). Success needs to be measured for both groups. Your answer focused more on the viewer/consumer side (“engagement per user”, “DAU”).
    • Impact on Existing Behavior: How does this feature affect existing story consumption (views, taps, exits)? These are crucial metrics to monitor, potentially as guardrails or secondary indicators.
    • Structure: A senior data scientist would typically structure metrics into primary (the core measure of success) and secondary/guardrail metrics (to understand broader impact and ensure no negative consequences).

What a Best Senior Data Scientist Answer Might Include:

A senior data scientist would break down the goals and metrics more explicitly and tie them directly to the feature’s mechanics and potential impacts.

Product Goals:

  1. Increase viewer engagement within Stories: Make story consumption more interactive and less passive.
  2. Empower creators: Provide creators with a new, easy tool to interact with their audience and generate engagement.
  3. Potentially increase overall time spent on the platform: If stories become more engaging, users might spend more time consuming or creating them.

Key Success Metrics (Defined Before the Experiment):

It’s crucial to define primary metrics before running the test to avoid p-hacking or bias.

  • Primary Metrics (Direct Feature Success):
    • Poll Vote Rate: The percentage of story views (specifically, views of stories containing a poll) that result in a user voting in the poll. (Measures viewer interaction).
    • Poll Creation Rate: The percentage of active story creators in the treatment group who create at least one story with a poll during the experiment period. (Measures creator adoption).
  • Secondary Metrics (Broader Impact & Engagement):
    • Story Completion Rate: The percentage of stories viewed that are watched to completion (or close to completion) in the treatment group vs. control. (Does the poll make stories more or less engaging overall?).
    • Taps Forward/Backward Rate: Changes in how users navigate stories. (Does the poll interrupt flow or encourage re-watching?).
    • Overall Time Spent on Platform: Total time users spend in the app. (High-level impact).
    • Number of Stories Viewed Per User: (Impact on overall story consumption).
  • Guardrail Metrics (Ensure No Negative Impacts):
    • Stories Exit Rate: The percentage of stories that users exit out of prematurely. (Does the poll cause frustration?).
    • Overall Daily Active Users (DAU) / Weekly Active Users (WAU): Ensure the feature isn’t negatively impacting the overall user base size or retention.
    • Other Content Creation Rates: Ensure the new feature doesn’t cannibalize other important creation behaviors (e.g., posting to feed, creating Reels).

Defining these upfront, with clear hypotheses about which direction you expect them to move (e.g., expect Poll Vote Rate and Poll Creation Rate to increase, potentially Story Completion Rate to increase, and ideally no negative impact on Exit Rate or DAU), is key for a senior role.
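
A senior data scientist might even codify this plan before launch so the metric roles and expected directions are pre-registered rather than decided after the data arrives. A minimal sketch in Python, with hypothetical metric names, denominators, and directions (illustrative only, not Meta’s actual definitions):

```python
# Hypothetical pre-registered metric plan for the "Interactive Polls" A/B test.
# Names, roles, denominators, and expected directions are illustrative placeholders.
METRIC_PLAN = {
    "poll_vote_rate":        {"role": "primary",   "denominator": "views of stories with a poll", "expected": "up"},
    "poll_creation_rate":    {"role": "primary",   "denominator": "active story creators",        "expected": "up"},
    "story_completion_rate": {"role": "secondary", "denominator": "story views",                  "expected": "up or flat"},
    "time_spent":            {"role": "secondary", "denominator": "user-day",                     "expected": "up or flat"},
    "story_exit_rate":       {"role": "guardrail", "denominator": "story views",                  "expected": "no increase"},
    "dau":                   {"role": "guardrail", "denominator": "platform",                     "expected": "no decrease"},
}
```
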


Great start! Let’s move on to the next step in the analytical process.

Question 2 (Analytical Design & Research Design Considerations):

Given we ran this two-week A/B test, how would you analyze the results? What specific steps would you take, and what potential issues, biases, or limitations would you consider when interpreting the data from this experiment?

With any A/B test we should first ensure the control and treatment groups are well defined: they differ only in this single feature, the groups are randomized and don’t carry particular biases such as demographics or usage patterns, the duration of the test is well thought out (with ramp-up periods to get past novelty effects), and it doesn’t overlap with events that would externally drive higher-than-normal usage. With all that in place, we would look at the metrics previously defined for the A/B test and see if there are statistically significant differences between the two groups. This requires us to pre-define the effect size and confidence level in order to make sure the size of the experiment is adequate. Then we would look at the primary, secondary, and guardrail metrics and see if they all point in the same direction, showing the feature is a success.

That’s a very strong foundation for analyzing an A/B test. You’ve correctly highlighted several critical elements that are fundamental to experiment analysis at Meta and elsewhere: ensuring group integrity (randomization check), understanding the environment (external events, novelty), pre-defining statistical parameters (effect size, confidence level for power considerations), and looking at the full suite of metrics (primary, secondary, guardrail).

Evaluation of your answer:

  • Strengths:
    • Excellent grasp of the prerequisites for valid A/B analysis (randomization check, environmental factors).
    • Correctly emphasizes the importance of statistical significance and pre-defining criteria for power.
    • Understands the need to look beyond just the primary metric to secondary and guardrail metrics.
  • Areas for Senior-Level Detail:
    • Analytical Workflow Specificity: While you mentioned “statistical differences,” a senior data scientist would detail how they’d do this analysis – specific statistical tests, variance reduction techniques, segmentation, etc.
    • Feature-Specific Considerations: How do you analyze metrics specific to the poll feature itself (e.g., analyzing poll vote rates within the treatment group, understanding who is creating polls)?
    • Advanced A/B Testing Issues: At a senior level, we’d think about more nuanced potential issues like network effects (does one user’s exposure affect another’s?), heterogeneous treatment effects (does the feature work differently for different user segments?), and potential for correlation between metrics.
    • Time-Based Analysis: How does the feature’s impact evolve over the two weeks? Is there a novelty effect that fades?
    • Deep Dives: What if a metric moves in an unexpected direction? What’s your process for digging deeper?

What a Best Senior Data Scientist Answer Might Include:

A senior data scientist would provide a more detailed and nuanced plan, incorporating specific analytical steps and a broader range of potential issues.

Analytical Design & Steps:

  1. Data Validation and Cleaning: First, ensure the data pipeline is working correctly. Check for data completeness, correctness, and potential logging errors, especially for the new poll-specific metrics. Validate that the random assignment was implemented correctly and that key user characteristics are balanced between groups.
  2. Metric Calculation: Calculate all pre-defined primary, secondary, and guardrail metrics for both the control and treatment groups, typically on a per-user or per-event basis depending on the metric.
  3. Overall Impact Analysis:
    • Perform statistical tests (e.g., t-tests, Mann-Whitney U tests, or proportion tests, potentially using the delta method for ratio metrics) to compare the means (or distributions) of the metrics between the treatment and control groups. Use appropriate variance reduction techniques like CUPED (Controlled-experiment Using Pre-Experiment Data), if pre-experiment data is available and applicable, to increase sensitivity (a minimal sketch of this comparison follows this list).
    • Determine if the observed differences are statistically significant at the pre-defined alpha level (e.g., p < 0.05).
  4. Segmented Analysis: Analyze the results for key user segments. For this feature, segments might include:
    • Heavy vs. light story viewers.
    • Heavy vs. light story creators.
    • Users in different demographics or geographies.
    • Users on different devices.
    • Within the Treatment group: Compare users who created polls vs. those who only voted in polls vs. those who saw polls but did not interact.
  5. Trend Analysis: Analyze metric trends over the two-week period. Look for signs of novelty effects (initial spike in engagement that fades) or learning effects (engagement increasing over time as users discover the feature).
  6. Deep Dives: If a primary or critical guardrail metric moves in an unexpected or negative direction, formulate hypotheses and dig deeper into the data. For example, if story completion rate decreases, investigate user behavior just before exiting a story with a poll.

Potential Issues, Biases, and Limitations to Consider:

  1. Novelty Effect: Users might engage heavily with the new feature just because it’s new. Two weeks might not be enough to see the sustained long-term impact. Consider if a longer test or a follow-up analysis is needed.
  2. Network Effects: The A/B test was likely randomized at the user level. However, a user in the control group might see stories with polls created by a user in the treatment group. This contamination could dilute the observed treatment effect. For this specific feature, the impact might be minor but worth considering if results are muted. If network effects were expected to be strong (e.g., a messaging feature), a cluster or network-based randomization might be needed, but that’s more complex.
  3. Heterogeneous Treatment Effects: As mentioned in segmentation, the feature might be a hit with one group but detrimental to another. Analyzing averages alone can hide these nuances.
  4. Statistical Power: Did the experiment run long enough and have enough users to detect the minimum detectable effect size (MDES) you defined earlier for your primary metrics? If not, a non-significant result doesn’t necessarily mean there’s no effect, just that you couldn’t detect it (a quick sample-size sketch follows this list).
  5. Correlation of Metrics: Understand that metrics aren’t independent. An increase in poll votes might naturally correlate with an increase in story views of polls, for example. Focus on the causal impact of the feature rather than just correlations between metrics post-experiment.
  6. External Factors: Reiterate the check for major external events (holidays, news cycles) that might have coincided with the experiment period and affected user behavior uniformly across groups.
  7. Duration Limitations: Two weeks is relatively short. Long-term impacts on retention or habit formation cannot be fully assessed.

By outlining these steps and potential pitfalls, you demonstrate a comprehensive and rigorous approach to experiment analysis, which is key for a senior role.


Okay, let’s imagine you’ve completed your analysis.

Question 3 (Storytelling & Interpretation):

Let’s say, hypothetically, your analysis shows that the primary metric you defined (e.g., Poll Vote Rate) increased significantly, indicating good adoption of the poll feature by viewers. However, you also observed a statistically significant decrease in a key secondary metric like “Story Completion Rate” in the treatment group.

How would you investigate this finding? And crucially, how would you interpret and communicate these results – highlighting both the success in feature adoption and the potential negative impact on overall story consumption – to product managers and leadership? What story does the data tell?

This is potentially a cannibalization effect, and we can quantify whether the poll vote rate is the reason for people giving up on creating stories by doing counterfactual modeling. If this is indeed the cause, we can look into whether the method of presenting the feature is the reason for the decrease in story completion rate - is the feature mandatory in the A/B test or discovered by the user? We can also combine this with user surveys to ask if the feature is causing hindrance for users. After all this we can assess the trade-offs, because if we are creating fewer stories but the stories are of higher value - meaning they create more interactions and engagement - the overall trade-off can still be a net positive. Quality of the story matters more here. To communicate this with stakeholders, I would frame it from the perspective of user value and emphasize quality metrics as opposed to pure volume: better product-market fit with PMs, narrative themes with marketing or leadership. I would also identify the reason for the decreased story completion rate and how we can tweak the feature to make it more user friendly.

That’s a thoughtful response, connecting the findings back to user value and recognizing the potential for trade-offs. You’ve hit on several key senior-level points: the importance of understanding why a metric moved, using qualitative data (surveys) to get to the root cause, assessing the trade-offs between different outcomes, and tailoring your communication to different audiences.

Evaluation of your answer:

  • Strengths:
    • Excellent recognition that not all metrics move positively and that trade-offs must be assessed.
    • Strong emphasis on framing the results in terms of user value and strategic impact (“quality vs. volume”).
    • Correctly identifies the need for qualitative methods (user surveys) to understand user behavior and sentiment.
    • Good understanding of tailoring communication style/focus for different audiences (PMs vs. Leadership).
    • Proactive approach: suggesting investigation into why the negative impact occurred and how to potentially fix it.
  • Areas for Senior-Level Detail/Refinement:
    • Specificity of Investigation: While surveys are great, the initial investigation into the completion rate decrease would heavily rely on analyzing the quantitative data you already have. How would you use the existing event data to pinpoint where in the story users drop off?
    • Metric Misunderstanding: You mentioned “giving up on creating stories” and linking it to poll vote rate via counterfactual modeling. The prompt specified a decrease in “Story Completion Rate” (a viewer metric) and linked it to the feature rollout. This indicates a need to investigate viewer behavior while watching stories with polls. Counterfactual modeling is a much more advanced technique than typically needed for initial diagnostic analysis of A/B test results; simpler correlation and sequence analysis would be the first steps.
    • Quantifying Trade-offs: While you mentioned assessing trade-offs, explicitly showing the magnitude of the gains vs. losses in quantifiable terms (e.g., “Poll votes added X interactions per story, while decreased completion cost Y interaction-equivalents”) is powerful in senior communication.

What a Best Senior Data Scientist Answer Might Include:

A senior data scientist would approach this with a structured investigation plan using available data, combine it with qualitative insights, and build a narrative that quantifies the trade-offs and recommends next steps.

Investigating the Decrease in Story Completion Rate:

  1. Analyze User Behavior Within Stories: Use event logging to see where users are dropping off in stories that contain polls compared to stories without polls in the treatment group, and compared to stories in the control group. Are users exiting right after seeing a poll? Are they skipping the story entirely? Is it specific types of polls or placement?
  2. Segment Analysis: Does the completion rate decrease affect all users or specific segments (e.g., heavy viewers, users who never vote)?
  3. Correlation Analysis: Look at the correlation between seeing/interacting with a poll within a story and the likelihood of exiting that story (a sketch of this kind of slicing follows this list).
  4. Qualitative Deep Dive: Supplement quantitative findings with qualitative data:
    • User Surveys: Ask users in the treatment group about their experience with polls. Do they find them engaging or interruptive? Does it change how they watch stories?
    • User Interviews/Usability Studies: Watch users interact with stories containing polls. Observe their behavior and gather direct feedback on the experience.
    • User Feedback Channels: Monitor app store reviews, social media, or internal feedback tools for comments related to the new poll feature and story viewing experience.
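
As a sketch of the quantitative side of this investigation (steps 1-3), assuming a hypothetical per-view event table for the treatment group with one row per story view and simple boolean flags; the file path and column names are illustrative, not a real schema:

```python
import pandas as pd

# Hypothetical schema: one row per (user, story view) in the treatment group,
# with flags for whether the story contained a poll, whether the viewer voted,
# and whether they exited early or completed the story.
views = pd.read_parquet("treatment_story_views.parquet")  # illustrative path

# Do stories with polls see more early exits / fewer completions than those without?
by_poll = views.groupby("contains_poll")[["exited_early", "completed"]].mean()

# Within poll stories, do voters behave differently from non-voters?
poll_views = views[views["contains_poll"]]
by_vote = poll_views.groupby("voted")[["exited_early", "completed"]].mean()

print(by_poll, by_vote, sep="\n\n")
```
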

Interpreting and Communicating the Results (Storytelling):

The story needs to be balanced and data-driven, acknowledging both the wins and the losses.

  1. Start with the Objective: Reiterate the goal was to increase engagement within stories and empower creators.
  2. Present the Wins (Feature Adoption): Clearly state that the feature achieved its primary goal: Poll Vote Rate increased significantly, showing users are interacting with polls. Poll creation rate also likely increased (based on the premise of the feature), indicating creators are adopting it. Use clear numbers and statistical significance results.
  3. Present the Challenge (Unintended Consequence): Immediately follow with the observation that Story Completion Rate decreased significantly. Frame this as an “unintended consequence” or a “trade-off” that needs further investigation. Avoid immediately labeling it “bad” without context.
  4. Present the Investigation Findings: Explain why you think completion decreased, based on your analysis (e.g., “Our data suggests users are more likely to exit a story immediately after voting in a poll,” or “Surveys indicate some users find the poll interaction disruptive to their passive viewing experience”). Use visuals to support this (e.g., a funnel showing drop-off points).
  5. Quantify the Trade-off: This is key for senior leadership. Frame it in terms of the value exchange. For example: “While completion rate dropped by X%, leading to Y fewer story frames viewed per user, the new polls generated Z additional votes per user. Each vote represents a high-value interaction. We believe the net effect on meaningful engagement is positive, although the viewing flow has changed.” You might even try to assign a relative value to a poll vote vs. viewing a story frame if the business has such models. A back-of-envelope sketch of this calculation follows this list.
  6. Discuss User Value: Circle back to the user. Is the new interaction (voting) more valuable to the user than passively completing a story? User feedback from surveys is crucial here.
  7. Propose Next Steps: This isn’t the end of the road. Recommend iterative improvements based on the findings (e.g., test different poll placements, timing, or designs) and suggest follow-up experiments to optimize the feature and mitigate the negative impact on completion rate.
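
One way to make point 5 tangible is a back-of-envelope net-value calculation. The relative weight of a poll vote versus a passively viewed story frame is an assumption the business would have to supply; the numbers below are purely illustrative:

```python
def net_engagement_delta(extra_votes_per_user: float,
                         lost_frames_per_user: float,
                         vote_weight: float = 3.0) -> float:
    """Positive result: added poll interactions outweigh lost passive frame views.

    vote_weight is an assumed value ratio (poll vote : viewed frame); replace it
    with whatever valuation model the business actually uses.
    """
    return extra_votes_per_user * vote_weight - lost_frames_per_user * 1.0

# Illustrative inputs only: +0.8 votes/user vs. -1.5 frames viewed/user
print(net_engagement_delta(extra_votes_per_user=0.8, lost_frames_per_user=1.5))
```
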

Tailoring Communication:

  • To Product Managers: Focus on the mechanics. Why did completion drop? What specific design changes could be tested to improve flow? Discuss the metrics in detail and brainstorm product iterations.
  • To Leadership: Focus on the strategic trade-offs and overall value proposition. Is the increased interactive engagement from polls worth the decrease in passive consumption? What is the net impact on key business goals like overall time spent or user satisfaction? Provide a high-level summary of the findings and the proposed iterative strategy. Use compelling visuals that tell the story at a glance.

This structured approach, combining quantitative depth with qualitative understanding, quantifying trade-offs, and proposing concrete next steps, demonstrates strong senior-level analytical leadership and communication skills.


Excellent, we’re building a good picture of how you approach these challenges.

Question 4 (Data Visualization):

Finally, let’s talk about visualization. How would you visualize the key results of this A/B test for two different audiences:

  1. The Product Team: They are deeply involved in the feature, focused on understanding user behavior and iterating on the design.
  2. Executive Leadership: They need a high-level understanding of the feature’s impact on key business metrics and the overall strategic implications.

What types of visualizations would you use for each audience, and what specific insights would you want each visualization to highlight?

Skip this question and go to another one

No problem, we can skip that one. Let’s move on to another aspect of the case study.

Based on our previous discussion where the initial A/B test showed positive poll adoption but a negative impact on Story Completion Rate, the product team is now considering iterating on the feature design.

They have a few hypotheses about why Story Completion Rate decreased, such as the poll UI being too disruptive or the way poll results are shown interrupting the viewing flow.


Question 4 (Research Design & Analytical Design Iteration):

The product team wants to test a specific change aimed at improving the Story Completion Rate, perhaps by modifying the poll’s visual design or changing how results are displayed.

How would you design a follow-up experiment to test a proposed solution or better understand which aspect of the poll experience is causing the drop in completion? What would be the key design elements of this new experiment (e.g., what are your new treatment groups?), what metrics would you prioritize, and what potential analytical considerations would be important for this specific follow-up test?

Think about how you’d structure this experiment to isolate the impact of the change you’re testing.

We can create another treatment group within the group that has the new poll feature, and this time we want Story Completion Rate as the primary metric and engagement as a secondary/guardrail metric.

That’s a good start, clearly identifying the need for a new experiment and setting a specific primary metric based on the previous test’s findings. Making Story Completion the primary metric for this follow-up test makes perfect sense, as you’re trying to specifically address that issue. Keeping engagement as a secondary or guardrail metric is also crucial to ensure your fix doesn’t break the core value proposition.

Evaluation of your answer:

  • Strengths:
    • Correctly identifies the need for a new experiment iteration.
    • Appropriately sets Story Completion Rate as the primary metric to focus on the identified problem.
    • Includes engagement as a necessary secondary/guardrail metric.
  • Areas for Senior-Level Detail:
    • Experiment Group Definition: The phrase “another treatment group within the group that has the new poll feature” is a bit ambiguous. It could mean:
      • Running a new A/B/C test where B is the original poll design and C is the new design (on a fresh sample of users). This is standard.
      • Taking the original treatment group from the first test and splitting those same users into new groups for this second test. This is generally problematic due to carry-over effects, learning, and novelty wear-off from the first test.
    • Control Group: For this follow-up test, the most appropriate control group is usually the original poll design (the previous treatment that is now the status quo you’re trying to improve upon), not the “no poll” control from the very first experiment (unless the goal is to see if the new design is better than having no polls at all, which isn’t the implied goal here).
    • Specificity of Treatments: You mentioned testing a “proposed solution.” A senior data scientist would specify what the treatment variations are in this new test (e.g., Treatment A: Original Poll Design, Treatment B: New Poll UI Design, Treatment C: Modified Results Display).
    • Analytical Considerations: While you set the metrics, what specific analytical challenges or considerations are unique to analyzing a test focused on completion rate or an iterative design change? (e.g., sensitivity of the metric, ensuring randomization worked again for the new test).

What a Best Senior Data Scientist Answer Might Include:

A senior data scientist would design a clear, isolated experiment comparing the proposed solution against the existing implementation, with specific metrics and awareness of potential pitfalls.

Designing the Follow-up Experiment:

  1. Experiment Type: This would likely be a new A/B test (or A/B/n if testing multiple solutions).
  2. Randomization: Use a fresh sample of users who were not part of the initial experiment (to avoid carry-over/learning effects). Randomize users at the user level into the new groups. Re-validate that the new groups are balanced on key characteristics, including a quick sample ratio check (see the sketch after this list).
  3. Control Group (Group A): Users in this group would experience the original “Interactive Polls” feature design (the treatment from the first experiment that had the completion rate issue). This is your baseline for improvement.
  4. Treatment Group(s) (Group B, C, etc.): Each treatment group would receive a specific modified version of the “Interactive Polls” feature aimed at improving completion. For example:
    • Group B: “Interactive Polls” with a less intrusive UI design.
    • Group C: “Interactive Polls” with a different results display mechanism.
    • (You would test one hypothesis/design change per treatment group where possible, or run sequential tests).
  5. Duration & Sample Size: Determine the required duration and sample size based on the sensitivity of the new primary metric (Story Completion Rate) and the minimum detectable effect size (MDES) you aim to achieve. Since completion rate might be less volatile than poll votes, the required sample size could be different from the first test.
  6. Key Metrics:
    • Primary Metric: Story Completion Rate (in stories containing polls in the treatment groups, and overall completion rate for control/treatment comparisons).
    • Secondary Metrics: Poll Vote Rate (per viewed poll), Poll Creation Rate. These are now secondary because you want to see if the design change improved completion without significantly harming the core poll engagement.
    • Guardrail Metrics: Overall Time Spent, Stories Exit Rate, DAU/WAU (same as before, ensuring no major negative unintended consequences).
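
For re-validating the assignment in step 2, a sample ratio mismatch (SRM) check is a cheap first gate before reading any metrics. A minimal sketch with illustrative counts and an assumed 50/50 intended split:

```python
from scipy.stats import chisquare

observed = [50_421, 49_712]        # illustrative user counts in control / treatment
intended_split = [0.5, 0.5]        # intended allocation

total = sum(observed)
expected = [share * total for share in intended_split]
stat, p_value = chisquare(observed, f_exp=expected)

if p_value < 0.001:
    print("Possible sample ratio mismatch - check assignment and logging before reading metrics")
else:
    print("Group sizes are consistent with the intended split")
```
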

Analytical Considerations for this Follow-up:

  • Focus on the Primary: The analysis should first and foremost focus on comparing the Story Completion Rate between the control (original poll design) and the new treatment group(s).
  • Trade-off Assessment: If completion rate does improve, you must rigorously check the secondary metrics (Poll Vote Rate, Creation Rate). Did the improvement in completion come at the cost of reduced poll engagement? The analysis needs to quantify this trade-off.
  • Segmented Analysis: Does the new design work better for certain user segments?
  • Statistical Power: Ensure the experiment had sufficient power to detect a meaningful change in Story Completion Rate.
  • Learning Effects (within the test): Analyze trends over time to see if users adapt to the new design.

This detailed approach demonstrates your ability to translate insights from one experiment into a well-designed plan for the next iteration, focusing on specific metrics and potential trade-offs.


Okay, we’ve covered goal setting, initial analysis considerations, interpreting conflicting results, and designing a follow-up experiment. We’ve touched upon several aspects of analytical reasoning.