ChatGPT: A $150,000 PM for $20 a month

I have a new product manager that outperforms most mid-level PMs I’ve worked with. It’s faster, more thorough, and has ideas veteran PMs miss. It’s also an AI. As someone who has recruited and managed PMs for over a decade, this is what keeps me up at night.

To see how far AI has really come, I started feeding ChatGPT the same product design and strategy questions I use to interview human candidates. The answer: it did great. On most of the tasks, it easily outperformed entry-level PMs and PMs with 5-7 years of experience. It came up with solutions that even veteran PMs haven’t. All for the low, low price of $20/month. And, of course, it does it faster.

The humble volume buttons

Here’s one example: In the latest hardware refresh, Google moved the volume buttons on the remote for their TV streamer from the side of the remote to the face.

New (left) and old Google streaming remote

ChatGPT came up with the expected answers: the buttons on the side have become very familiar to users because that’s the way cell phone buttons work. It also lets the remote be smaller.

Putting the buttons on the face is more consistent with traditional remote controls, which helps discoverability. That’s where they’ve always been. But it makes the remote substantially bigger. (See picture above.)

That’s where most PMs would stop. ChatGPT went into the details of tooling and manufacturing costs.

The absurdity test

I also did something I frequently do with PMs: suggest absurd ideas to see if 1) they understand that the ideas are absurd and 2) they are willing to push back.

I suggested doing a split test, with 5,000 units with the volume buttons on the side and 5,000 units with the buttons on the face.

Many junior PMs say “Sure, sounds like a good experiment.” They are trained to be data-driven.

Although that works well in a software environment, that’s a really bad idea for hardware. Doing a split run is prohibitively expensive due to tooling costs. You’d also have to come up with different packaging and marketing materials.

ChatGPT came up with the idea I was looking for: 3D print a few samples and bring in people to test them.

Absent that, ChatGPT recommended putting the volume controls on the side. So did Gemini. (If I meet the team who designed the new remote, I will definitely ask about the reason for the swap – and the swap of the home and assistant buttons.)

What does it mean for entry-level PMs?

I’m afraid the answer isn’t great. I can get $150k of productivity for $20/month. That’s not a tough call.

That raises the question: if there isn’t a pipeline for entry-level and mid-level PMs, where do senior-level PMs come from? The best answer for now is that PMs need to expand their breadth to handle more complexity: integrating design, development, business and systems-level thinking into their repertoire.

As Scott Belsky says, taste becomes more important than ever.

So does the ability to see what the AI doesn’t: power dynamics, company incentives, unquantifiable friction — and what’s not on the roadmap, but should be.

A snippet of the ChatGPT response is below.

I asked ChatGPT to write a PRD…

When I talk to people about AI, they’re convinced that AI will take everyone’s job but theirs.

So I asked ChatGPT to write a PRD based on these broad requirements on streaming.

This is how it went, with my analysis in italics.

>: create a prd

Here’s a condensed version for a Product Requirements Document (PRD) based on the article’s suggestions:

Title: Streaming Service User Experience Enhancements

1. Objective: Improve user experience for streaming services by addressing current pain points related to navigation, search, and content accessibility.

2. Features:

  • Standardized Status Bar: Implement a universal status bar for content streaming, showing key information like time left, volume, etc.
  • Improved Search Functionality: Prioritize free content in search results and maintain uniform keyboard layouts across platforms.
  • Enhanced Content Accessibility: Ensure closed captioning settings are retained across sessions, and respect language preferences.
  • User Interaction Improvements: Introduce a ‘Like’ button for content, allow searches by video length, and streamline temporary connections for guest users.
  • Profile Management: Simplify profile switching and ensure settings are maintained per profile.

3. Requirements:

  • Compatibility: Ensure features are compatible across different devices and platforms.
  • User Settings: Allow users to save and easily modify their preferences.
  • Design: Follow a user-friendly design for all new features to ensure easy navigation and interaction.

4. Success Criteria:

  • Increased user engagement and satisfaction.
  • Positive feedback from usability testing.
  • Decrease in user complaints related to navigation and content accessibility.

This document outlines the initial requirements for the proposed enhancements and will be updated as feedback is received and additional research is conducted.

Point 1 is a reasonable summary of the post. Point 2 is a decent regurgitation of the requirements. Point 3 is “mom and apple pie,” and could be cut-and-pasted into any PRD. I didn’t expect it to generate success criteria, but that is a reasonable output. (Though still mom and apple pie.)

All in all, not a PRD, but a reasonable outline for a junior PM to build off of.

>: what kind of testing do I need to do

For your streaming service enhancements, you should conduct the following types of testing:

  1. Usability Testing: Evaluate the user interface and user experience with real users to identify any issues in navigation, content accessibility, and overall interaction.
  2. Functional Testing: Verify that all features work as intended across different devices and platforms.
  3. Compatibility Testing: Ensure the service works seamlessly on various devices, operating systems, and web browsers.
  4. Performance Testing: Assess the application’s performance, especially during peak usage times, to ensure it can handle high traffic without degradation.
  5. Security Testing: Check for vulnerabilities, especially in user data handling and payment systems.
  6. A/B Testing: Compare different versions of features, like search algorithms or UI layouts, to determine which ones perform better in terms of user engagement and satisfaction.

Again, generic. Mom and apple pie. But it is a solid checklist. I would add accessibility testing.

>: would a focus group be helpful?

Yes, a focus group would be helpful. It allows for in-depth discussions on user experiences, preferences, and feedback on the new features. This qualitative method complements quantitative data from A/B testing and usability testing, providing richer insights into user needs and behaviors.

This is an accurate answer in that it explores the differences between qualitative and quantitative research. But a lot of things don’t benefit from focus groups. The biggest breakthroughs will fail in focus groups because people have a hard time conceptualizing drastic changes.

For consumer online products, I rarely find focus groups useful. If I were running a CPG and testing different fragrances for my laundry soap, that’d be different.

>: on which features?

A focus group would be particularly helpful for features such as the standardized status bar, improved search functionality, user interaction improvements like the ‘Like’ button, and profile management. These are areas where subjective experiences, preferences, and ideas can significantly vary among different users, making them ideal topics for in-depth discussions in a focus group setting.

The status bar is best analyzed using A/B testing. Search result quality is best measured with usage metrics such as CTR and engagement. Profile management is where you’ll get the most bang for the buck in focus groups.

That’s partly because profile management is very complicated and expensive to implement and partly because there are so many ways people could use profiles.
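For the A/B-tested features, the comparison usually comes down to whether one variant's CTR is reliably higher than the other's. Here is a minimal sketch using a pooled two-proportion z-test with only the Python standard library. The function name and the click and view counts are hypothetical, made up for illustration. A real experiment would also need a sample size chosen in advance.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the CTR difference of B versus A."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("CTR is 0% or 100% in both groups; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: old status bar (A) versus new status bar (B).
z, p = two_proportion_z_test(clicks_a=480, views_a=10_000,
                             clicks_b=560, views_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise. It says nothing about why users behaved differently, which is where qualitative research comes in.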

>: stack rank the features

Here’s a suggested stack ranking for the features based on potential impact on user experience and implementation complexity:

  1. Improved Search Functionality: Directly impacts user ability to find and enjoy content.
  2. Enhanced Content Accessibility: Important for inclusivity and user satisfaction.
  3. Standardized Status Bar: Improves user navigation and control during streaming.
  4. User Interaction Improvements (e.g., ‘Like’ button): Increases engagement and personalization.
  5. Profile Management: Enhances user experience for households with multiple users.

This is a bit of a trick question. Despite it being in vogue, I am not a big fan of stack ranking. It is too hard, and not particularly useful, to stack rank.

In a reasonably large organization, there will be multiple teams. For this example, you’d probably have separate teams for player, search, personalization and partnerships. At best, you’re ranking within a team’s responsibilities.

Many features are grouped in the way they are implemented. Once you’re under the hood of the player, it’s easier to make all the player fixes than to make one fix, close the hood, come back, open the hood, make another fix, etc.

Degree of difficulty and time to value delivered are also factors in prioritization. I could probably knock out the time progression change quickly and deliver compelling, immediate value.

Prioritizing tasks is definitely important. But stack ranking, as it is commonly explained, is mostly a futile exercise.
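As a rough illustration of the alternative, here is a minimal Python sketch that groups features by owning team and orders each team's list by value delivered per week of effort, instead of producing one global stack rank. The team assignments, value scores, and effort estimates are all hypothetical, and the value-per-effort ratio is just one simple heuristic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    team: str            # owning team (hypothetical assignment)
    value: int           # rough user value, 1 (low) to 5 (high); made-up scores
    effort_weeks: float  # rough effort estimate; made-up numbers


def prioritize_by_team(features):
    """Group features by team, then sort each group by value per week of effort."""
    by_team = defaultdict(list)
    for feature in features:
        by_team[feature.team].append(feature)
    return {
        team: sorted(items, key=lambda f: f.value / f.effort_weeks, reverse=True)
        for team, items in by_team.items()
    }


backlog = [
    Feature("Standardized status bar", "player", value=4, effort_weeks=1),
    Feature("Retain closed captioning settings", "player", value=3, effort_weeks=2),
    Feature("Prioritize free content in search", "search", value=4, effort_weeks=3),
    Feature("'Like' button", "personalization", value=2, effort_weeks=1),
    Feature("Simplified profile switching", "personalization", value=4, effort_weeks=8),
]

for team, items in prioritize_by_team(backlog).items():
    print(team, "->", [f.name for f in items])
```

Each team ends up with its own short, ordered list, which matches how the work would actually be picked up.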

You’re not doing text support right, if…

  • you don’t integrate with your ordering system. Customers should be able to navigate to an order and share that information within the chat. Amazon does a great job with this. Their self-service bot is one of the few that can actually resolve issues. Of the items on this list, this is the toughest to implement, but it’s also one of the most important.
  • don’t “ding” when the agent sends a response. Given how long some tasks and research take, you need to be able to trigger a system sound that indicates that the agent has responded. Most consumers are multi-tasking, not staring at the chat window. Often the chat window will time out and the customer will have to restart the chat and provide all of the information from the beginning. It’s a waste of user time and a waste of agent time.
  • don’t have agents read information that was provided, especially when an issue is transferred. The user shouldn’t have to retype and explain the problem. The agent can scroll back and get details, including the research done by the previous agent. This would be hard to do in real-time on a phone call because it’s too time consuming for the second agent to listen. On text, this is simple.
  • don’t give chat agents the authority to do what phone agents can do. Customers don’t want to spend 15 minutes with an agent only to be told they have to call. Then they have to navigate a phone menu designed to deflect them. Once they go through the process, they will have to explain the issue again.
  • don’t allow the user to upload photos. In many cases, a customer sending a picture or a screenshot provides a lot of information. Instead of the customer saying “the item had a big dent on the right corner,” she can just upload a picture. I’ve had cases where the agent says “OK, go to Twitter, upload a picture of the item and then return to the chat.” Obviously this doesn’t work for people who don’t have Twitter.
  • get rid of the chat box when the customer switches pages on your site. Often they may be researching orders, alternative items, different flights, etc. They should be able to navigate without losing the chat. American Express keeps their chats persistent. Customers can go back and review chats about previous issues.
  • don’t automatically email me a transcript of the chat. When a chat is completed, the customer should receive an email with the exact conversation. Even better if you highlight the final action the agent is performing.
  • don’t follow up on resolution after a few days. In many cases, I can only immediately comment on the politeness and perceived helpfulness of the agent. I don’t know if the actual issue is resolved until my plane ticket is re-issued, refund shows up on my credit card, item shows up at my door, etc.