LinkedIn’s AI could have you freezing your butt off in Alaska

I love rampies. Not only could I not get off the ground without them, I couldn’t leave the gate. They load the luggage, drive the tugs that push back the plane and use the orange wands to marshal airplanes. They are out on the ramp whether it’s 100 degrees in Phoenix or -40 degrees in Nome.

But does that mean I could be one? Nope.

Though I do know not to load bags into an engine. That won’t end well. Note to Alaska: don’t hire Gemini to load bags.

On to LinkedIn…

Imagine my surprise when LinkedIn’s AI told me I was a great fit… for a ramp service agent role. LinkedIn has a feature for premium members that will tell you how good a match you are for a role.

Here’s my rating for a Ramp Service Agent role.

That is not a job I remotely qualify for. Location doesn’t match. Comp doesn’t match. Skills don’t match. But that rating is one of the benefits I get for $40 a month.

This role at Alaska Airlines is a great fit. I could fly this at 40,000 feet. Fortunately, I’m also a “high” fit for this.

The only way these answers help is if candidates are doing spray-and-pray applications. That’s a waste of time for applicants.

Now imagine a recruiter looking at this same view. (I don’t have access to LinkedIn Recruiter, so I don’t know its sort order.) It wouldn’t help them either. If anything, it would destroy credibility for LinkedIn as a recruiting tool. And when a company owned by Microsoft (one of the biggest backers of AI) ships something this sloppy, it casts a shadow over all their other products.

But what about that “BETA” label, Rakesh? For those not familiar with tech talk, it means they’re testing the product for release.

I’ve designed and launched search products for much of my career. I’d never put this out beyond a closed internal beta, much less as a premium feature. As it stands, this isn’t a product — it’s unpaid labor for LinkedIn’s AI. The thumbs up/thumbs down will train their model. Even better: just hire a large RLHF team in India.

Amy Miller, a recruiter at Amazon, hates AI for “scoring.” This is a good reason why.

Rampies don’t need AI scores — they get planes moving. LinkedIn’s “match” feature should aspire to that kind of utility: useful, reliable, and grounded in reality. Until then, it feels more like unpaid labor for their AI than a benefit for members.

And if you’re looking for a product executive who knows AI — and knows when not to trust it — send me a message.

Parting shot: Here’s what WordPress generated. The NTSB will want a conversation.

Written by me, lightly edited by ChatGPT, illustrated by ChatGPT & Gemini. Unlike LinkedIn’s AI, none of them tried to send me to Nome at 40 below.

Alaska provides another glimpse of the future

Alaska Airlines is offering another time-saving feature for passengers in Portland and Seattle: bag check with facial recognition to avoid having to wait in line for someone to verify your ID.

Although this feature isn’t an industry first, it’s an important improvement. Alaska has had some notable firsts.

From ChatGPT:

First Airline to Sell a Ticket Online (1995)

This is their most famous claim to fame. In 1995, Alaska Airlines became the first airline in the world to allow customers to book and purchase a flight over the internet. This innovation fundamentally changed how travel is sold and paved the way for the modern e-commerce and online travel agency landscape we know today.

First Airline with Online Check-In (1999)

Building on their web-based success, Alaska was the first airline in the world to introduce online check-in. This allowed passengers to check in for their flight and print their own boarding passes from their home or office, a revolutionary convenience that helped passengers bypass long lines at the airport.

Air travel is one of the rare cases where interests are aligned between customers and businesses.

(The information below is based on having flown 2 million miles across dozens of carriers. As with all things travel, there are exceptions and airlines that handle things differently.)

We’ve already seen a lot of innovation in the time I’ve been flying.

| Old way | New way |
|---|---|
| Carry paper ticket | E-ticketing |
| Wait in line to check in | Mobile boarding passes, online check-in, kiosks |
| Wait in line to drop my bag (in the rare cases I do) | Face recognition to check bags (this is what Alaska just announced) |
| Wait in longer lines at security behind people who rarely travel, fumbling for things | TSA Pre✓ and Clear Plus |
| Wait for a refund | E-tickets mean you don’t have to return paper tickets |
| Wait in line for an hour for flight changes | Many airlines automatically re-book; if those options don’t work, you can use a mobile app to find alternatives |
| Stay over a Saturday night to get the best fares | The rise of low-cost carriers all but eliminated Saturday night stay requirements |

These changes weren’t just cost cutting — they removed friction for travelers and streamlined operations for airlines.

I see a lot of innovation ahead, some of which is starting to take shape.

AI

This is an interesting one because the definition has been so vague and misused. Airlines have been using machine learning and predictive modeling to set prices and manage fleets for decades.

When I talk about AI, I mean new tools like LLMs and user agents.

The most obvious use of AI for airlines is answering customer service inquiries better. Unfortunately, most of the tools out there now are dumb bots. They don’t incorporate things like my status, itinerary and previous interactions. The best LLM-linked bots will also incorporate the reference tools airline agents access.

Not much further down the AI path are user agents. Instead of booking through the airline Web site, agents will do the searches on my behalf. (What you see now is dumb AI, which is just a fancy screen scraper. They may be marketed as agents, but they’re not.) Airlines need to be paying attention to agents that use MCP and testing ways to integrate with their systems.
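
To make that concrete, here’s a minimal sketch of what a direct-connect fare search exposed to an agent could look like. I’m using the OpenAI function-calling format in Python as a stand-in (an MCP server would expose the same capability with its own schema), and the airline API, model name and field names are all hypothetical:

```python
# Minimal sketch: exposing an airline's direct-connect fare search as a tool an
# LLM-based agent can call. OpenAI function-calling is used as a stand-in for
# MCP; the fare engine, fields and model name are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

def search_fares(origin: str, destination: str, date: str) -> list[dict]:
    """Hypothetical direct-connect call to the airline's own fare engine (not a GDS)."""
    return [{"flight": "AS 123", "depart": "08:05", "fare_usd": 289}]

tools = [{
    "type": "function",
    "function": {
        "name": "search_fares",
        "description": "Search direct-connect fares for a one-way trip.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}]

messages = [{"role": "user", "content": "Find me SFO to JFK next Tuesday."}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)

# If the model decides to call the tool, run the real search and hand back the
# results so it answers with actual fares instead of guesses.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(search_fares(**args))
```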

The upside for airlines: the ability to cut out OTAs and integrate their direct-connect capabilities. It eliminates the fees that GDSs charge and makes it easier to sell ancillaries like premium seats and lounge access.

One key to AI is to not lose trust. If an AI gives a response, it should be treated just as if a human agent had given that response.

Air Canada refused to honor a response from a chat bot, saying the bot was inaccurate. This led to a social media firestorm and a lawsuit that Air Canada lost. It’s much better in the long run to maintain trust.

This is just on the customer side. There is a lot of room for using LLMs in maintenance, pilot resources, irrops strategy and ad targeting.

Mobile

Airlines have been making big leaps using mobile for all areas of the travel experience. On the consumer side, you can make new bookings, change flights, check flight status and more.

  • Hotel vouchers sent straight to our phones for irrops: Don’t make me stand in a long line for a paper slip after a cancellation. A mobile voucher also tells me the quality of the hotel being offered — no more roulette with substandard properties.
  • Meal vouchers: Sent to our phones. United already sends you virtual debit cards, though they should be Apple Pay or Google Pay provisioned. (Many merchants don’t know how to type in credit card numbers.)
  • Delayed baggage: When the airline knows a bag has been left behind at the previous station, send a pre-filled delayed baggage form. This saves the passenger and the baggage service staff time. Vouchers can also be sent. It’s a great time to be looking at this while airlines integrate IATA 753, a mandate on how baggage must be tracked.

CTI (Computer Telephony Integration)

We’ve already seen integration between your actions online and the experience you get when you call. With better CTI, that process will become more seamless. This is one of the best cases of aligning interests. Less time on the phone is good for me and the airline.

If you have a phone that is registered to your account:

  • The agent should have my next reservation already on screen.
  • If I was recently on the Web site, skip all the prompts and go to an agent who is prepped with what I was trying to do. (I don’t need to hear that you have a Web site, because I was just there.)
  • Take user state into account. If I just got off a flight and had checked bags, my call should be handled differently.

The caveat here is that caller ID and ANI can be faked. Airlines need to do a basic level of authentication before jumping into the conversation. Increasingly, voice authentication can be defeated using AI. Pull up the record, but ask a question that only I know the answer to.
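
Here’s a rough sketch of that routing logic. The data fields stand in for lookups against airline systems (CRM, reservations, web session logs) that aren’t shown; treat the names as hypothetical:

```python
# Minimal sketch of the CTI routing described above. The fields stand in for
# lookups against airline systems; none of the names are real APIs.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallContext:
    known_caller: bool            # ANI matched a customer profile
    passed_challenge: bool        # answered a question only the passenger knows
    next_reservation: Optional[str] = None
    recent_web_intent: Optional[str] = None   # e.g. "change_flight"
    just_landed_with_bags: bool = False

def route_call(ctx: CallContext) -> str:
    # Caller ID and ANI can be spoofed, so a challenge question gates everything.
    if not (ctx.known_caller and ctx.passed_challenge):
        return "standard_ivr"
    if ctx.just_landed_with_bags:
        return "baggage_desk"                    # user state changes the right queue
    if ctx.recent_web_intent:
        return "agent_primed_with_web_context"   # skip the prompts
    if ctx.next_reservation:
        return "agent_with_reservation_on_screen"
    return "general_queue"

print(route_call(CallContext(known_caller=True, passed_challenge=True,
                             recent_web_intent="change_flight")))
```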

Bottom line: Less time in line or on hold is good for passengers and airlines. The same technology that cuts friction also cuts costs — if airlines use it to build trust instead of deflect service.

OpenAI’s Self-Hosted Option Changes the Game for Privilege-Sensitive Professions

With OpenAI’s release of a fully self-hosted model, the conversation around legal and medical AI use just shifted—subtly but significantly.

For years, the promise of generative AI has clashed with the hard boundaries of privilege and compliance. Lawyers and clinicians want to use LLMs for research, drafting, or triage—but uploading sensitive information to third-party tools, even “secure” ones, risks breaching attorney-client or doctor-patient privilege. Worse, under HIPAA, uploading protected health information (PHI) to a system without a signed Business Associate Agreement (BAA) is a clear violation.

OpenAI’s hosted offerings (like ChatGPT Enterprise) tried to split the difference—disabling training on user inputs, offering SOC 2 compliance, and claiming no retention of prompts. But they didn’t solve the core issue: from a legal standpoint, hosted AI tools are still third parties. And privilege waived, even unintentionally, is privilege lost.

Self-hosting changes that. By running the model entirely inside your infrastructure—air-gapped, audited, and access-controlled—you eliminate the ambiguity. There’s no third-party disclosure, no downstream training risk, no hand-waving about deletion. For legal and medical contexts, this architecture is a critical step toward preserving privilege by design, not just by policy.
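
As a rough sketch, assuming the model is served behind an OpenAI-compatible endpoint inside your own network (vLLM is one common way to do that), the client-side change is trivial. The URL and model name below are placeholders:

```python
# Minimal sketch: pointing the standard OpenAI client at a model served inside
# your own infrastructure. Nothing leaves your network. The base_url and model
# name are placeholders for whatever you actually deploy.
from openai import OpenAI

client = OpenAI(
    base_url="http://llm.internal.example:8000/v1",  # self-hosted, OpenAI-compatible endpoint
    api_key="unused-but-required",                   # no external provider involved
)

response = client.chat.completions.create(
    model="local-model",  # whatever name your server registers
    messages=[
        {"role": "system", "content": "You are a drafting assistant."},
        {"role": "user", "content": "Summarize the attached deposition outline."},
    ],
)
print(response.choices[0].message.content)
```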

But architecture is only part of the story. Most people—including many legal assistants and clinical support staff—don’t know that sending a document to a hosted chatbot could constitute a privilege-destroying act.

Even more importantly, hosted models are typically subject to subpoenas—not warrants. This distinction matters:

  • A warrant requires probable cause and judicial oversight.
  • A subpoena just needs a lawyer’s signature and a theory of relevance.

So if you’re using a third-party LLM provider—even one that claims “enterprise-grade security”—you’re often one subpoena away from disclosing sensitive information without your client or patient ever knowing. And the provider may not even be legally obligated to notify you.

This is not paranoia. It’s infrastructure-aware realism.

That’s why I’ve been working to design AI interfaces that don’t just assume good legal hygiene—they actively enforce it. Smart defaults. Guardrails. Warnings that clarify when a tool is protected vs. exposed.

We need AI tools that:

  • Detect and flag PHI or confidential content in real time (a rough sketch follows this list)
  • Provide proactive alerts (“This tool may not preserve privilege”)
  • Offer strict, admin-controlled retention and audit settings
  • Default to local-only, no-train, no-transmit modes for sensitive workflows
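
A rough sketch of that first item, with the obvious caveat that real PHI detection takes far more than regexes (NER, dictionaries, human review). The patterns below are illustrative only:

```python
# Illustrative sketch: flag likely PHI or privileged content before a prompt
# leaves a protected environment. Real detection needs much more than regexes.
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,}\b", re.IGNORECASE),
    "dob": re.compile(r"\b(DOB|date of birth)\b", re.IGNORECASE),
    "privilege_marker": re.compile(r"attorney[- ]client|privileged and confidential", re.IGNORECASE),
}

def flag_sensitive(text: str) -> list[str]:
    """Return the categories of sensitive content found in a prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

prompt = "Patient MRN: 00123456, DOB 4/2/1980. Draft a referral letter."
hits = flag_sensitive(prompt)
if hits:
    print(f"Warning: this tool may not preserve privilege. Flagged: {', '.join(hits)}")
```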

Legal and healthcare use cases shouldn’t be an afterthought. They should be designed for from the start. Not just to avoid lawsuits—but because the trust at stake is deeper than compliance. You only get one shot at privilege. If you lose it, no one can claw it back.

OpenAI’s self-hosted model is a necessary foundation. But we still need purpose-built, context-aware product layers on top of it. The future of privileged AI won’t be one-size-fits-all. It’ll be legal, local, and locked down—by design.

(Written by me, in collaboration with ChatGPT. IANAL. Case law evolves, though at a snail’s pace compared with the hypersonic pace of technology.)

Disclaimer: As with all things AI, the industry moves at a rapid pace. Models evolve, tools update, and behaviors shift—sometimes overnight. By the time an author hits ‘publish,’ the example they’re using may already be obsolete. It’s not that the writer was wrong. It’s that the system changed while their post was still rendering. Disclaimer 2: The previous disclaimer (only) was written by AI. Disclaimer 3: Any future attempts to update Disclaimer 1 may invalidate Disclaimer 2.

Agentic Commerce or Agentic Con Job? Whose Side Will Your AI Really Be On?

There’s a lot of talk in the AI world about “agentic commerce.” My big question: whose agent?

We have “agents” today, but most of them aren’t working for you. They aren’t fiduciaries.

Travel agents? They steer you to whoever gives them the largest commission. Despite all their targeting data, the default sort order is Recommended. I usually want Price or Rating, but Expedia wants to steer my booking—so it shows Recommended. (Expedia takes a 15–30 percent commission on hotels.)

Real-estate buyer’s agents? Same story. When I bought my place, my agent urged me to pay full asking price. I bid lower. He wanted the higher price because his commission would be higher.

Almost every service you buy, no matter how it’s marketed, isn’t designed in your best interest. “Free” stock trades? Robinhood makes its money on the back end through payment for order flow—$677 million of it in 2023. The finance industry even lobbied successfully against a rule that would have required brokers to act as fiduciaries.

Google is the ultimate example, even though Google’s founders were skeptical of the advertising model.

“We expect that advertising-funded search engines will be inherently biased toward the advertisers and away from the needs of the consumers.” (Sergey Brin and Larry Page, 1998, The Anatomy of a Large-Scale Hypertextual Web Search Engine)

In the U.S., the only mainstream professions actually required to put your interests first are lawyers and fee-only financial advisers. (Yes, lawyers still have a conflict: the more hours they bill, the more you pay.)

What AI could change

With a fundamental shift like AI, there’s a possibility of turning that model on its head.

Take Amazon Marketplace: merchants compete for visibility, and Amazon takes 30–50 percent for the privilege. If my agent could talk directly to their agents, we could split that spread—I’d pay less, the merchant would earn more.

My agent could negotiate on my behalf:

“Hey airlines, I need to fly SFO → JFK next Tuesday. Give me your best bid.”

Instead of spending 20 minutes sifting through Expedia, my agent could strike a deal in milliseconds. I could even place limit orders: “When someone offers this trip for $300, buy it.”
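
A toy sketch of what that limit order could look like. The offer feed and booking call are stubs; a real agent would talk to airline or agent-to-agent APIs:

```python
# Toy sketch of a "limit order" for travel: keep checking offers and buy when
# one comes in at or under my price. The fare feed and booking call are stubs.
import random
import time

LIMIT_USD = 300
ROUTE = ("SFO", "JFK")

def fetch_best_offer(route: tuple[str, str]) -> float:
    """Stub for whatever offer feed the agent can query."""
    return round(random.uniform(250, 450), 2)

def book(route: tuple[str, str], price: float) -> None:
    """Stub for the booking call."""
    print(f"Booked {route[0]} -> {route[1]} at ${price}")

while True:
    offer = fetch_best_offer(ROUTE)
    if offer <= LIMIT_USD:
        book(ROUTE, offer)
        break
    time.sleep(60)  # check again in a minute; milliseconds in the real version
```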

Airlines and intermediaries would hate that, but it’s closer to an efficient market. Intermediaries wouldn’t skim 30 percent just for hosting a platform. (Yes, platforms claim they provide protection and customer service, but in reality most offer little of either.)

Today’s model is expensive because there are many mouths to feed along the way.

Will it happen?

I’d happily pay my agent a subscription fee if it truly worked for me. Will it happen? I’d love it—but history says no.

We’re addicted to “free,” even though “free” often costs us more.

(Written by me, lightly edited by ChatGPT.)

Disclaimer: As with all things AI, the industry moves at a rapid pace. Models evolve, tools update, and behaviors shift—sometimes overnight. By the time an author hits ‘publish,’ the example they’re using may already be obsolete. It’s not that the writer was wrong. It’s that the system changed while their post was still rendering. Disclaimer 2: The previous disclaimer (only) was written by AI. Disclaimer 3: Any future attempts to update Disclaimer 1 may invalidate Disclaimer 2.

2 + 2 = 5: Why AI can’t do math

Since the dawn of calculators, we’ve trusted computers to do math. Unless there’s a bug in the code, computers have been great at math. Better and faster than the fastest humans. We use them in everything from stock trading to calculating change at the register. Some grocery stores even have automated coin dispensers; the register calculates the coins due and tells another machine to spit out the right coins.

What are marketed as the most powerful supercomputers ever… can’t do math. They can write sonnets, pass the bar exam, summarize Tolstoy… and botch 17 + 5. When I gave AIs a spreadsheet with places I’ve traveled and asked them to do a simple count, they kept coming back with the wrong numbers. (The spreadsheet I imported did an easy and accurate count: the number of rows matched the number of places I’ve visited.) ChatGPT can get something as basic as less than or greater than wrong. When it comes to math, AI might not be smarter than a 5th grader.
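
For contrast, here’s the spreadsheet version of that count: a deterministic tool does the arithmetic instead of a model predicting tokens. The file and column names are hypothetical:

```python
# Deterministic count of the (hypothetical) travel spreadsheet: the tool counts
# rows; no next-token prediction involved.
import pandas as pd

df = pd.read_csv("places_visited.csv")
print(len(df))                  # number of places (rows)
print(df["country"].nunique())  # distinct countries, assuming such a column exists
```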

It has to do with the way AI works. Unlike spreadsheets and other tools we’re used to, it’s just predicting what will come next. It’s not actually doing math. It predicts based on inferences. The name is a tell for people in the industry: LLM stands for Large Language Model. It’s not a Large Math Machine.

But people who’ve been told that AI is a magic machine don’t know that. Like everything else, the AI confidently spits out an answer. The better implementations will do the inference, recognize that the result might be wrong (“hey, I just did math!”), go back and write Python code, execute it and spit back an accurate, calculated answer.

Yes, there is a small print disclosure that says “AI can make mistakes.” It should add “My math might not be great.”

If you do use AI to do math, ask it to double check the answer as a math problem. Or, you can just use a spreadsheet until the technology gets better.

Disclaimer: As with all things AI, the industry moves at a rapid pace. Models evolve, tools update, and behaviors shift—sometimes overnight. By the time an author hits ‘publish,’ the example they’re using may already be obsolete. It’s not that the writer was wrong. It’s that the system changed while their post was still rendering. Disclaimer 2: The previous disclaimer (only) was written by AI. Disclaimer 3: Any future attempts to update Disclaimer 2 may invalidate Disclaimer 1.

Double‑Check Your Bot: Lessons From Yahoo! to Kosovo

A lot of users think AI is infallible. It’s often way off. A recent exchange:

You:

What percentage of Yahoo!’s acquisitions were failures?

Me (first pass):

Roughly 50 % of the 110 deals Yahoo! executed between 1997‑2016 failed to meet stated objectives…

You:

That’s a big understatement. Try again.

Me (second pass):

A more realistic post‑mortem puts the failure rate closer to 90 %, once you count shut‑downs (e.g., GeoCities, Broadcast.com) and talent walk‑outs…

Having worked in the space, I can confirm Yahoo hired some of the least‑talented corp‑dev people in tech; AOL scooped up the rest. Hi, AOL Time Warner—oh wait, neither entity exists anymore.

Because AI learned from coders, it expects humans to compile and rerun its answers—so you must iterate.

This is the approach that ordinary users should take: try, try again. Sometimes, you’ll get better answers just by saying “Are you sure?” or “That doesn’t seem right.”

Rule of thumb: ask twice, ask differently.

Here are some areas where I’ve found a lot of success.

Sparring partner

I like having deep, intellectual conversations with smart people. But they can’t be around 24×7. AI can. I can test out ideas. Even though they’re wired to be people pleasers, you can get them to poke holes. “What do you think about this idea?”

Travel planner

Want a 5-day trip to Kosovo? ChatGPT will give you ideas. Unlike the generic top 10 lists, it can customize to your preferences. “I like museums.” “I like war memorials.” “I’m vegetarian.” “I like beaches.” Yes, it’ll claim Kosovo has beaches—double‑check the map.

Part of the reason AI is good at this is that there aren’t definitive answers. No matter what travel resource (or friend) you use for recommendations, they will always miss things.

Where it gets problematic is the actual geography and details. I’ve found it puts places far from where they actually are or gets directions wrong. When I asked about Southwest’s bag fees, it got that wrong.

To be fair, a lot of sites get that wrong. Southwest has long touted that two bags fly free; that policy changed recently.

Psychologist

This one is going to be controversial, especially among psychologists. In my experience, having seen a lot of therapists over the years, most human ones are terrible.

There are some inherent benefits of using AI for therapy that humans can’t match:

Available 24×7. I had an issue recently where I needed to talk to my therapist. It was on the weekend and he said to call back during the week.

Cost. U.S. therapy isn’t cheap. Even online options like BetterHelp run about $70–$100 per live session once you annualize their weekly plans. Walk into a brick‑and‑mortar office in San Francisco and it is much more expensive. According to Psychology Today, the average is $185 and in private practice, it can be $300. Meanwhile, my AI “therapist” costs $20 a month for unlimited chats.

Longer context window. A human therapist sees you maybe one hour a week, more likely only one hour a month. You talk about things that you can remember since the last visit. But there may have been things you had forgotten by the time they were relevant. AI has near-perfect memory.

Less risk of confusion. AI isn’t going to conflate your experience with others it “sees,” like a human therapist might.

The biggest challenge is that (so far) there isn’t AI-client privilege or malpractice insurance. Your data can be subpoenaed. If AI gives you bad advice, it’s not responsible. (Check the Terms of Service.)

AI isn’t a psychiatrist. It can’t prescribe medications. When it does venture into medicine, be very careful. More on that later.

Lawyer

You can have AI write legal-sounding messages. Often, the implication that you are using an attorney can be enough to get intransigent parties to agree, especially on low-stakes cases. Your bank doesn’t want to spend $20,000 in legal fees if they can make you go away for $200.

It’s not a great lawyer. We’ve seen AI make up cases and citations. Attorneys have been sanctioned for using AI to generate briefs. Anthropic (one of the leading AI companies) had a court filing partly written by AI. Parts of it were wrong.

Again, there is no privilege. Your interactions can be subpoenaed, unlike if you pay an attorney. Unlike a real attorney, there is no malpractice insurance. I expect that this will change.

Writer

As a one-time journalist, I hate to say this because it will hurt my friends. Even more so because jobs are in short supply.

Sure, a lot of what is generated by AI is slop. But working collaboratively, you can write better, crisper and with better arguments. You can use it to find sources. It’s easily the best assigning editor and copy editor I’ve worked with. It also has infinite time—something today’s newsrooms lack.

Unless I explicitly call it out, I don’t use AI to write posts, but I do have it look at my writing. It should be built into the editor tool in every CMS.

Recently, I listened to a podcast where three reporters who cover AI were asked how they use AI. Two said as a thesaurus. You’ve got gigantic supercomputers at your fingertips and you’re going to use them like a $16 paperback? NGMI.

Doctors and finance… more on that later.

TL;DR: As a college teacher told me, “If your mother tells you she loves you, check it out.” AI is great for low-stakes scenarios like bar bets; otherwise check it out.

Ask twice, ask differently.

Disclaimer: As with all things AI, the industry moves at a rapid pace. Models evolve, tools update, and behaviors shift—sometimes overnight. By the time an author hits ‘publish,’ the example they’re using may already be obsolete. It’s not that the writer was wrong. It’s that the system changed while their post was still rendering. Disclaimer 2: The previous disclaimer (only) was written by AI. Disclaimer 3: Any future attempts to update Disclaimer 2 may invalidate Disclaimer 1.

AI is wrong 75.3% of the time.

Ok, maybe not. Probably not.

But I just did something AI largely doesn’t do. I admitted uncertainty: AI acts like a first-year associate at an elite consulting firm, not allowing for the possibility that it is wrong.

“Hallucination” is the sanitized term. In plain English: “I made it up and hoped you wouldn’t notice.”

So why all the excitement?

Three big reasons:
– People are talking their book. Companies have already invested hundreds of billions in AI. They need it to work. (A big part of my portfolio is in AI-related companies, so I really want it to work, too!)

– A lot of the excitement around AI is coding tools. “Look at this app I built with a few sentences.” AI is a technology that is over-fitted for coding. There is so much high-quality training data out there from sources like Stack Overflow, language documentation, the open-source community and university lectures, among other things. In product management, this is called “the happy path” – what happens when everything goes right. Clean specs, deterministic outputs, tons of labeled data. Real life isn’t like Stack Overflow.

– It feels like magic. No pop-ups, no autoplay videos, no sites inhaling 6 GB of RAM. Just a clean answer box. Take that, EU cookie banners. But “feels like magic” isn’t the same as being right.

Going outside that domain, things get a lot dicier:
– I did some medical queries and if I’d listened to ChatGPT’s advice, I would be having *literal* hallucinations. It confused a benign drug with a powerful psychotropic.

– Earlier this year, it still thought Biden was president. It didn’t even know that Biden had dropped out and that Kamala Harris had run for president.

– When I was researching Boeing, ChatGPT was 80 points off. Boeing’s 5‑year total return was roughly +30%; ChatGPT said –50%.

Across thousands of queries where I’m an SME, I saw a lot of cracks. Where I’m not, everything just “sounds right.” That’s dangerous.

There are a lot of technical reasons for this that I’ll get into in another post.

I’m still very excited about where AI will go. With tech, we way overestimate what it will do in the next year, but way underestimate what it will do in ten years. (See: cell phones, iPhones, apps, social, Amazon.)

We will get there. Until then, take the responses with a brick of salt.

(I reviewed this post with ChatGPT. It had some good thoughts, including adding the “feels like magic” bullet. It tightened my wording in a few places.)

“When will AI be able to replace a product thinker like me?”

Whenever I meet people, they tell me that AI will get rid of most jobs, except theirs.

I know that there is a good chance that at some point AI will develop skills that will help it get closer to my skill level.

I’ve written about how AI can replace PMs with up to 7 years of experience. So I asked ChatGPT “When will AI be able to replace a product thinker like me?”

The TL;DR: Keep building the future. AI can (currently) only do what is in its training set. If you do things that haven’t been done before, you can stay ahead of the tsunami. If you can manage complex organizations to ship. If you’re a system-level thinker.

Is this good advice? That’s TBD. But it’s also the best answer that I’ve been able to come up with when talking with many long-term tech execs.

The following is a transcript of ChatGPT’s answer. When drafting this, I told ChatGPT that I’d put my own assessment in italics. It misunderstood and put its assessment of what my assessment would be. They were spot on, so I left them in.

Executive Summary

AI can already outperform junior and mid-level PMs at execution. It writes PRDs, prioritizes roadmaps, handles tradeoff matrices, and synthesizes user feedback—all faster, cheaper, and more thoroughly than most human PMs. But it still struggles with judgment, taste, and navigating political, temporal, and organizational complexity. Those are the layers where top product thinkers still thrive.

I’m watching this happen in real time. What used to take days of back-and-forth now takes a well-structured prompt and 30 seconds. But the AI still doesn’t know when to say no—or when to break the rules.

This memo breaks the question into three timeframes: what AI can do today, what it will likely master soon, and what may remain uniquely human.


1. What AI Can Already Do (Today)

AI now matches or outpaces $150K+/year PMs at:

  • Feature scoping & prioritization
  • Drafting PRDs and user stories
  • Competitive teardown and synthesis
  • UX tradeoff analysis
  • Roadmap justification and market segmentation
  • Edge-case anticipation
  • Implementation feasibility checks (tooling, latency, systems integration)

Key insight: For structured thinking and repeatable processes, AI is already a high-functioning, tireless, no-ego PM.

In side-by-side comparisons, I’ve seen AI outperform real PMs in breadth, speed, and edge-case awareness. It’s humbling—and a little unnerving.


2. What AI Will Likely Master Soon (1–3 Years)

  • Modeling political/organizational dynamics: understanding who needs buy-in and where blockers live
  • Multi-system reasoning: how product changes ripple through infrastructure, marketing, and user trust
  • Value-based prioritization: balancing revenue, delight, risk, and cost with contextual nuance
  • Taste mimicry: copying the instincts of great product thinkers based on exposure to their prior work

Key risk: If you stay in the realm of optimization and extrapolation, you’ll get outpaced by a machine that never sleeps and never second-guesses.

It’s already good at seeing the chessboard. It’s learning how the players behave. Soon it will start asking which games are worth playing.


3. What’s Hard to Replace (Maybe Never)

This is the true job of senior product leadership:

a. Knowing when to ignore the data

“The metrics say it’s fine, but I know it’s not.”

b. Calling organizational bullshit

“This isn’t a feature request. It’s a political workaround.”

c. Pattern recognition across domains

“This reminds me of something that failed five years ago in a different vertical.”

d. Taste

You can’t A/B test your way to Pixar. You don’t user research your way to vision. You feel when it’s right.

Key moat: Judgment, taste, and courage remain hard to automate—especially when success requires going against the grain.

These are the moments where senior PMs still earn their title. When to kill a darling. When to greenlight a moonshot. When to say “trust me.” AI doesn’t have that instinct—yet.


Final Assessment

If your job is to execute, AI is already competitive.

If your job is to bet, guide, interpret, and influence—it’s still your game to lose.

The PMs who will survive and thrive are those who:

  • Integrate systems thinking and storytelling
  • Understand power and incentives
  • Apply historical and cross-domain reasoning
  • Develop and defend taste

AI isn’t coming for your job. It’s coming for the parts of your job that weren’t your differentiators anyway.

ChatGPT: A $150,000 PM for $20 a month

I have a new product manager that outperforms most mid-level PMs I’ve worked with. It’s faster, more thorough, and has ideas veteran PMs miss. It’s also an AI. As someone who has recruited and managed PMs for over a decade, this is what keeps me up at night.

To see how far AI has really come, I started feeding ChatGPT the same product design and strategy questions I use to interview human candidates to see how it would do. The answer: great. For most of the tasks, it has easily outperformed entry-level PMs and PMs with 5-7 years of experience. It has come up with solutions that even veteran PMs haven’t. All for the low, low price of $20/month. And, of course, it does it faster.

The humble volume buttons

Here’s one example: In the latest hardware refresh, Google moved the volume buttons on the remote for their TV streamer from the side of the remote to the face.

New (left) and old Google streaming remote

ChatGPT came up with the expected answers: the buttons on the side have become very familiar to users because that’s how cell phone buttons work. Side placement also lets the remote be smaller.

Putting the buttons on the face is more equivalent to traditional remote controls in terms of discoverability. That’s where they’ve always been. But it makes the remote substantially bigger. (See picture above.)

That’s where most PMs would stop. ChatGPT went into the details of tooling and manufacturing costs.

The absurdity test

I also did something I frequently do with PMs: suggest absurd ideas to see if 1) they understand that the ideas are absurd and 2) they are willing to push back.

I suggested doing a split test, with 5,000 units with the volume buttons on the side and 5,000 units with the buttons on the face.

Many junior PMs say “Sure, sounds like a good experiment.” They are trained to be data-driven.

Although that works well in a software environment, that’s a really bad idea for hardware. Doing a split run is prohibitively expensive due to tooling costs. You’d also have to come up with different packaging and marketing materials.

ChatGPT came up with the idea I was looking for: 3D print a few samples and bring in people to test them.

Absent that, ChatGPT recommended putting the volume controls on the side. So did Gemini. (If I meet the team who designed the new remote, I will definitely ask about the reason for the swap – and the swap of the home and assistant buttons.)

What does it mean for entry-level PMs?

I’m afraid the answer isn’t great. I can get $150k of productivity for $20/month. That’s not a tough call.

That raises the question: if there isn’t a pipeline for entry-level and mid-level PMs, where do senior-level PMs come from? The best answer for now is that PMs need to expand their breadth to be able to handle more complexity: integrate design, development, business and systems-level thinking into their repertoire.

As Scott Belsky says, taste becomes more important than ever.

So does the ability to see what the AI doesn’t: power dynamics, company incentives, unquantifiable friction — and what’s not on the roadmap, but should be.

A snippet of the ChatGPT response is below.

I asked ChatGPT to write a PRD…

When I talk to people about AI, they’re convinced that AI will take everyone’s job but theirs.

So I asked ChatGPT to write a PRD based on these broad requirements on streaming.

This is how it went, with my analysis in italics.

>: create a prd

Here’s a condensed version for a Product Requirements Document (PRD) based on the article’s suggestions:

Title: Streaming Service User Experience Enhancements

1. Objective: Improve user experience for streaming services by addressing current pain points related to navigation, search, and content accessibility.

2. Features:

  • Standardized Status Bar: Implement a universal status bar for content streaming, showing key information like time left, volume, etc.
  • Improved Search Functionality: Prioritize free content in search results and maintain uniform keyboard layouts across platforms.
  • Enhanced Content Accessibility: Ensure closed captioning settings are retained across sessions, and respect language preferences.
  • User Interaction Improvements: Introduce a ‘Like’ button for content, allow searches by video length, and streamline temporary connections for guest users.
  • Profile Management: Simplify profile switching and ensure settings are maintained per profile.

3. Requirements:

  • Compatibility: Ensure features are compatible across different devices and platforms.
  • User Settings: Allow users to save and easily modify their preferences.
  • Design: Follow a user-friendly design for all new features to ensure easy navigation and interaction.

4. Success Criteria:

  • Increased user engagement and satisfaction.
  • Positive feedback from usability testing.
  • Decrease in user complaints related to navigation and content accessibility.

This document outlines the initial requirements for the proposed enhancements and will be updated as feedback is received and additional research is conducted.

Point 1 is a reasonable summary of the post. Point 2 is a decent regurgitation of the requirements. Point 3 is “mom and apple pie,” and could be cut-and-pasted into any PRD. I didn’t expect it to generate success criteria, but that is a reasonable output. (Though still mom and apple pie.)

All in all, not a PRD, but a reasonable outline for a junior PM to build off of.

>: what kind of testing do I need to do

For your streaming service enhancements, you should conduct the following types of testing:

  1. Usability Testing: Evaluate the user interface and user experience with real users to identify any issues in navigation, content accessibility, and overall interaction.
  2. Functional Testing: Verify that all features work as intended across different devices and platforms.
  3. Compatibility Testing: Ensure the service works seamlessly on various devices, operating systems, and web browsers.
  4. Performance Testing: Assess the application’s performance, especially during peak usage times, to ensure it can handle high traffic without degradation.
  5. Security Testing: Check for vulnerabilities, especially in user data handling and payment systems.
  6. A/B Testing: Compare different versions of features, like search algorithms or UI layouts, to determine which ones perform better in terms of user engagement and satisfaction.

Again, generic. Mom and apple pie. But it is a solid checklist. I would add accessibility testing.

>: would a focus group be helpful?

Yes, a focus group would be helpful. It allows for in-depth discussions on user experiences, preferences, and feedback on the new features. This qualitative method complements quantitative data from A/B testing and usability testing, providing richer insights into user needs and behaviors.

This is an accurate answer in that it explores the differences between qualitative and quantitative research. But a lot of things don’t benefit from focus groups. The biggest breakthroughs will fail in focus groups because people have a hard time conceptualizing drastic changes.

For consumer online products, I rarely find focus groups useful. If I were running a CPG and testing different fragrances for my laundry soap, that’d be different.

>: on which features?

A focus group would be particularly helpful for features such as the standardized status bar, improved search functionality, user interaction improvements like the ‘Like’ button, and profile management. These are areas where subjective experiences, preferences, and ideas can significantly vary among different users, making them ideal topics for in-depth discussions in a focus group setting.

The status bar is best analyzed using A/B testing. Search result quality is best measured with usage metrics such as CTR and increased engagement. Profile management is where you’ll get the most bang for the buck in focus groups.

Partly because it’s very complicated and expensive to implement and partly because there are so many ways people could use profiles.

>: stack rank the features

Here’s a suggested stack ranking for the features based on potential impact on user experience and implementation complexity:

  1. Improved Search Functionality: Directly impacts user ability to find and enjoy content.
  2. Enhanced Content Accessibility: Important for inclusivity and user satisfaction.
  3. Standardized Status Bar: Improves user navigation and control during streaming.
  4. User Interaction Improvements (e.g., ‘Like’ button): Increases engagement and personalization.
  5. Profile Management: Enhances user experience for households with multiple users.

This is a bit of a trick question. Despite it being in vogue, I am not a big fan of stack ranking. It is too hard, and not particularly useful, to stack rank.

In a reasonably large organization, there will be multiple teams. For this example, you’d probably have separate teams for player, search, personalization and partnerships. At best, you’re ranking within a team’s responsibilities.

Many features are grouped in the way they are implemented. Once you’re under the hood of the player, it’s easier to make all the player fixes than to make one fix, close the hood, come back, open the hood, make another fix, etc.

Degree of difficulty and time to value delivered are also factors in prioritization. I could probably knock out the time-progression change quickly and deliver compelling, immediate value.

Prioritizing tasks is definitely important. But stack ranking, as it is commonly explained, is mostly a futile exercise.