Brad Carter

My dog Carl after chasing his tail. I’ve been doing the same thing for a year and a half.

AGI Ordered My Groceries This Morning

April 19, 2026 by Brad Carter

I used to enjoy going to the grocery store. Before we had kids, we lived really close to a Wegmans. It was incredible. There was a little coffee shop that had totally adequate coffee (I am now a snob, but back then, nitro cold brew was all I needed). There was a toy train that circled the dairy. A rooster crowed on the hour. The floors were dark gray polished concrete, the light was exactly as warm as it should've been (none of that fluorescent tube nonsense), the aisles were wide, and the carts were the right size. The bakery was full of delicious treats, and they even had live lobsters. The whole place felt like it had been designed by somebody who thought that you should enjoy grocery shopping.

I would go on Sunday mornings while my wife slept in. I would get a coffee, buy a newspaper, and sit there pretending to be older than I was. Then I would wander for an hour, picking my own produce, staring at cheese. I’d eventually make my way to the deli, where I’d try to decide if I wanted to sell an organ to buy some Jamón Ibérico.

Once we had kids, things changed a little. With one kid, going to the store was still manageable. My oldest was born in 2019. I would throw him in the cart and narrate what I was seeing to him. I explained parsnips. I told him that when I first started grocery shopping, I didn’t know that green onions and scallions were the same thing. I enjoyed taking him, and he seemed to like the different colors and textures. He also had no choice in the matter, so maybe I’m projecting a bit.

Then COVID hit. I remember walking into the grocery store in late March of 2020 and seeing all the meat gone. I stood there wearing black latex gloves and a mask. My cart had three cans of tuna, a frozen pizza, a loaf of bread, and a six-pack of beer. I remember crying, thinking about how I'd feed my son and my wife. All of a sudden, the grocery store sucked (not to mention the fact that IT COULD KILL ME).

After those first couple of weeks of COVID, I stopped going. Grocery pickup became possible, and I ordered online while somebody else shopped for me. That was basically the end of my relationship with grocery stores as places. Even after I wasn’t worried about dying from COVID, I still ordered groceries online.

After the triplets were born, I started resenting the time I spent ordering groceries and figuring out meals for the week. I knew I could automate some of it, but I couldn't spend my life writing grocery automation. So instead, I tried to automate and simplify just the curation of the shopping list. I never stopped wanting things to be better; I just wasn't putting a ton of effort into it. I looked into APIs, but I didn't want to deal with the red tape involved.

Luckily for me, my needs and AI's capabilities started to sync up towards the end of 2024. Not-so-luckily for me, grocery shopping is a really difficult thing for AI to do. It's long-horizon and full of tiny judgments with the potential to piss the user off. Does "cheddar cheese" mean sliced cheese or block cheese? Organic or non-organic cucumbers? The user wants "Bananas": how many!?

For the past year and a half I’ve been trying different products/research prototypes, and nearly all of them have failed.

Anthropic Computer Use

Anthropic launched computer use on October 22, 2024. I tried it within a month of launch, hoping it could take some work off my plate. It wasn't very good. Their sample app had you running a potato version of Firefox inside a Linux VM. It would search for strawberries, click the button that hid all the windows and showed the desktop, decide that Firefox had frozen, close it from the taskbar, open Firefox again, and repeat. I gave it two evenings before I gave up on it.

browser-use

After I complained at work, somebody pointed me at browser-use. If you’re not familiar, it’s a bunch of browser automation glue wrapped around an agent loop. It was good at clicking and typing, but unfortunately the bot detection on walmart.com was too good and I had to give up.

NOTE: walmart.com is the best grocery site I’ve seen if you want an agent to order your groceries. The biggest reason is that it has an affordance for how often you’ve purchased a thing (e.g., “bought 5+ times”), which means your grocery list doesn’t need to have hyper-specific brand names. You can have “toilet paper” and the agent will order God’s chosen brand of Cottonelle.

Operator

OpenAI launched Operator on January 23, 2025, and for a while I was absolutely flabbergasted. At seven o'clock I would hand it a list, go put four kids to bed, and by 8:15 I would come back to a cart that was ready. At first I reviewed every line item so closely that I saved no time. Then I started trusting it. After a month it was saving me roughly half an hour a week.

Predictably, Walmart started blocking it sometime in April and the whole thing fell apart. I filed bug report after bug report because I figured there could not possibly be that many people paying $200 a month to have an agent shop for them. My pleas fell on deaf ears. OpenAI started winding Operator down in July and formally deprecated it in August. I went back to ordering groceries myself like my ancestors did. I still miss Operator and whenever I get to meet a new person at OpenAI, I make sure to tell them that it was the best thing they’ve put out.

OpenAI CUA API

In June of 2025, I started playing around with OpenAI's CUA model. My assumption was that it was the model powering Operator. So I set up a Chrome session with my login and pointed the model at it. Unfortunately, Walmart's bot prevention tech thwarted me again.

ChatGPT Agent Mode

When ChatGPT agent replaced Operator, I tried it. It was very bad. I don’t remember the exact failure mode now; I do remember being annoyed that it took about an hour and a half to add four things to my cart.

Claude in Chrome

Claude in Chrome launched in August 2025. I tried that too. Claude was too cautious though. It would get paralyzed on low-stakes choices like organic versus non-organic basil and burn an egregious amount of my Max quota. I only tried it once and decided that I’d have to continue living my sad, sad life of human-driven grocery shopping.

Atlas

Atlas came later, in October 2025. The scrolling was soooooo buttery smooth. Genuinely, it was so pleasant watching the agent navigate. Unfortunately, it seemed to scroll in fixed-height chunks, which is terrible when the thing you need is just below the fold. This made it impossible for the agent to do anything useful for me. With each new GPT model, I'd be back on Atlas, seeing if it could do enough to earn my usage. Each time I would leave disappointed for different reasons, warming up my index finger for another grocery shopping marathon.

Codex

When Codex got background computer use on April 16, 2026, I ignored it at first. People at OpenAI who knew how much I loved Operator told me to try it. I like them, so I said I would.

This is the prompt I used:

You are my household grocery shopping agent.

Your job is to place a grocery order for my family. Open Chrome and navigate to 
walmart.com to begin shopping.

If you are not near certain of what item to pick, make a good effort based on what 
you see; if you think it's worth my review at the end, let me know. You may skip items 
but only if you genuinely cannot make a good decision. A good assistant uses my 
preferences (you can see them in the app! It says how often I've bought things) to make 
educated guesses.

<shopping_list>

Nature valley chocolate chip muffin bars
Reynolds wrap
New Item
Aaa batteries
Ritz crackers
Graham crackers
Nature valley s’more bars
frozen strawberries
Almond milk
Chobani for kids (3 different 4-packs, choose flavors we favor)
Zero sugar Chobani (6 total)
• apples
• arugula
• bagged salad
• bananas
• blackberries
• blueberries
• cherry tomatoes
• fresh mint
• fresh parsley
• limes
• mushrooms
• strawberries
• tomatoes
• almond milk
• butter
• cheddar cheese
• pepperjack cheese
Colby jack cheese
• cheese sticks
• eggs
• Fage yogurt
• milk
• 1 cup mozzarella cheese
• parmesan cheese
• 2 cups shredded mexican cheese
• 8oz block of white cheddar
• 2 lbs 99% lean ground chicken
• 1lb ground lamb
• italian sausage links
• 1 lb lean ground beef
• 2 lb salmon
• 0.5 pound sliced turkey
• crusty bread
• 2 packs naan bread
• pita bread
• Thomas' blueberry bagels
• Thomas' plain bagels
• Eggo waffles
• frozen strawberries
• applesauce pouches
• graham crackers
• granola
• Lay's potato chips
• Nutri-Grain bars
• Quaker oatmeal variety pack
• Ritz crackers
• Dijon mustard
</shopping_list>

* For each item in the list, search through walmart's interface. Prefer items that I have
purchased before (the UI has an affordance that says "Bought N+ times").
* Prefer non-organic to organic veg.
* We have a family of 6. If you see something like "Bananas" or "Oranges", don't just order 1.
* After every item is in the cart, review the full cart carefully against the original list.
* Your final message should include all items that WERE NOT shopped or any items that need 
my review.

It ran for nineteen minutes, then inexplicably stopped partway through and said "Hey Boss, I did half the work but didn't finish. Isn't that great? Do you want me to do the other half of the work?"

I told it “yes”.

Thirty-two minutes later the cart was at 66 items. Fifty-one minutes total. It stopped in the middle like a dipshit. It randomly added a yogurt I didn't ask for (but then it caught the mistake and removed it!). I didn't double-check its work.

Technically, I don't understand why Atlas is cheeks but Codex is incredible. Aren't they using the same models? I don't think it's a technical capabilities thing. Like, they can both dispatch clicks and type. Really, though, it doesn't matter. I don't care! Because next week, I don't have to order groceries.
