TL;DR: I built Macroscanner, an AI-powered nutrition tracker that estimates food macronutrients from photos. Through the process, I learned that market research is critical before building a product, that you should start with a product and then build features, and that simplicity is better at first.
The problem
Calorie tracking is something I try to do every day so I constantly have a pulse on my health and can ensure I’m on target with my fitness goals.
Something I’ve noticed over the past two years of doing this is that tracking meals or snacks while eating them can be inconvenient, impolite, or distracting. Oftentimes, I’ll be out with friends or family and I don’t want to be the guy who’s on his phone the whole time.
When tracking your meal after the fact, however, it’s harder to remember what you ate, and your ability to estimate quantities accurately diminishes with each passing hour.
My solution was to snap a quick photo of what I eat anytime I don’t want to track it in the app right then. Before bed, or when I wake up the next day, I’ll try to estimate my intake based on the photos in my camera roll and log it retrospectively in my favorite nutrition tracker, MacroFactor.
Over the last 6 months of doing this, I’d often go long stretches (up to 2-3 days) where I wasn’t logging anything or checking the app at all, just taking photos. That made it harder to stick to my goal, since I was eating based on mental notes of what I’d already had that day instead of looking at an actual number. I’d end up logging the backlog and discover I had gone way over or under my targets.
My idea
What if there was an app that could extract ingredients and macronutrient information from a photo of a meal?
It could detect the ingredients present and estimate the mass or volume of each one. A user could journal their food in photos, and have a running tally of the nutrition from what they ate on any given day.
It wouldn’t be a full-blown nutrition tracking app - you’d just use it as a utility to help you track nutrition more efficiently.
I went ahead and built everything out for a minimum viable product, which is what you see in this repo. Enter Macroscanner.
Macroscanner’s Tech Stack
- 🎨 Frontend: Next.js 15, using this template.
- ⚙️ Backend: Supabase
- 💳 Payments: Stripe
- 🤖 AI: OpenAI API, GPT-4o
- 🍏 Nutrition Data: USDA FoodData Central
Additionally, I created an ETL notebook using Python to download data from the USDA’s FoodData Central (FDC), load it into a single table, and calculate densities for each item that has a volume measure on it. That code is publicly available under the same license in this repository.
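The density step can be pictured roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes each portion row carries a gram weight plus a volume amount and unit, and the function name, argument names, and unit table are hypothetical rather than taken from the FDC schema.

```python
from typing import Optional

# Millilitres per unit for a few common US volume measures (approximate).
# Hypothetical lookup table; the real data may spell units differently.
ML_PER_UNIT = {
    "ml": 1.0,
    "tsp": 4.92892,
    "tbsp": 14.7868,
    "fl oz": 29.5735,
    "cup": 236.588,
}


def density_g_per_ml(gram_weight: float, amount: float, unit: str) -> Optional[float]:
    """Return density in g/mL, or None when the portion has no volume measure."""
    ml_per_unit = ML_PER_UNIT.get(unit.strip().lower())
    if ml_per_unit is None or amount <= 0:
        # Units such as "slice" or "piece" carry no volume, so no density.
        return None
    return gram_weight / (amount * ml_per_unit)


# A portion of "1 cup = 244 g" gives a density slightly above 1 g/mL.
cup_density = density_g_per_ml(244.0, 1.0, "cup")
# A portion measured in slices has no volume, so it yields None.
slice_density = density_g_per_ml(30.0, 1.0, "slice")
```

Storing a density per item is what lets a volume estimate (say, "half a cup") be converted to grams before the per-100 g nutrition values are applied.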
How Macroscanner works
Below is a diagram of the food prediction pipeline:

Prediction Process
Below is the core business logic for Macroscanner. The rest of the app is a fairly straightforward CRUD app in Next.js.
- A user uploads photos of their meal.
- The photos are sent to the OpenAI API, which returns a JSON structure of ingredients present in the photo along with quantity estimations.
- Each food item gets looked up using a hybrid search (full-text search + vector cosine similarity) in my USDA FoodData Central table.
- Several choices are returned from the database. GPT is re-prompted with the photos and choices, and picks the most relevant option.
- The lookup and re-prompt steps (the previous two) are repeated until every food has been matched.
- Finally, all items are looked up in the FDC database, and macronutrients are calculated based on the per 100g values in the database and the quantity estimation from the AI.
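The final step in the list above amounts to scaling per-100 g values by the estimated mass and summing over the meal. Here is a minimal sketch, assuming each quantity has already been converted to grams. The `Macros` class and `meal_total` function are hypothetical names, and apart from the 165 kcal figure for cooked chicken used elsewhere in this write-up, the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Macros:
    calories: float
    protein_g: float
    carbs_g: float
    fat_g: float

    def scaled(self, factor: float) -> "Macros":
        # Multiply every field by the same factor (grams / 100).
        return Macros(self.calories * factor, self.protein_g * factor,
                      self.carbs_g * factor, self.fat_g * factor)

    def plus(self, other: "Macros") -> "Macros":
        return Macros(self.calories + other.calories,
                      self.protein_g + other.protein_g,
                      self.carbs_g + other.carbs_g,
                      self.fat_g + other.fat_g)


def meal_total(items: Iterable[Tuple[Macros, float]]) -> Macros:
    """Sum macros over (per-100 g values, estimated grams) pairs."""
    total = Macros(0.0, 0.0, 0.0, 0.0)
    for per_100g, grams in items:
        total = total.plus(per_100g.scaled(grams / 100.0))
    return total


# Illustrative per-100 g rows, standing in for database lookups.
chicken = Macros(calories=165.0, protein_g=31.0, carbs_g=0.0, fat_g=3.6)
rice = Macros(calories=130.0, protein_g=2.7, carbs_g=28.0, fat_g=0.3)

# 150 g of chicken and 200 g of rice, as the model might estimate.
total = meal_total([(chicken, 150.0), (rice, 200.0)])
print(round(total.calories, 1))  # 507.5
```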
Why do we re-prompt?
Macroscanner uses a re-prompting approach to improve accuracy.

The state a food is in can drastically affect its calorie density. For example, 100g of cooked chicken is around 165 calories, while the same weight of raw chicken is around 120 calories. That means the cooked entry has roughly 37% more calories for the same weight of food.
We want to ensure that we pick the correct state of the food from the database, so we re-prompt GPT with the photo and the candidate choices, and tell it to return the most relevant option.
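The selection step can be sketched as below. This is an illustration under assumptions, not the app's actual prompt or code: `ask_model` is a hypothetical stand-in for the vision-model call (which would also attach the photos), and the prompt wording, function names, and fallback rule are made up for the example.

```python
from typing import Callable, Sequence


def build_choice_prompt(food_name: str, candidates: Sequence[str]) -> str:
    """List the database candidates as numbered options for the model."""
    lines = [
        f"The photo contains: {food_name}.",
        "Which database entry below matches it best, including its state "
        "(raw, cooked, fried, etc.)? Reply with the number only.",
    ]
    lines += [f"{i}. {name}" for i, name in enumerate(candidates, start=1)]
    return "\n".join(lines)


def pick_candidate(food_name: str, candidates: Sequence[str],
                   ask_model: Callable[[str], str]) -> str:
    """Ask the model to choose; fall back to the top search hit if unclear."""
    reply = ask_model(build_choice_prompt(food_name, candidates))
    digits = "".join(ch for ch in reply if ch.isdigit())
    index = int(digits) - 1 if digits else 0
    if not 0 <= index < len(candidates):
        index = 0  # unparseable or out-of-range reply: keep the first hit
    return candidates[index]


candidates = ["Chicken, breast, raw", "Chicken, breast, cooked, roasted"]
# A fake model that always answers "2", used here in place of a real API call.
choice = pick_candidate("grilled chicken breast", candidates, lambda prompt: "2")
```

Asking for a number rather than free text keeps the reply easy to validate, and falling back to the first search hit means a malformed reply degrades to plain search rather than failing the whole prediction.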
My AI choices
I used a vision-enabled LLM (GPT-4o) as the primary “intelligence” of the app because, in my own testing, these models generalize fairly well to the task of categorizing food and take minimal effort to get up and running. LLMs probably don’t estimate quantity accurately, but (research needed) I would hypothesize that their quantity estimates are about as good as a human’s, especially when the human is estimating from memory after the fact.
Had I trained my own model (which I considered doing)…
- it would have taken a long time to gather data, train, test, and refine it to be good enough for production use, and
- the resulting model could be biased toward a predominant genre or culture of food, or not generalize as well.
As mentioned, an LLM is probably not the best quantity estimation model. I had considered an approach using a combination of deep learning models as follows:
- A generic depth estimation model
- A segmentation model fine-tuned on food.

The process would go:
- Generate a depth map of the scene
- Get a segmentation mask for a food item
- Apply that item’s segmentation mask to the depth map to get a depth map of the food item
- From the resulting depth map and mask, generate a point cloud
- Compute the volume of that object
- Multiply the volume of the point cloud by a known density of that food to get the mass of the food.
Then, just do that for every food item in the scene.
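To make the last few steps concrete, here is a deliberately simplified sketch of the volume and mass computation. It skips the point cloud and instead integrates food height over the masked pixels, which only holds under strong assumptions: a top-down camera, a flat plate at a known depth, and a constant real-world area per pixel. All names and parameters are hypothetical, and the depth map and mask would come from the two models described above.

```python
from typing import Sequence


def estimate_mass_g(
    depth_m: Sequence[Sequence[float]],   # per-pixel distance from camera, metres
    mask: Sequence[Sequence[bool]],       # True where the pixel belongs to the food
    plate_depth_m: float,                 # distance from camera to the plate surface
    pixel_area_m2: float,                 # real-world area covered by one pixel
    density_g_per_ml: float,              # known density of this food
) -> float:
    """Approximate mass as (sum of height * pixel area) * density."""
    volume_m3 = 0.0
    for depth_row, mask_row in zip(depth_m, mask):
        for depth, is_food in zip(depth_row, mask_row):
            if is_food:
                # Food height is how far the surface rises above the plate.
                height_m = max(plate_depth_m - depth, 0.0)
                volume_m3 += height_m * pixel_area_m2
    volume_ml = volume_m3 * 1_000_000  # 1 cubic metre = 1,000,000 mL
    return volume_ml * density_g_per_ml


# Tiny 2x2 example: two food pixels, each 2 cm tall over 1 cm^2 of plate.
depth = [[0.48, 0.48], [0.50, 0.50]]
food_mask = [[True, True], [False, False]]
mass = estimate_mass_g(depth, food_mask, plate_depth_m=0.50,
                       pixel_area_m2=0.0001, density_g_per_ml=1.0)
```

In the toy example each food pixel contributes 2 mL, so the estimate comes out at about 4 g for a density of 1 g/mL.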
I decided against this approach for quantity estimation and just deferred to the LLM. I would likely be chasing only marginal gains for far more time invested, and the proposed quantity estimator is very likely heavily flawed anyway. Also, since each step of that process has its own error rate, the errors compound, and the overall error of the system grows as you move through it.
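A rough way to see the compounding is to treat each stage as multiplying the estimate by (1 + its relative error). The sketch below computes the worst case where every stage errs in the same direction. The 10% figures are hypothetical, chosen only for illustration, and are not measurements of any real model.

```python
from typing import Iterable


def compounded_relative_error(step_errors: Iterable[float]) -> float:
    """Worst-case relative error when multiplicative stage errors stack up."""
    factor = 1.0
    for error in step_errors:
        factor *= 1.0 + error
    return factor - 1.0


# Hypothetical 10% error in each of depth, segmentation, and density.
overall = compounded_relative_error([0.10, 0.10, 0.10])
print(round(overall, 3))  # 0.331
```

Three stages that are each 10% off can leave the final mass about 33% off, which is why adding more stages is not automatically an accuracy win.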
Why I stopped developing Macroscanner
No market understanding
For starters, I made the cardinal startup mistake of not evaluating the market before investing time and resources into building a product.
After I finished the MVP, I found out that MyFitnessPal (the biggest nutrition tracking app) already has this functionality integrated directly into their food tracker app. Lifesum also has a similar feature.
Additionally, I neglected to do any other research to see if this was a problem other people experienced, or if it was just me.
Small moat
Even if you operate under the assumption that this is a real problem that people face, it won’t be long before every calorie tracker app has a photo journaling feature. It’s low-hanging fruit to implement (it’s a prompt and a model with minimal orchestration) and it gives you that “cool” factor of being able to say you have AI in your app.
As a solopreneur, my moat right out of the gate would have been extremely narrow. You’d still have to purchase Macroscanner in addition to another nutrition tracker. I had essentially developed a feature, not a product. And, with consumers already inundated with subscription services, it looked like marketing this thing was a losing battle.
Cost
It didn’t seem like Macroscanner was going to be cost effective either.
I estimated my variable costs to be ~20-30¢ per image. At my usage level (likely higher than most), I’d have to price it at $6 per month just to make a 50% gross margin. I didn’t want to mess with hosting directly on AWS or self-hosting Supabase, so the steep Vercel and Supabase pricing at scale worried me, especially with only a 50% gross margin to cover it.
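The pricing arithmetic can be written down as a small sketch, assuming the 50% figure means gross margin (price minus variable cost, divided by price). The function names are hypothetical. The $3.00 monthly cost is simply what a $6 price implies at that margin, not a separately measured number.

```python
def monthly_variable_cost(images_per_month: int, cost_per_image: float) -> float:
    """Variable cost for one user in a month, in dollars."""
    return images_per_month * cost_per_image


def price_for_gross_margin(variable_cost: float, gross_margin: float) -> float:
    """Price needed so that (price - cost) / price equals the target margin."""
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return variable_cost / (1.0 - gross_margin)


# A $3.00 monthly variable cost needs a $6.00 price to reach a 50% margin.
price = price_for_gross_margin(3.00, 0.50)
print(price)  # 6.0
```

Because the price scales as cost / (1 - margin), any fixed hosting bill has to come out of that remaining half, which is the concern about Vercel and Supabase pricing at scale.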
Lessons learned
- Research the market before starting to build a product. A few competitors are fine, but make sure there’s actually a need for the product.
- Start with a product and build features, not the other way around. Feature ≠ Product.
- Simplicity is better at first (LLMs vs custom deep learning & computer vision models).
Conclusion
I decided to open source the project to demonstrate an example of how I can solve a unique problem (even if it isn’t a real problem) by utilizing data and AI. Perhaps this repository will inspire someone else to create something similar, or inspire a different use case or architecture in their own application.
On to the next one!