ChatGPT Calorie Counting Accuracy: 2026 Research Review
What four peer-reviewed 2025 studies found when they tested ChatGPT-4, ChatGPT-5, Claude, and Gemini on real meal photos — energy errors of 14 to 36 percent, why portion size and protein break the models, and a five-step AU workflow for using ChatGPT calorie estimates inside an 8,700 kJ day.
Dr. Maya Patel
Registered Dietitian, M.S. Nutrition Science

ChatGPT identifies foods in meal photographs with 93 percent precision but estimates total energy with a 26 to 36 percent mean absolute percentage error, according to four peer-reviewed 2025 studies. ChatGPT-5 cuts the error to 13.9 percent when you supply detailed ingredients with the image, and to 30.5 percent on image-only prompts. Use it to rank meals quickly inside an 8,700 kJ AU day, and weigh the calorie-dense items for tight targets.
The publicly available research that has stress-tested ChatGPT and related multimodal large language models (MLLMs) on real meal photographs converges on a single pattern: food identification is now strong, but absolute kilojoule and macronutrient quantification still drifts more than most users expect. This guide walks through the peer-reviewed evidence for ChatGPT-4, ChatGPT-5, Claude 3.5 Sonnet, Gemini 1.5 Pro, and fine-tuned open models, side by side, then maps the findings to a practical Australian calorie-tracking workflow built around the FSANZ Australian Food Composition Database (AFCD).
The studies below come from Nutrients (O'Hara and colleagues 2025; Rodríguez-Jiménez and colleagues 2025; Tanabe and Yanai 2025) and Current Developments in Nutrition (Fridolfsson and colleagues 2025), evaluating between 52 and 195 standardised meal photographs across cuisines and portion sizes. Where the labelled and AI-estimated numbers diverge, you may want to lean on weighing for the calorie-dense items and the AI for the lower-stakes calls.
How accurate is ChatGPT at counting calories from a meal photo?
ChatGPT-4o estimated energy with a 35.8 percent mean absolute percentage error across 52 standardised meal photographs in a 2025 University of Gothenburg study, and ChatGPT-4 showed a 26.9 percent average absolute difference across 114 Irish meal photos in O'Hara and colleagues (2025). ChatGPT-5 cut the calorie error to 30.5 percent on image-only prompts and 13.9 percent when detailed ingredients were supplied (Rodríguez-Jiménez and colleagues 2025).
Fridolfsson and colleagues at the University of Gothenburg tested ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro on 52 standardised photographs covering 16 single foods and 36 mixed meals across three portion sizes. ChatGPT and Claude both landed at 35.8 percent MAPE for energy. Gemini ran 64.2 percent. The team published the work in Current Developments in Nutrition in September 2025.
O'Hara and colleagues in Nutrients analysed 114 meal photographs from Ireland's National Adult Nutrition Survey. ChatGPT-4 identified foods well (93 percent precision, 84.6 percent recall, 88.6 F1 score) and the average energy difference was nearly zero — the model under- and over-counted in roughly balanced amounts. The average absolute percentage difference, however, was 26.9 percent across 16 nutrients, and 13 of those 16 nutrients showed more than 10 percent error. ChatGPT-4 also underestimated weight in 76.3 percent of photos.
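The two points in that paragraph (an F1 score derived from precision and recall, and a near-zero average difference hiding a large absolute error) can be checked with a short sketch. The precision and recall are the reported figures; the four meals are hypothetical numbers invented purely for illustration, not data from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mean_signed_pct_error(estimates, references):
    """Average percentage error with sign kept, so over- and under-counts cancel."""
    return 100 * sum((e - r) / r for e, r in zip(estimates, references)) / len(references)


def mean_abs_pct_error(estimates, references):
    """Average percentage error with sign dropped, so every miss counts."""
    return 100 * sum(abs(e - r) / r for e, r in zip(estimates, references)) / len(references)


# Reported food-identification figures (93 percent precision, 84.6 percent recall).
f1 = f1_score(0.93, 0.846)  # about 0.886, matching the reported 88.6

# Hypothetical meals in kcal (illustration only, not study data).
reference = [400, 500, 600, 800]
estimate = [500, 375, 750, 600]  # two over-counts, two under-counts

signed = mean_signed_pct_error(estimate, reference)   # 0.0: the misses cancel
absolute = mean_abs_pct_error(estimate, reference)    # 25.0: each meal is off by a quarter
```

A balanced mix of over- and under-counts is good news for a weekly average and bad news for any single meal, which is why the absolute figure is the one to plan around.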
Rodríguez-Jiménez and colleagues tested ChatGPT-5 — the model that ships inside the consumer ChatGPT app at the time of writing — across 195 dishes from three sources (Allrecipes.com, the SNAPMe research dataset, and dietitian-prepared home meals). The four prompt scenarios they ran tell the story most users do not see.
| Prompt scenario | MAE (kcal) | MAPE | RMSE (kcal) |
| Image only | 123 | 30.5 percent | 164 |
| Image + non-visual data | 92 | 24.4 percent | 122 |
| Image + detailed ingredients | 53 | 13.9 percent | 81 |
| Ingredients only (no image) | 67 | 18.1 percent | 101 |
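For readers unfamiliar with the three column headings, here is a minimal sketch of how MAE, MAPE, and RMSE are conventionally computed. The three meals are hypothetical values chosen for easy arithmetic, not figures from the study.

```python
import math


def mae(estimates, references):
    """Mean absolute error, in the same unit as the inputs (kcal here)."""
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(references)


def mape(estimates, references):
    """Mean absolute percentage error: each miss is scaled by the true value."""
    return 100 * sum(abs(e - r) / r for e, r in zip(estimates, references)) / len(references)


def rmse(estimates, references):
    """Root mean squared error: squaring makes large misses weigh more."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, references)) / len(references))


# Hypothetical reference and estimated energy in kcal (illustration only).
reference = [300, 500, 700]
estimate = [330, 450, 840]  # misses of +30, -50, +140 kcal

meal_mae = mae(estimate, reference)    # about 73.3 kcal
meal_mape = mape(estimate, reference)  # about 13.3 percent
meal_rmse = rmse(estimate, reference)  # about 87.6 kcal, pulled up by the 140 kcal miss
```

RMSE is never smaller than MAE on the same data, so the gap between the two columns in the table hints at how much of the error comes from a few large misses.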
For a 2,500 kJ (600 kcal) dinner, a 26 percent error spans roughly 1,850 to 3,150 kJ and a 36 percent error spans 1,600 to 3,400 kJ, a spread comparable to the gap between a single Calorie Smart HelloFresh stir-fry and a Family Classic parmigiana. The HelloFresh AU recipe database guide covers that gap in detail. For tight tracking, this is why the AI vs manual calorie tracking comparison still puts weighing ahead of photo-only logging on accuracy.
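The band arithmetic for that dinner can be reproduced with a small helper. This is a sketch that treats the percentage error as symmetric around the logged value, which is a simplifying assumption; the function name is hypothetical.

```python
def error_band(value_kj: float, error_pct: float) -> tuple[float, float]:
    """Return (low, high) for a symmetric percentage error around value_kj."""
    delta = value_kj * error_pct / 100
    return value_kj - delta, value_kj + delta


dinner_kj = 2500

low_26, high_26 = error_band(dinner_kj, 26)  # (1850.0, 3150.0)
low_36, high_36 = error_band(dinner_kj, 36)  # (1600.0, 3400.0)
```

The same helper works for any logged meal, so you can see at a glance whether a photo estimate is tight enough for the target you are working to.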
Why does ChatGPT struggle most with portion size and protein?
ChatGPT and other multimodal LLMs struggle most with portion size because the model has no depth-of-field, plate-diameter, or weight reference in a single photo, so it falls back to averaged training-data portions. Protein errors run 60 to 110 percent because the dense, low-volume foods that carry most protein (meat, eggs, dairy) are exactly where small visual misreads produce the largest macronutrient swings.
The Tanabe and Yanai 2025 Nutrients paper from the University of Electro-Communications in Tokyo isolated portion size as the largest single error source. Working on the Nutrition5k dataset (3,265 annotated food images, 2,759 training and 506 test), they showed that bolting a volume-estimation module onto GPT-4o cut the calorie MAE from a baseline of around 78.8 kcal to lower figures, with the best fine-tuned LLaVA-1.5-13B reaching 64.3 kcal MAE and a 0.934 correlation against ground truth. The takeaway: vision is fine; volume is the bottleneck.
Fridolfsson and colleagues quantified protein specifically. Across their 52 photos, MAPE values for protein ran from 60.7 percent (ChatGPT) to 109.9 percent (Gemini). Carbohydrate errors ran 47.9 percent (ChatGPT) to 72.8 percent (Claude). Fat errors ran 41.7 percent (Claude) to 89.6 percent (Gemini). Energy was the most reliable single metric because per-gram macronutrient errors tend to partly cancel out in the kcal total.
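To see why energy can look better than the individual macronutrients, here is a sketch using the general Atwater factors (4 kcal per gram for protein and carbohydrate, 9 for fat). Using those factors is an assumption made for illustration, since the studies may have derived energy differently, and the meal values are hypothetical.

```python
# General Atwater factors in kcal per gram (assumed for this illustration).
ATWATER = {"protein": 4, "carbohydrate": 4, "fat": 9}


def energy_kcal(macros_g: dict) -> float:
    """Total energy from grams of each macronutrient."""
    return sum(ATWATER[name] * grams for name, grams in macros_g.items())


def pct_error(estimate: float, reference: float) -> float:
    """Signed percentage error of an estimate against a reference."""
    return 100 * (estimate - reference) / reference


# Hypothetical meal in grams (illustration only, not study data).
reference = {"protein": 30, "carbohydrate": 60, "fat": 20}
estimate = {"protein": 18, "carbohydrate": 80, "fat": 22}

macro_errors = {name: pct_error(estimate[name], reference[name]) for name in reference}
# protein -40.0, carbohydrate about +33.3, fat +10.0

energy_error = pct_error(energy_kcal(estimate), energy_kcal(reference))
# 590 kcal against 540 kcal: about +9.3 percent
```

The protein under-count and the carbohydrate over-count pull the kcal total in opposite directions, so the energy error ends up smaller than either macronutrient error.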
Here is how the three leading consumer models compared head-to-head in that study.
| Nutrient (MAPE) | ChatGPT-4o | Claude 3.5 Sonnet | Gemini 1.5 Pro |
| Energy (kcal) | 35.8 percent | 35.8 percent | 64.2 percent |
| Protein (g) | 60.7 percent | ~70 percent | 109.9 percent |
| Carbohydrate (g) | 47.9 percent | 72.8 percent | ~80 percent |
| Fat (g) | ~50 percent | 41.7 percent | 89.6 percent |
| Weight (g) | 36.3 percent | 37.3 percent | ~70 percent |
The mechanism behind portion drift is the same one that breaks any single-photo estimate: there is no reference scale in the frame. A 150 g chicken breast and a 250 g chicken breast can look almost identical at the camera angle most users shoot from. If you want to test this for yourself, put a known item (a closed water bottle, a credit card) in the photo and ask the model to use it as a size reference. A known object gives the estimate a scale to anchor to, which typically improves accuracy.
How does ChatGPT compare to dedicated calorie tracking apps?
Across the published research, ChatGPT-4o and Claude 3.5 Sonnet land in roughly the same accuracy band as the AI-enabled calorie apps tested in the University of Sydney 2024 review of 18 food trackers — 35 to 40 percent energy error for mixed meals. Dedicated apps win on barcode lookup, database integration, and tracking workflow; ChatGPT wins on conversational input, free-text portion refinement, and not requiring a separate app install.
The University of Sydney research suggests that AI-enabled food trackers overestimated beef pho calories by 49 percent and underestimated bubble tea by up to 76 percent — within the same broad accuracy band the LLM studies report. Manual-entry apps in that same Sydney review drifted 944 to 1,520 kJ per day against reference meal plans across Western, Asian, and Australian dietary guideline patterns.
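To put the 944 to 1,520 kJ daily drift in context, this sketch converts it to kcal and to a share of the 8,700 kJ reference day. It uses the standard 4.184 kJ per kcal conversion; the function names are hypothetical.

```python
KJ_PER_KCAL = 4.184          # standard thermochemical conversion
DAILY_REFERENCE_KJ = 8700    # the AU menu-label reference day


def kj_to_kcal(kj: float) -> float:
    """Convert kilojoules to kilocalories."""
    return kj / KJ_PER_KCAL


def share_of_day(kj: float, daily_kj: float = DAILY_REFERENCE_KJ) -> float:
    """Express an energy amount as a percentage of the daily reference."""
    return 100 * kj / daily_kj


drift_low_kj, drift_high_kj = 944, 1520

drift_kcal = (kj_to_kcal(drift_low_kj), kj_to_kcal(drift_high_kj))
# about 226 to 363 kcal per day

drift_share = (share_of_day(drift_low_kj), share_of_day(drift_high_kj))
# about 10.9 to 17.5 percent of an 8,700 kJ day
```

If your own maintenance figure differs from the generic reference, pass it as `daily_kj` to see the drift as a share of your actual day.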
A fair direct comparison of strengths and weaknesses for an everyday calorie-tracking goal:
| Workflow | Energy accuracy | Speed per entry | Best for |
| Weighing + manual entry (AFCD-backed app) | Highest (5 to 15 percent error with care) | 60 to 120 seconds | Tight deficits, last-stage fat loss |
| AI photo logging (in-app, MyFitnessPal-class) | Moderate (~35 to 50 percent error) | 5 to 10 seconds | Habit formation, busy weekdays |
| ChatGPT image-only | Moderate (~30 to 36 percent error) | 10 to 20 seconds | Ad-hoc estimates, eating out |
| ChatGPT image + ingredient context | Higher (~14 percent error) | 30 to 60 seconds | Home-cooked meals you remember the ingredients of |
| Dietitian visual estimation | Comparable to AI (Intraclass correlation 0.31 to 0.67) | 30 to 60 seconds | Clinical work |
Can adding context to your ChatGPT prompt improve accuracy?
Yes. Rodríguez-Jiménez and colleagues (2025) showed that supplementing a meal photograph with a detailed ingredient list cut the kcal mean absolute error from 123 kcal (image only) to 53 kcal (image plus ingredients), a 57 percent reduction. Even adding rough portion estimates ("about a fist-sized serve of rice") helped meaningfully. The visual still pulled its weight; image-plus-ingredients beat ingredients alone by 14 kcal MAE on average (53 versus 67 kcal).
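The comparison can be recomputed directly from the MAE values in the prompt-scenario table. This is a small sketch, with a hypothetical helper name.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from a starting value to a new value."""
    return 100 * (before - after) / before


# MAE values in kcal from the four-scenario table.
image_only = 123
image_plus_ingredients = 53
ingredients_only = 67

reduction = pct_reduction(image_only, image_plus_ingredients)  # about 56.9 percent
photo_benefit = ingredients_only - image_plus_ingredients      # 14 kcal
```

The second figure is the part of the improvement that comes from the photo itself once the ingredients are already known.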
This is the single most useful practical finding in the current literature for everyday users. ChatGPT's image-only error is not what you have to live with. The model takes context well, and the more concrete the context, the better the estimate. Most general-purpose ChatGPT calorie queries fail not because of the model but because the prompt omits the cues the model is built to weigh.
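One way to make that context habitual is to assemble the prompt text from a checklist before attaching the photo. The sketch below only builds a string; the function name, wording, and structure are hypothetical and are not taken from any of the studies.

```python
def build_meal_prompt(ingredients, reference_object=None, units="kJ"):
    """Compose the text to send alongside a meal photo (hypothetical wording)."""
    lines = [
        f"Estimate the total energy of the meal in this photo in {units}.",
        "Ingredients and approximate amounts:",
    ]
    # One bullet per ingredient, exactly as the user describes it.
    lines += [f"- {item}" for item in ingredients]
    if reference_object:
        lines.append(f"Use the {reference_object} in the frame as a size reference.")
    lines.append("List each item with its estimated weight before giving the total.")
    return "\n".join(lines)


prompt = build_meal_prompt(
    [
        "about a fist-sized serve of rice",
        "150 g grilled chicken breast",
        "1 tablespoon olive oil",
    ],
    reference_object="credit card",
)
```

Printing `prompt` gives a block you can paste into the chat with the photo. Asking for per-item weights makes it easier to spot which portion the model misjudged.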
A workable five-step prompt routine for an Australian user:
The TDEE calculator AU and US units guide shows the formulas that anchor those numbers to your personal maintenance figure rather than the generic 8,700 kJ menu-label reference.
How should Australian users use ChatGPT for calorie counting in 2026?
Use ChatGPT as a secondary estimation tool, anchored against an AFCD-backed primary tracker. The Australian Food Composition Database currently lists 1,588 foods with up to 268 nutrients per food, with most values directly analysed against Australian produce, cultivars, and supply chains. ChatGPT's training data is heavily US-weighted, so always verify the AU staples (Tim Tams, Vegemite, Weet-Bix, lamington, Sao biscuits, AU cuts of meat) against AFCD before trusting the model's number.
The Australian-specific gap is bigger than most users realise. ChatGPT learned from publicly scraped web text and image data, the majority of which is United States–weighted. A Granny Smith apple grown in Batlow, a bunch of Australian baby spinach, and a kangaroo loin cut are all distinct from the closest USDA reference items the model is most likely to recall. The NUTTAB vs USDA tracking accuracy guide walks through the database mismatch and why it matters more for Australian users than people often assume.
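A cross-check against a reference database can be as simple as scaling the per-100 g value to your serve and flagging large disagreements. In this sketch the 10 percent tolerance is an arbitrary assumption, and the per-100 g figure is a placeholder rather than a real AFCD value; look up the actual entry before relying on it.

```python
def serve_energy_kj(kj_per_100g: float, serve_g: float) -> float:
    """Scale a per-100 g database value to the weighed serve."""
    return kj_per_100g * serve_g / 100


def check_against_reference(model_kj: float, reference_kj: float, tolerance_pct: float = 10.0):
    """Return (flagged, signed percentage difference) for a model estimate."""
    diff_pct = 100 * (model_kj - reference_kj) / reference_kj
    return abs(diff_pct) > tolerance_pct, diff_pct


# Placeholder numbers for illustration only (not real AFCD data).
placeholder_kj_per_100g = 1500
serve_g = 40
reference_kj = serve_energy_kj(placeholder_kj_per_100g, serve_g)  # 600.0 kJ

model_estimate_kj = 750
flagged, diff_pct = check_against_reference(model_estimate_kj, reference_kj)
# flagged is True: the model sits 25 percent above the reference
```

When an item is flagged, log the database figure and keep the model's number only as a note.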
For an Australian calorie-tracking plan, here is a practical division of labour:
The getting started with calorie tracking guide covers the first-two-weeks routine for setting up an AU-anchored primary tracker, and the calorie tracking versus intuitive eating discussion is the right follow-on if you are weighing whether tracking is still useful at your current stage.
A practical caveat. ChatGPT-5 was released in August 2025 and is still being benchmarked across different food categories. Expect the published accuracy figures above to shift as researchers retest, and as the next model generation lands. The shape of the problem — strong identification, weaker volume estimation, large protein errors — is unlikely to flip overnight.
Frequently Asked Questions
Can ChatGPT count calories from a photo accurately?
ChatGPT counts calories with a 26 to 36 percent mean absolute percentage error from image-only prompts, according to four peer-reviewed 2025 studies. Accuracy improves to roughly 14 percent error when you supply detailed ingredient context with the photo. Food identification is strong (93 percent precision), but portion size and protein estimation are the largest error sources. The model under-predicts portion weight in roughly 76 percent of medium and large meals.
Is ChatGPT-5 better than ChatGPT-4 for calorie estimation?
Yes, modestly. The Rodríguez-Jiménez 2025 study put ChatGPT-5 at 30.5 percent MAPE for image-only kcal estimation, compared to 35.8 percent for ChatGPT-4o in the Fridolfsson 2025 study, which used a different photo set. The bigger jump comes from prompt engineering: ChatGPT-5 with detailed ingredient context lands at 13.9 percent MAPE, less than half the image-only error. The model has improved, but the prompting matters more than the version number for everyday use.
How does ChatGPT compare to MyFitnessPal for tracking calories?
ChatGPT and MyFitnessPal sit in different categories. MyFitnessPal is a logging app with a barcode scanner, recipe builder, and a crowdsourced database of around six million entries; ChatGPT is a conversational estimator with no built-in food database or tracking workflow. For energy accuracy, MyFitnessPal's correlation against analysed reference data was r = 0.96 in the 2020 Evenepoel validation, while ChatGPT-4o image-only sits at 35.8 percent MAPE per the 2025 Fridolfsson study. The most accurate routines use both: the app for logging, the AI for ad-hoc estimates.
Why does ChatGPT get protein and portion size wrong?
ChatGPT struggles with portion size because a single photo has no depth-of-field, plate-diameter, or weight reference for the model to anchor to, so it falls back to averaged training-data portions. Protein errors run 60 to 110 percent across leading multimodal LLMs (per Fridolfsson 2025) because dense, low-volume foods like chicken, eggs, and cheese carry most of the protein, and small visual misreads on those items produce large macronutrient swings.
Does ChatGPT work well for Australian foods specifically?
Less well than for US foods, based on the underlying training data. ChatGPT learned from publicly scraped web content that is heavily United States–weighted, so AU-specific items (Vegemite, Tim Tams, AU cuts of meat, AU-grown produce) often get matched against the closest USDA reference rather than the FSANZ Australian Food Composition Database (AFCD). For tight tracking, cross-check ChatGPT's estimate against AFCD, which lists 1,588 AU-available foods with up to 268 nutrients per food, mostly directly analysed.
Should I use ChatGPT instead of a dedicated calorie tracking app?
Not as a primary tool. Research suggests ChatGPT works best as a secondary estimator alongside a dedicated tracking app anchored to a reliable food composition database. Use a primary tracker for daily logging and consistency. Use ChatGPT for restaurant meals, unfamiliar dishes, and quick second opinions — with the understanding that the energy estimate carries a 14 to 36 percent error band depending on how much ingredient context you supply.
Sources