Science · 12 min read · June 10, 2026

ChatGPT Calorie Counting Accuracy: 2026 Research Review

What four peer-reviewed 2025 studies found when they tested ChatGPT-4, ChatGPT-5, Claude, and Gemini on real meal photos — energy errors of 14 to 36 percent, why portion size and protein break the models, and a five-step AU workflow for using ChatGPT calorie estimates inside an 8,700 kJ day.

Dr. Maya Patel

Registered Dietitian, M.S. Nutrition Science

ChatGPT identifies foods in meal photographs with 93 percent precision but estimates total energy with a 26 to 36 percent mean absolute percentage error, across four peer-reviewed 2025 studies. ChatGPT-5 brings the error down to 30.5 percent on image-only prompts and to 13.9 percent when you supply detailed ingredients with the image. Use it to rank meals quickly inside an 8,700 kJ AU day, and weigh the calorie-dense items when your targets are tight.

The publicly available research that has stress-tested ChatGPT and related multimodal large language models (MLLMs) on real meal photographs converges on a single pattern: food identification is now strong, but absolute kilojoule and macronutrient quantification still drifts more than most users expect. This guide walks through the peer-reviewed evidence for ChatGPT-4, ChatGPT-5, Claude 3.5 Sonnet, Gemini 1.5 Pro, and fine-tuned open models, side by side, then maps the findings to a practical Australian calorie-tracking workflow built around the FSANZ Australian Food Composition Database (AFCD).

The studies below come from Nutrients (O'Hara and colleagues 2025; Rodríguez-Jiménez and colleagues 2025; Tanabe and Yanai 2025) and Current Developments in Nutrition (Fridolfsson and colleagues 2025). The three direct evaluations of consumer chatbots used between 52 and 195 meal photographs across cuisines and portion sizes. Tanabe and Yanai worked on the larger 3,265-image Nutrition5k dataset. Where the labelled and AI-estimated numbers diverge, it makes sense to lean on weighing for the calorie-dense items and on the AI for the lower-stakes calls.

How accurate is ChatGPT at counting calories from a meal photo?

ChatGPT-4o estimated energy with a 35.8 percent mean absolute percentage error (MAPE) across 52 standardised meal photographs in a 2025 University of Gothenburg study (Fridolfsson and colleagues 2025). ChatGPT-4 showed a 26.9 percent average absolute difference across 114 Irish meal photos in O'Hara and colleagues (2025). ChatGPT-5 cut the calorie error to 30.5 percent on image-only prompts and to 13.9 percent when detailed ingredients were supplied (Rodríguez-Jiménez and colleagues 2025).

Fridolfsson and colleagues at the University of Gothenburg tested ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro on 52 standardised photographs covering 16 single foods and 36 mixed meals across three portion sizes. ChatGPT and Claude both landed at 35.8 percent MAPE for energy. Gemini ran 64.2 percent. The team published the work in Current Developments in Nutrition in September 2025.

O'Hara and colleagues in Nutrients analysed 114 meal photographs from Ireland's National Adult Nutrition Survey. ChatGPT-4 identified foods well (93 percent precision, 84.6 percent recall, F1 score 88.6). The average energy difference was nearly zero because the model over- and under-counted in roughly balanced amounts. The average absolute percentage difference, however, was 26.9 percent across 16 nutrients, and 13 of those 16 nutrients showed more than 10 percent error. ChatGPT-4 also underestimated weight in 76.3 percent of photos.
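An average difference can sit near zero while the absolute error stays large. The sketch below computes both on four invented meals. The kcal values are hypothetical and are not data from the O'Hara study. It also defines MAE and RMSE, the other two metrics in the ChatGPT-5 table below, and checks that 93 percent precision and 84.6 percent recall give an F1 of about 88.6.

```python
"""Hypothetical illustration: signed bias vs absolute error, plus an F1 check.

The meal energies below are invented for illustration only.
"""
from math import sqrt


def mean_signed_pct_error(true, est):
    """Average of (estimate - truth) / truth; over- and under-counts cancel."""
    return 100 * sum((e - t) / t for t, e in zip(true, est)) / len(true)


def mape(true, est):
    """Mean absolute percentage error; every miss counts, whatever its sign."""
    return 100 * sum(abs(e - t) / t for t, e in zip(true, est)) / len(true)


def mae(true, est):
    """Mean absolute error in the original units (kcal here)."""
    return sum(abs(e - t) for t, e in zip(true, est)) / len(true)


def rmse(true, est):
    """Root mean squared error; penalises large misses more heavily than MAE."""
    return sqrt(sum((e - t) ** 2 for t, e in zip(true, est)) / len(true))


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference energies and model estimates for four meals (kcal).
true_kcal = [400, 600, 500, 800]
est_kcal = [520, 450, 620, 680]

print(round(mean_signed_pct_error(true_kcal, est_kcal), 1))  # → 3.5 (looks unbiased)
print(round(mape(true_kcal, est_kcal), 1))                   # → 23.5 (but misses are large)
print(round(f1(0.93, 0.846), 3))                             # → 0.886
```

In this example, misses of +30, −25, +24, and −15 percent average to a small signed bias. The absolute error is far larger. That is the same pattern the O'Hara team describes.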

Rodríguez-Jiménez and colleagues tested ChatGPT-5 — the model that ships inside the consumer ChatGPT app at the time of writing — across 195 dishes from three sources (Allrecipes.com, the SNAPMe research dataset, and dietitian-prepared home meals). The four prompt scenarios they ran tell the story most users do not see.

Prompt scenario	MAE (kcal)	MAPE	RMSE (kcal)
Image only	123	30.5 percent	164
Image + non-visual data	92	24.4 percent	122
Image + detailed ingredients	53	13.9 percent	81
Ingredients only (no image)	67	18.1 percent	101

Two practical reads from that table. First, each step toward giving the model more context cut the error meaningfully — adding "a small piece of grilled chicken breast, about 120 grams" instead of just "chicken" can roughly halve the kcal error. Second, dropping the image entirely (ingredients-only) actually performs worse than image-plus-ingredients, which means the visual cues are doing real work beyond what a text-only ingredient list captures.
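You can check the "roughly halve" claim directly against the MAE column. This short sketch uses only the four values in the table above. It reports each scenario's reduction relative to the image-only prompt.

```python
# MAE values (kcal) from the Rodríguez-Jiménez and colleagues (2025) table above.
mae_kcal = {
    "Image only": 123,
    "Image + non-visual data": 92,
    "Image + detailed ingredients": 53,
    "Ingredients only (no image)": 67,
}

baseline = mae_kcal["Image only"]


def reduction_vs_baseline(value: float, base: float = baseline) -> float:
    """Percentage reduction in MAE relative to the image-only prompt."""
    return 100 * (base - value) / base


for scenario, value in mae_kcal.items():
    print(f"{scenario}: {value} kcal MAE, {reduction_vs_baseline(value):.1f}% below image only")

# How much the image still adds once ingredients are supplied.
print(mae_kcal["Ingredients only (no image)"] - mae_kcal["Image + detailed ingredients"])  # → 14
```

The detailed-ingredient prompt comes out about 56.9 percent below image only. Ingredients without the photo come out about 45.5 percent below.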

For a 2,500 kJ (600 kcal) dinner, a 26 to 36 percent error in either direction puts the estimate anywhere from about 1,600 to 3,400 kJ. That is roughly the gap between a single Calorie Smart HelloFresh stir-fry and a Family Classic parmigiana. The HelloFresh AU recipe database guide covers that gap in detail. For tight tracking, this is why the AI vs manual calorie tracking comparison still ranks weighing ahead of photo-only logging on accuracy.
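Here is a quick check of that band. The sketch applies a percentage error in both directions to a dinner whose true energy is 2,500 kJ. The 4.184 kJ-per-kcal factor is the standard conversion. The other numbers come from the paragraph above.

```python
KJ_PER_KCAL = 4.184  # standard kilojoule-per-kilocalorie conversion


def error_band_kj(true_kj: float, pct_error: float) -> tuple[float, float]:
    """Range an estimate could fall in if it misses by pct_error in either direction."""
    fraction = pct_error / 100
    return true_kj * (1 - fraction), true_kj * (1 + fraction)


dinner_kj = 2500
print(round(dinner_kj / KJ_PER_KCAL))  # → 598, i.e. about 600 kcal
for pct in (26, 36):
    low, high = error_band_kj(dinner_kj, pct)
    print(f"±{pct}%: {low:,.0f} to {high:,.0f} kJ")
# ±26% gives 1,850 to 3,150 kJ; ±36% gives 1,600 to 3,400 kJ.
```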


Why does ChatGPT struggle most with portion size and protein?

ChatGPT and other multimodal LLMs struggle most with portion size because a single photo gives the model no depth, plate-diameter, or weight reference, so it falls back on portion sizes averaged from its training data. Protein errors run 60 to 110 percent because the dense, low-volume foods that carry most of the protein (meat, eggs, dairy) are exactly where small visual misreads produce the largest macronutrient swings.

The Tanabe and Yanai 2025 Nutrients paper from the University of Electro-Communications in Tokyo isolated portion size as the largest single error source. They worked on the Nutrition5k dataset: 3,265 annotated food images, split into 2,759 for training and 506 for testing. Adding a volume-estimation step to GPT-4o lowered the calorie MAE from a baseline of around 78.8 kcal. Their best fine-tuned model, LLaVA-1.5-13B, reached 64.3 kcal MAE with a 0.934 correlation against ground truth. The takeaway: vision is fine; volume is the bottleneck.

Fridolfsson and colleagues quantified protein specifically. Across their 52 photos, MAPE values for protein ran from 60.7 percent (ChatGPT) to 109.9 percent (Gemini). Carbohydrate errors ran 47.9 percent (ChatGPT) to 72.8 percent (Claude). Fat errors ran 41.7 percent (Claude) to 89.6 percent (Gemini). Energy was the most reliable single metric because per-gram macronutrient errors tend to partly cancel out in the kcal total.
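The sketch below shows how large macronutrient errors can shrink in the energy total. It uses the Atwater 4-4-9 kcal-per-gram factors on one hypothetical meal. The gram values are invented, not taken from the Fridolfsson dataset. They were chosen so the errors nearly cancel. In the real studies the cancellation is only partial, which is why energy MAPE still sits at 35.8 percent.

```python
# Atwater general factors, kcal per gram.
ATWATER_KCAL_PER_G = {"protein": 4, "carbs": 4, "fat": 9}


def energy_kcal(macros: dict[str, float]) -> float:
    """Energy implied by protein, carbohydrate, and fat grams."""
    return sum(ATWATER_KCAL_PER_G[name] * grams for name, grams in macros.items())


def pct_error(true_value: float, est_value: float) -> float:
    """Absolute percentage error of an estimate against its reference value."""
    return 100 * abs(est_value - true_value) / true_value


# Hypothetical meal: reference values vs a model's estimate (grams).
reference = {"protein": 30, "carbs": 50, "fat": 20}
estimate = {"protein": 20, "carbs": 65, "fat": 18}

for nutrient in reference:
    print(nutrient, round(pct_error(reference[nutrient], estimate[nutrient]), 1))
print("energy", round(pct_error(energy_kcal(reference), energy_kcal(estimate)), 1))
# protein 33.3, carbs 30.0, fat 10.0, energy 0.4 (500 vs 502 kcal)
```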

Here is how the three leading consumer models compared head-to-head in that study.

Nutrient (MAPE)	ChatGPT-4o	Claude 3.5 Sonnet	Gemini 1.5 Pro
Energy (kcal)	35.8 percent	35.8 percent	64.2 percent
Protein (g)	60.7 percent	~70 percent	109.9 percent
Carbohydrate (g)	47.9 percent	72.8 percent	~80 percent
Fat (g)	~50 percent	41.7 percent	89.6 percent
Weight (g)	36.3 percent	37.3 percent	~70 percent

A few patterns worth holding in mind. Claude and ChatGPT track each other very closely on energy and weight — both are realistic options inside the 30 to 40 percent error band. Gemini lags meaningfully on every metric in that study. And every model under-predicts as portion size grows, which is the same pattern seen in the O'Hara study (76 percent underestimation for medium and large meals) and in the AI app testing covered in the calorie apps overcount vegetables write-up.

The mechanism behind portion drift is the same one that breaks any single-photo estimate: there is no reference scale in the frame. A 150 g chicken breast and a 250 g chicken breast can look almost identical from the angle most users shoot at. To test this yourself, put an item of known size (a closed water bottle, a credit card) in the photo and ask the model to use it as a size reference. That gives the model the scale anchor a bare plate lacks, so the estimate is likely to improve, though the size of the gain will vary by meal.
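The sketch below shows why a reference object helps, and where it stops helping. A card of known width converts pixels to millimetres, which makes the food's top-down area measurable. Thickness and density still have to be assumed, and that is exactly the gap between a 150 g and a 250 g chicken breast. The card width follows the ISO/IEC 7810 ID-1 standard. The pixel counts, thicknesses, and the 1.05 g/cm³ density are hypothetical. This illustrates the geometry only. It is not how ChatGPT processes images internally.

```python
CARD_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1 card (standard credit card) width


def mm_per_pixel(card_width_px: float) -> float:
    """Scale factor derived from the reference card's measured width in the photo."""
    return CARD_WIDTH_MM / card_width_px


def estimated_mass_g(food_area_px: float, scale_mm_per_px: float,
                     thickness_cm: float, density_g_per_cm3: float) -> float:
    """Top-down area -> volume -> mass. Thickness and density are assumptions."""
    area_cm2 = food_area_px * (scale_mm_per_px / 10) ** 2  # mm/px -> cm/px, squared
    return area_cm2 * thickness_cm * density_g_per_cm3


scale = mm_per_pixel(card_width_px=428)  # hypothetical measurement: 0.2 mm per pixel

# Same hypothetical top-down outline (250,000 px, about 100 cm²), two thickness guesses.
for thickness in (1.4, 2.4):
    mass = estimated_mass_g(250_000, scale, thickness, density_g_per_cm3=1.05)
    print(f"{thickness} cm thick: about {mass:.0f} g")
# 1.4 cm gives about 147 g; 2.4 cm gives about 252 g from the identical outline.
```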

How does ChatGPT compare to dedicated calorie tracking apps?

Across the published research, ChatGPT-4o and Claude 3.5 Sonnet land in roughly the same accuracy band as the AI-enabled calorie apps tested in the University of Sydney 2024 review of 18 food trackers — 35 to 40 percent energy error for mixed meals. Dedicated apps win on barcode lookup, database integration, and tracking workflow; ChatGPT wins on conversational input, free-text portion refinement, and not requiring a separate app install.

The University of Sydney review reported that AI-enabled food trackers overestimated beef pho energy by 49 percent and underestimated bubble tea by up to 76 percent. That is within the same broad accuracy band the LLM studies report. Manual-entry apps in the same review drifted 944 to 1,520 kJ per day from reference meal plans across Western, Asian, and Australian dietary guideline patterns.

A fair direct comparison of strengths and weaknesses for an everyday calorie-tracking goal:

Workflow	Energy accuracy	Speed per entry	Best for
Weighing + manual entry (AFCD-backed app)	Highest (5 to 15 percent error with care)	60 to 120 seconds	Tight deficits, last-stage fat loss
AI photo logging (in-app, MyFitnessPal-class)	Moderate (~35 to 50 percent error)	5 to 10 seconds	Habit formation, busy weekdays
ChatGPT image-only	Moderate (~30 to 36 percent error)	10 to 20 seconds	Ad-hoc estimates, eating out
ChatGPT image + ingredient context	Higher (~14 percent error)	30 to 60 seconds	Home-cooked meals you remember the ingredients of
Dietitian visual estimation	Comparable to AI (intraclass correlation 0.31 to 0.67)	30 to 60 seconds	Clinical work

The honest read is that ChatGPT is most useful as a second tool, not a primary tracker. The best calorie tracking apps comparison walks through what a primary tracker should look like in 2026. The explainer on how KCALM's AI food recognition works covers what a dedicated AI tracker does differently from a general-purpose chatbot: anchored databases, portion priors per food type, and macro back-checks that a chat interface does not run by default.

Can adding context to your ChatGPT prompt improve accuracy?

Yes. Rodríguez-Jiménez and colleagues (2025) showed that supplementing a meal photograph with a detailed ingredient list cut the kcal mean absolute error from 123 kcal (image only) to 53 kcal (image plus ingredients), a 57 percent reduction. Even adding rough portion estimates ("about a fist-sized serve of rice") helped meaningfully. The visual still pulled its weight: image-plus-ingredients beat ingredients alone by 14 kcal MAE on average (53 versus 67 kcal).

This is the single most useful practical finding in the current literature for everyday users. You do not have to live with ChatGPT's image-only error. The model takes context well, and the more concrete the context, the better the estimate. When general-purpose ChatGPT calorie queries fall short, the cause is usually not the model. It is a prompt that skips the cues the model is built to weigh.

A workable five-step prompt routine for an Australian user:

Photograph the plate from above in good light, with a reference item (a coffee mug, a 750 mL water bottle, your phone case) visible at the same depth.

Type the major ingredients and rough quantities alongside the photo. "Roast chicken thigh ~150 g, sweet potato ~200 g, broccoli ~120 g, olive oil ~10 g" beats "chicken and veg" by a wide margin.

Specify the cooking method — pan-fried, air-fried, steamed, raw. Oil absorption is one of the biggest hidden variables, exactly as covered in the calorie apps overcount vegetables guide.

Ask for kilojoules alongside kilocalories. The model defaults to US-style kcal figures and may not apply Australian portion conventions. For example: "Give me kJ and kcal, plus protein, carbs, fat to one decimal place."

Cross-check the energy total against the macros using the Atwater system: 4-4-9 kcal per gram, or 17-17-37 kJ per gram. Multiply protein grams by 17, carb grams by 17, and fat grams by 37, then add them. If that sum does not land within 100 kJ of the model's stated total, ask the model to redo the math.
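Step 5 can be written as a small check. This minimal sketch uses the 17-17-37 kJ-per-gram factors and the 100 kJ tolerance from the step above. The example macros and stated totals are hypothetical. The sketch ignores fibre and alcohol, which carry their own energy factors. A meal with a lot of either would need those terms added.

```python
# kJ per gram: the kilojoule equivalents of Atwater's 4-4-9 kcal/g factors.
KJ_PER_G = {"protein": 17, "carbs": 17, "fat": 37}
TOLERANCE_KJ = 100  # cross-check tolerance suggested in step 5


def macro_energy_kj(protein_g: float, carbs_g: float, fat_g: float) -> float:
    """Energy implied by the model's own macronutrient figures."""
    return (protein_g * KJ_PER_G["protein"]
            + carbs_g * KJ_PER_G["carbs"]
            + fat_g * KJ_PER_G["fat"])


def totals_agree(stated_kj: float, protein_g: float, carbs_g: float, fat_g: float) -> bool:
    """True if the model's stated total is within TOLERANCE_KJ of the macro-derived total."""
    return abs(stated_kj - macro_energy_kj(protein_g, carbs_g, fat_g)) <= TOLERANCE_KJ


print(macro_energy_kj(45, 50, 20))     # → 2355
print(totals_agree(2300, 45, 50, 20))  # → True (55 kJ apart)
print(totals_agree(2600, 45, 50, 20))  # → False (245 kJ apart: ask the model to redo it)
```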

The TDEE calculator AU and US units guide shows the formulas that anchor those numbers to your personal maintenance figure rather than the generic 8,700 kJ menu-label reference.


How should Australian users use ChatGPT for calorie counting in 2026?

Use ChatGPT as a secondary estimation tool, anchored against an AFCD-backed primary tracker. The Australian Food Composition Database currently lists 1,588 foods with up to 268 nutrients per food, with most values directly analysed against Australian produce, cultivars, and supply chains. ChatGPT's training data is heavily US-weighted, so always verify the AU staples (Tim Tams, Vegemite, Weet-Bix, lamington, Sao biscuits, AU cuts of meat) against AFCD before trusting the model's number.

The Australian-specific gap is bigger than most users realise. ChatGPT was trained largely on publicly available web text and images, which skew heavily toward United States sources. A Granny Smith apple grown in Batlow, a bunch of Australian baby spinach, and a kangaroo loin cut all differ from the closest USDA reference items the model is most likely to recall. The NUTTAB vs USDA tracking accuracy guide walks through the database mismatch and why it matters more for Australian users than people often assume.

For an Australian calorie-tracking plan, here is a practical division of labour:

Primary tracker: an app that uses an AFCD-derived database, or custom entries you have built from weighed ingredients (for example with the FSANZ Nutrition Panel Calculator). This is your daily ledger.

ChatGPT for eating out: photograph + ingredient context for restaurant meals where the chain has not published kilojoule figures. Expect a 30 percent error band and budget accordingly.

ChatGPT for unfamiliar dishes: stir-fries, mixed bowls, weekend social meals where weighing is socially awkward. Cross-check against the closest AFCD entry afterwards.

Weighing for the calorie-dense items: rice, pasta, oils, nuts, and meats — these are the items where a 30 percent error translates to 400 kJ or more.

Skip ChatGPT for protein targets: the 60 percent protein error band in the Fridolfsson study is too wide for anyone tracking a 1.6 g/kg/day protein floor. Use weighed portions for the protein source.
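The sketch below shows why the calorie-dense and protein items get the scale. It turns the percentage errors in the list above into absolute ones. The 30 percent, 400 kJ, 60 percent, and 1.6 g/kg figures come from the list. The 80 kg body weight and the 40 g protein serve are hypothetical.

```python
def abs_error(value: float, pct_error: float) -> float:
    """Absolute size of a percentage miss on a given quantity."""
    return value * pct_error / 100


# Energy: how much must an item carry before a 30% miss reaches 400 kJ?
threshold_kj = 400 / 0.30
print(round(threshold_kj))  # → 1333

# Protein: hypothetical 80 kg person with a 1.6 g/kg/day floor, eating a
# hypothetical 40 g protein serve estimated with a miss of the study's ~60% average size.
target_g = 1.6 * 80
serve_g = 40
miss_g = abs_error(serve_g, 60)
print(round(target_g), round(miss_g))  # → 128 24
print(f"One serve could read anywhere from {serve_g - miss_g:.0f} to {serve_g + miss_g:.0f} g")
```

A 24 g swing on a single serve is almost a fifth of a 128 g daily floor. That is why weighing the protein source is the safer choice.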

The getting started with calorie tracking guide covers the first-two-weeks routine for setting up an AU-anchored primary tracker, and the calorie tracking versus intuitive eating discussion is the right follow-on if you are weighing whether tracking is still useful at your current stage.

A practical caveat. ChatGPT-5 was released in August 2025 and is still being benchmarked across different food categories. Expect the published accuracy figures above to shift as researchers retest, and as the next model generation lands. The shape of the problem — strong identification, weaker volume estimation, large protein errors — is unlikely to flip overnight.

Frequently Asked Questions

Can ChatGPT count calories from a photo accurately?

ChatGPT counts calories with a 26 to 36 percent mean absolute percentage error from image-only prompts, according to four peer-reviewed 2025 studies. Accuracy improves to roughly 14 percent error when you supply detailed ingredient context with the photo. Food identification is strong (93 percent precision), but portion size and protein estimation are the largest error sources. The model under-predicts portion weight in roughly 76 percent of medium and large meals.

Is ChatGPT-5 better than ChatGPT-4 for calorie estimation?

Yes, modestly. The Rodríguez-Jiménez 2025 study put ChatGPT-5 at 30.5 percent MAPE for image-only kcal estimation, compared to roughly 35.8 percent for ChatGPT-4 in the Fridolfsson 2025 study. The bigger jump comes from prompt engineering: ChatGPT-5 with detailed ingredient context lands at 13.9 percent MAPE, less than half the image-only error. The model has improved, but the prompting matters more than the version number for everyday use.

How does ChatGPT compare to MyFitnessPal for tracking calories?

ChatGPT and MyFitnessPal sit in different categories. MyFitnessPal is a logging app with a barcode scanner, recipe builder, and a crowdsourced database of around six million entries. ChatGPT is a conversational estimator with no built-in food database or tracking workflow. For energy, MyFitnessPal's values correlated with analysed reference data at r = 0.96 in the 2020 Evenepoel validation. ChatGPT-4o image-only sits at 35.8 percent MAPE in the 2025 Fridolfsson study. The two figures use different metrics (a correlation versus an average percentage error), so they are not directly comparable. Careful trackers often use both: the app for logging, the AI for ad-hoc estimates.

Why does ChatGPT get protein and portion size wrong?

ChatGPT struggles with portion size because a single photo gives the model no depth, plate-diameter, or weight reference to anchor to, so it falls back on averaged training-data portions. Protein errors run 60 to 110 percent across leading multimodal LLMs (per Fridolfsson 2025). Dense, low-volume foods like chicken, eggs, and cheese carry most of the protein, and small visual misreads on those items produce large macronutrient swings.

Does ChatGPT work well for Australian foods specifically?

Less well than for US foods, based on the underlying training data. ChatGPT learned from publicly scraped web content that is heavily United States–weighted, so AU-specific items (Vegemite, Tim Tams, AU cuts of meat, AU-grown produce) often get matched against the closest USDA reference rather than the FSANZ Australian Food Composition Database (AFCD). For tight tracking, cross-check ChatGPT's estimate against AFCD, which lists 1,588 AU-available foods with up to 268 nutrients per food, mostly directly analysed.

Should I use ChatGPT instead of a dedicated calorie tracking app?

Not as a primary tool. Research suggests ChatGPT works best as a secondary estimator alongside a dedicated tracking app anchored to a reliable food composition database. Use a primary tracker for daily logging and consistency. Use ChatGPT for restaurant meals, unfamiliar dishes, and quick second opinions — with the understanding that the energy estimate carries a 14 to 36 percent error band depending on how much ingredient context you supply.

Sources

O'Hara C, Conway MC, Walton J. An Evaluation of ChatGPT for Nutrient Content Estimation from Meal Photographs. Nutrients. 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC11858203/

Rodríguez-Jiménez et al. Image-Based Dietary Energy and Macronutrients Estimation with ChatGPT-5: Cross-Source Evaluation Across Escalating Context Scenarios. Nutrients. 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12655113/

Fridolfsson J, Sjöberg E, Thiwång M, Pettersson S. Performance Evaluation of 3 Large Language Models for Nutritional Content Estimation from Food Images. Current Developments in Nutrition. 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC12513282/

Tanabe H, Yanai K. Reasoning-Driven Food Energy Estimation via Multimodal Large Language Models. Nutrients. 2025. https://pmc.ncbi.nlm.nih.gov/articles/PMC11990770/

Food Standards Australia New Zealand. Australian Food Composition Database (AFCD). https://www.foodstandards.gov.au/science-data/monitoringnutrients/afcd
