Why can’t 3DMark benchmarks measure true 3D performance and thermal throttling?


Many people, including smartphone manufacturers, use benchmarking applications as an indicator of performance, but the version distributed on the Play Store may not measure a device’s true performance.

A number of manufacturers loosen thermal controls only during benchmarking

When a new smartphone or SoC is introduced, smartphone geeks and manufacturers alike try to measure the scores with AnTuTu Benchmark, 3DMark, Geekbench, and so on.

Unfortunately, a seemingly endless number of manufacturers tune their devices so that scores improve only while a benchmark application is running, which makes fair comparisons impossible.

What is even more troubling is that many reviewers are unaware of the manufacturer’s manipulation, so they analyze the device’s performance and heat characteristics based on results that differ from actual usage conditions, sometimes leading to incorrect conclusions.

Some reviewers even go so far as to say that “the era of benchmark comparisons is over” without blaming the manufacturer, even though the benchmark application itself is not behaving strangely.

This article will show you how to make sure that benchmark measurements are not just a numbers game.

Boosting can be avoided by using a different package name

Among the 70+ devices I have used and benchmarked myself, the following manufacturers were found to change CPU/GPU clocks and thermal controls only during benchmarking.

Android smartphone Benchmark Charts; CPU, GPU, memory & storage performance, and touch latency – AndroPlus

  • Black Shark
  • Infinix
  • Meizu
  • realme (Snapdragon)
  • realme / OnePlus (MediaTek)
  • REDMAGIC
  • vivo
  • Xiaomi

…Almost all of the Chinese manufacturers are boosting.

The ASUS ROG Phone 6 automatically turns on X mode (performance mode) and notifies you when a benchmark app is launched.

X mode makes thermal throttling behave differently than it does for normal apps, but the user is notified in advance and the mode can be turned off manually.

As for realme, the realme GT Neo 3 had a boost that pinned the CPU clock to its upper limit, but it stopped boosting as soon as I reported it to UL. I have heard that boosting continues on other realme devices, so I guess they are not sorry.

 

By detecting that “the package name of the running app is that of a benchmark app,” the OS on devices from the manufacturers above can pin the CPU clock to its upper limit or adjust thermal control so that throttling does not occur.

Conversely, “if the package name is different from the benchmark app, it will be treated the same as a normal app”.
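The detection logic described above can be pictured with a minimal sketch. This is not any manufacturer's actual code: the package names, the `PowerPolicy` type, and the `policy_for` function are all hypothetical and exist only to show why a renamed package falls back to normal treatment.

```python
from dataclasses import dataclass

# Hypothetical package names, used only for illustration.
BENCHMARK_PACKAGES = {"com.example.benchmark3d", "com.example.cpubench"}


@dataclass(frozen=True)
class PowerPolicy:
    """Hypothetical description of how the OS treats a running app."""
    pin_cpu_to_max: bool
    relax_thermal_limits: bool


NORMAL = PowerPolicy(pin_cpu_to_max=False, relax_thermal_limits=False)
BOOSTED = PowerPolicy(pin_cpu_to_max=True, relax_thermal_limits=True)


def policy_for(package_name: str) -> PowerPolicy:
    """Return the boosted policy only for an exact package-name match."""
    return BOOSTED if package_name in BENCHMARK_PACKAGES else NORMAL


if __name__ == "__main__":
    # Same app, different package name: the allowlist no longer matches.
    print(policy_for("com.example.benchmark3d"))
    print(policy_for("com.example.benchmark3d.renamed"))
```

Because the match depends only on the package name, an otherwise identical APK under a different name is handled like any other app in this sketch.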

In Android, the package name can be changed easily by decompiling the APK, so an APK that differs only in its package name will give you true benchmark results that the manufacturer has not tampered with.

I have uploaded modded APKs of

  • Geekbench
  • 3DMark
  • PCMark

here.

The Geekbench APK is the version disguised as Genshin that was published by the developer.

 

  1. Install the version distributed in the Play Store
  2. Install the modded version of the above
  3. First, measure the score with the modded version
  4. Let the battery cool down to the same temperature as before step 3, then measure the score with the Play Store version

If the score of the Play Store version is clearly higher, there is a high possibility that the device is applying a benchmark boost.
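The comparison in the steps above can be sketched as a small helper. The 5% tolerance is an assumption chosen for illustration (run-to-run variation differs by device and benchmark), and the function names are hypothetical.

```python
def boost_ratio(play_store_score: float, modded_score: float) -> float:
    """Ratio of the Play Store score to the renamed-package score."""
    if modded_score <= 0:
        raise ValueError("modded_score must be positive")
    return play_store_score / modded_score


def looks_boosted(play_store_score: float, modded_score: float,
                  tolerance: float = 0.05) -> bool:
    """Flag a likely boost when the Play Store score exceeds the modded
    score by more than `tolerance` (an assumed noise margin)."""
    return boost_ratio(play_store_score, modded_score) > 1.0 + tolerance


if __name__ == "__main__":
    # Made-up scores, not measurements from any device.
    print(looks_boosted(1100, 1000))
    print(looks_boosted(1020, 1000))
```

Running each version several times and comparing medians would make the check less sensitive to a single noisy run.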

 

Check the CPU clock with an app that can display it in real time as an overlay, such as Cpu Float. If the clock stays pinned at the maximum, or reaches it easily, when you open the Play Store version, the device is probably boosting rather than throttling.

If the overlay disappears, turn on “Allow screen overlays on Settings” in the developer options. The overlay is hidden when the boost function is activated, so if you need this setting, it is almost certain that the device is boosting.
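As an alternative to watching an overlay, the same check can be sketched from logged clock samples. The sketch below assumes the standard Linux cpufreq sysfs files (`scaling_cur_freq` and `cpuinfo_max_freq`, both in kHz) are readable, for example from an adb shell; access may be restricted on some devices, `cpu0` is only one core of one cluster, and the 2% slack is an arbitrary assumption.

```python
from pathlib import Path
from typing import Iterable

# Assumption: cpu0 is used as an example; big cores usually have other indices.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_khz(name: str, base: Path = CPUFREQ_DIR) -> int:
    """Read one cpufreq value in kHz, e.g. 'scaling_cur_freq'."""
    return int((base / name).read_text().strip())


def fraction_at_max(samples_khz: Iterable[int], max_khz: int,
                    slack: float = 0.02) -> float:
    """Share of samples within `slack` of the maximum clock."""
    samples = list(samples_khz)
    if not samples or max_khz <= 0:
        raise ValueError("need at least one sample and a positive max_khz")
    threshold = max_khz * (1.0 - slack)
    return sum(1 for s in samples if s >= threshold) / len(samples)


if __name__ == "__main__":
    # Made-up samples; on a device they would come from read_khz() in a loop.
    samples = [3_000_000, 3_000_000, 1_500_000, 2_990_000]
    print(fraction_at_max(samples, max_khz=3_000_000))
```

A value close to 1.0 with the Play Store version, but clearly lower with the renamed package under the same load, would point to the clock being pinned.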

Also, if the 3DMark Wild Life Extreme Stress Test completes with the modded package but aborts partway through with the Play Store version, or if the Play Store version generates considerably more heat, the device is running a type of benchmark boost that loosens thermal control.

Differences in results with and without benchmark boost

Now let’s actually see what difference the Play Store distributed version and the package renamed version make on a device with benchmark boosting.

Let’s start with the vivo X90 Pro+ with Snapdragon 8 Gen 2.

With the modded package name (left side of the image), the 3DMark Wild Life Extreme Stress Test scored from 3741 down to 2436, with the temperature rising from 23°C to 37°C (a 14°C increase) and 11% battery consumption.

The Play Store version (right side of the image) showed a stability of 95.4%, a maximum temperature of 49°C, and 16% battery consumption, a far cry from the modded version (which is treated the same as a normal application).
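To put the two runs on the same scale, the stability of the modded run can be worked out from its scores. This sketch assumes that 3741 and 2436 are the best and lowest loop scores, and that stability is the lowest loop score divided by the best loop score, which is how the stress test result is usually described.

```python
def stability_percent(best_loop_score: float, lowest_loop_score: float) -> float:
    """Stability as lowest loop score / best loop score, in percent."""
    if best_loop_score <= 0:
        raise ValueError("best_loop_score must be positive")
    return lowest_loop_score / best_loop_score * 100.0


if __name__ == "__main__":
    # Scores quoted above for the modded (renamed-package) run.
    modded = stability_percent(3741, 2436)
    print(f"modded stability: {modded:.1f}%")
    print(f"gap to the Play Store run: {95.4 - modded:.1f} points")
```

Under these assumptions the modded run comes out at roughly 65% stability, against the 95.4% reported by the Play Store version.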

 

Looking at the results of the Play Store version alone, one might reach the erroneous conclusion that “vivo delivers high 3D performance in a sustained manner, but generates more heat and consumes more battery power”.

In fact, after 30 minutes of playing Genshin at the highest quality, the results show an average of 60.0 FPS, a power consumption of 77.61 mW per FPS, and a maximum battery temperature of about 33°C. These results resemble those of the modded version of 3DMark: the device “maintains sufficient performance for game play, while keeping heat generation low”.
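The per-FPS figure can be converted into an average power draw with simple arithmetic. This sketch assumes that the 77.61 mW per FPS value is the average power divided by the average frame rate; the function name is hypothetical.

```python
def average_power_mw(mw_per_fps: float, average_fps: float) -> float:
    """Average power in mW, assuming mw_per_fps = average power / average FPS."""
    return mw_per_fps * average_fps


if __name__ == "__main__":
    power_mw = average_power_mw(77.61, 60.0)
    print(f"{power_mw / 1000:.2f} W")
```

With the figures quoted above this works out to about 4.66 W on average during gameplay.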

It is clear that it is pointless to analyze the 3D performance and heat generation characteristics of a device with the Play Store distributed version.


Next is the Xiaomi MIX Fold 2.

In the case of the Xiaomi MIX Fold 2, thermal control was loosened so much that the device heated up to nearly 50°C, which was judged to be overheating, and the benchmark ended partway through the test.

This is a really stupid boosting method: it is supposed to increase the benchmark score, but ends up recording nothing.

The modded version completed the benchmark without any problems, with a maximum temperature of 42°C.

Package-renamed version

Why is it bad to adjust only during benchmarking?

UL Solutions, the provider of 3DMark and PCMark, clearly outlines the prohibitions during benchmarking:

  • The platform may not change the quality level of the work.
  • The platform may not use an alternative technique to that requested by the workload.
  • The platform may not replace or remove any portion of the requested work even if the change would result in the same output.
  • Optimizations based on empirical data of benchmark workloads are not allowed.
  • Optimizations that change the output of the work are not allowed.

While some manufacturers make excuses such as, “It’s a gaming smartphone, so it has to deliver maximum performance results,” or “Other manufacturers are doing it, too,” this is not even an argument since it is a violation of the terms and conditions.

UL Solutions’ Benchmark Rules
People rely on our benchmarks for accurate, impartial results. We safeguard that trust with clear rules for hardware manufacturers and software developers.…

Apart from the so-called “benchmark boost” that makes the numbers look good, there is also “throttling” behavior: the OS identifies game applications and the like by package name and limits their CPU/GPU operation, even though benchmark results look normal.

I do not consider “throttling” to be a major problem if there is a workaround for it, since it is normal for heat generation to increase as performance improves, but “benchmark boosting” is a malicious behavior that renders benchmarks meaningless, making fair comparisons with other devices impossible.

Benchmarks are not there to measure the theoretical maximum performance of a smartphone, but to compare its performance under the same conditions as when using other applications.

While some people say “just compare on the assumption that it is boosted,” what is the point of comparing when the criteria are absurd…

 

It would be nice if manufacturers stopped this ridiculous benchmark boosting, but as long as there are people who rejoice or despair over AnTuTu Benchmark numbers, the policy of boosting will never change.

Even if the general public cannot help being fooled, I hope that reviewers on social networks, blogs, and commercial media will check whether the benchmark results they record are rigged and whether they are really useful data, and then base their analysis and criticism on accurate data.

As the saying goes, “Numbers don’t lie, but liars use numbers”.

