Thursday, March 31, 2011

34 – Performance woes?

There, fixed it for you! That's right: 120 FPS at 800x600 with Irrlicht software rendering at the highest zoom level, with Z-levels 0 to 30 included in the rendering pipeline. This is the best result I am willing or able to achieve. I am tired of tweaking the rendering engine, and this is the version that will be included in "Stoneage".

But even this performance is not enough. More precisely, the performance is more than enough, but I cannot reproduce it consistently. I have two main machines I run tests on, plus three weaker machines for extra tests. My first machine is clunky but has very powerful hardware behind it. It is a development machine and it does not have a good graphics card. The above screenshot was taken on this machine (and yes, I did wait a few seconds until the framerate counter showed exactly 120, since it fluctuates a little; I can't get it up to 128 though). Using software rendering, the framerate is very good at all zoom levels. Scrolling is slow at higher zoom levels, but it is good enough and I can't improve it any further right now. As I said, the GPU on this machine is trash. The game will be slow but playable with DirectX, and completely unplayable with OpenGL. That is why it is good to have multiple backends.

The problem is that I can't get the same results on all machines. Performance does not scale linearly with the power of the hardware. On my second machine, a development/gaming machine with an 8800 GT GPU, I get a slightly lower framerate with software rendering, but a much higher one with hardware acceleration. Unless I increase the zoom level, that is. For some strange reason, GPU performance drops a lot once the number of tiles crosses an as yet undetermined threshold. I guess there are too many draw operations. And since the CPU is weaker, scrolling is even slower at the highest zoom level.

So the simple solution of software rendering if you do not have a good GPU, and hardware rendering otherwise, becomes a lot more complicated. Even worse, there are computers out there that, judging by the numbers, should run well in software mode but in practice don't. This is why developing for the PC is so hard. There is a huge number of hardware combinations that yield unexpected results, and that is without counting legacy systems.

As much as I would like to create something that works perfectly on everything, I am afraid it is impossible. There will always be people who have a less than smooth experience.

These people will have to do one of the following:
  • play at low resolutions, probably in windowed mode
  • try every single backend combination. I'll add an autodetect option, but it will not be foolproof, so some manual trial and error will be needed
  • use the new option for selecting the Z level ranges
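
The post does not say how the autodetect option will work. Since the numbers on paper do not predict real performance, one plausible approach is to measure: render a handful of frames with each backend and keep the fastest one that works. The sketch below only illustrates that idea and is not the game's actual code; `pick_backend`, `render_frame` and the backend names are hypothetical.

```python
import time


def pick_backend(backends, render_frame, frames=30, clock=time.perf_counter):
    """Time `frames` frames per backend and return (fastest_name, fps_by_name).

    `render_frame(name)` is a hypothetical callback that draws one frame with
    the named backend and raises an exception if that backend is unusable.
    """
    fps_by_name = {}
    for name in backends:
        try:
            start = clock()
            for _ in range(frames):
                render_frame(name)
            elapsed = clock() - start
        except Exception:
            # The backend failed to start or to render: leave it out.
            continue
        fps_by_name[name] = frames / elapsed if elapsed > 0 else float("inf")

    if not fps_by_name:
        raise RuntimeError("no usable rendering backend found")

    fastest = max(fps_by_name, key=fps_by_name.get)
    return fastest, fps_by_name
```

A short benchmark like this cannot see the drop that only appears at high zoom levels unless the test frames are rendered at that zoom, which is one reason manual trial and error would still be needed.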
I think it is obvious what the new feature from this post is: a huge increase in framerate at all zoom levels, although it only really matters at high zoom levels. But one feature and a wall of text does not a post make.

My second feature is the ability to select the number of Z levels that are rendered. Before, I had two options: render from level 0 to the current level, or render from level 0 to the maximum level. So on a normal map, you would render 30 or 40 levels.

With the new system you have a number that governs how many levels you get above and below your current level. Let's call it N. So you are rendering either from current - N to current, or from current - N to current + N. The default will be N = 5. See how I am cheating? This way you will render 6 or 11 levels. Going by the defaults, you will render 6 levels instead of 30.

There is a good chance that this will have zero impact on the number of tiles you need to render, since I have a clever rendering algorithm that only renders as much as needed and I cache everything. But 5 times fewer Z levels will still require 5 times less CPU power and make scrolling smoother. This is a fair compromise. People with weaker computers, or with strong computers that my engine does not like, will get a better experience. People with strong computers can adjust N with the PageUp/PageDown keys. In the future I'll add a GUI component showing your current level and how many levels you are rendering.
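
The range selection described above can be sketched as a small function. This is only an illustration of the arithmetic, not the engine's code: the names `visible_z_range` and `level_count`, the `include_above` flag, and the clamping to the map's bottom and top levels are assumptions that the post does not spell out.

```python
def visible_z_range(current, n=5, max_level=30, include_above=False):
    """Return the inclusive (lowest, highest) Z levels to render.

    current       -- the Z level being viewed
    n             -- how many levels to show below (and optionally above) it
    max_level     -- topmost level of the map (assumed value)
    include_above -- False: current - n .. current
                     True:  current - n .. current + n
    """
    lowest = max(0, current - n)  # assumed: never go below level 0
    highest = min(max_level, current + n) if include_above else current
    return lowest, highest


def level_count(z_range):
    """Number of levels in an inclusive (lowest, highest) range."""
    lowest, highest = z_range
    return highest - lowest + 1
```

With the default N = 5 and the view at level 15, `visible_z_range(15)` gives levels 10 to 15, which is 6 levels, and `visible_z_range(15, include_above=True)` gives levels 10 to 20, which is 11 levels. Near the bottom or top of the map the clamping makes the range shorter.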

I created a video showing things in action: a full-on stress test. I have 150 dwarves digging out over 11000 walls at the highest zoom level, on the computer that does not like hardware rendering at the highest zoom level. Keeping in mind that FRAPS reduces the framerate, things are pretty smooth, but there are a few moments where the game struggles:


  1. What are the specs of the machine you are testing on in the video?

    Also, how much RAM and CPU is it using? (with FRAPS off, of course)

  2. Coming along nicely. Keep up the good work! :)

  3. RAM usage is around 140 MiB and is fairly constant.

    CPU usage is always at 50% or 100%, depending on how many cores you have, but this value is deceptive. All Irrlicht applications I have seen eat up 100% of the CPU while rendering, even when rendering empty frames. But both the game and the operating system remain very responsive. I have not once experienced a slowdown in the system, even when the game was trying to do more than it could handle.

    I have tested on a machine with an Athlon 2000+ or something very similar, if I am not mistaken. In any case, a machine that is ten years old or more. I have tested on some random laptop with 1 GiB RAM and Windows 7 on it, but the laptop seemed to struggle with the basic needs of the operating system. I tested on a netbook, small and thin (and probably cheap, as netbooks tend to be). And on my two development machines.

    I need to repeat the tests with the new engine, but with the old one I had 30 FPS on the netbook. I want to repeat the stress test there. On the other machines I had much better results. Right now I think that it is hard to find a machine incapable of at least 50 FPS. Getting over 100 with software rendering should be common.

    Problems start to arise once I increase the zoom level. This test was done on an Intel Core 2 Duo E5xxx (I don't remember exactly, but one of the cheaper models) with an 8800 GT. I get 170-200 FPS at 720p resolution without zoom. But I had to decrease the resolution for the video because FRAPS + zoom did not result in a smooth video. At maximum zoom I get 30-40 FPS on this machine with hardware rendering, while on the other I get 120 with software rendering.

  4. It's so weird that it's faster to bypass the 8800GT... ah well, it's complicated stuff like that that keeps me from attacking a project like this!

    How is the pathing? If you ran a stress test having a squad of miners dig out a spiral shaped area, would that bog it down?