Rudimentary Code Profiling how-to (for performance optimization).

hansbe

  • Posts: 262
The content of this post shows how to do some rudimentary profiling, that is, how to figure out how much time a piece of code takes. The basic idea presented here is to use a stopwatch method:
  • Measure the current system time (tic)
  • Execute code under test
  • Measure the current system time (toc)
  • Present the measurement result (toc-tic)

I only use stencyl for flash, but if anyone posts a method for iOS, I can update this post.

Flash AS3 method
The basic idea is to use the flash.utils.getTimer() method which reports time since a flash application started in milliseconds. Unfortunately I know of no more accurate timer in flash, therefore it will be necessary to wrap most bits of code in a loop to test it.

    Step 1 : Put this in a code block at the beginning of the behavior/script/class:
    [/list]
    Code: [Select]
    import flash.utils.getTimer;
    var tic,toc : int;
    Step 2 : Do a tic measurement and wrap the code you want to test in a loop.
    Code: [Select]
    tic = flash.utils.getTimer();
    for(var index0:int = 0; index0 < 10000; index0++)
    {
    Step 3 : End the loop, do a toc measurement and print.
    Code: [Select]
    }
    toc = flash.utils.getTimer();
    print("[loop] time taken:"+(toc-tic));
    Interpreting results:
    • Results vary linearly from CPU to CPU based on operating frequency.
    • Results can vary a LOT from CPU to CPU based on architecture (pipelining, branch prediction).
    • If the code is 'memory intensive' the size of the cache is important, as well as the speed of memory retrievial itself.
    • GPU capabilities and similar can also affect results a lot, for rendering, etc.

    Flash AS3 Examples:

    My laptop CPU runs at 2.26 GHz
    An empty loop with 10 million iterations took about 43 ms on average. Indicating that each iteration took on average: (0.043/10000000)*2260000000 ~= 9.73 cycles.
    (This isn't entirely accurate of course. The most important source of error is that the code may be interrupted by other tasks by the operating system.)

    A loop containing 'foo=foo+1;' with 10 million iterations took about 1018 ms on average. Indicating that each iteration took on average: (1.018/10M)*2260M = 230 cycles.
    A loop containing 100 'foo=foo+1;' with 100k iterations took about 430 ms on average. Indicating that each iteration took on average: (0.430/0.1M)*2260M = 9718 cycles.
    However each instruction took on average: 9718/100 ~= 9.7 cycles.

    This performance is pretty nasty (bad) and I the issue probably with pipelining. E.g. for the foo=foo+1; instruction my cpu must wait until foo in the previous instruction is calculated before starting on the next instruction.

    A loop containing 10 repeated calls to 'foo2();' where foo2() is an empty function (custom event) with 33k iterations took about 10 ms. Indicating that each function call took on average: (0.010/0.033M)*2260M ~= 684 cycles.

    Some possible performance hogs

    • You don't want these inside inner loops (so they execute many many times)

    A loop containing 2 game text attribute setters;'. I.e.:
    Code: [Select]
    setGameAttribute("gaText", "cba");
    setGameAttribute("gaText", "abc");
    with 100k iterations took about 77 ms. Indicating that each setter call took: (0.077/0.1M)*2260M/2 ~= 870 cycles.

    A loop containing 2 behavior boolean attribute setters;'. I.e.:
    Code: [Select]
    _Collided = false;
    propertyChanged("_Collided", _Collided);
    _Collided = true;
    propertyChanged("_Collided", _Collided);
    with 100k iterations took about 128 ms. Indicating that each behavior setter call took: (0.128/0.1M)*2260M/2 ~= 1446 cycles.
    (It is obviously the call to propertyChanged that takes so much time, and it is likely not neccessary inside an inner loop. (Note to self:There were 12 other attributes in this behavior.))

    An obvious optimization is to remove propertyChanged(). I believe it is unneccessary unless you use attribute listeners. It might be a mistake of the code generation to generaty this code bit for all attribute setters.
    Now I also hypothesize without testing that the dynamics of setGameAttribute() and propertyChanged() can change according to how many attributes you have. So if you have many game attributes,  behavior attributes could be faster. Re. Tom in this post.


    I hope this can lead people to experiment, discuss, and from that understand a little bit about how to write faster code. If you share experiences I can try to summarize results in this first post, from time to time.

    « Last Edit: June 19, 2012, 03:52:08 pm by hansbe »

    darkknighth20

    • Posts: 97
    I'm surprised nobody responded to this!

    Nice post.

    People who wrote code in Stencyl 1.x and moved to 2.x should definitely think about rewriting some of their behaviors due to the new events options available (i.e. using the custom event for listening for keyboard/mouse clicks instead of using the "Always" area and manually checking "if button down" over and over again).
    - DarkKnightH20
    Need Help?
    >> Stencyl Guides
    Current Game In Progress >> A Hack N Slash

    hansbe

    • Posts: 262
    Thanks Mr. Knight.

    You could be right about event listeners.
    I tested a few with this code.
    Code: [Select]
    var foo:Number =0;
    var foo2:Number =0;
    while(foo<100000){
    if(isKeyDown("enter"))
     {foo2++;}
     foo++;
    }
    This timed to about 174 ms, so with 100k iterations at 2.26 GHz each iteration is (0.174/0.1M)*2260M ~= 3932 cycles.

    It will however be neccessary to call isKeyDown about 250 times on a 1 GHz computer to create '1 ms of lag'. So it's not likely to be a common suspect.

    Testing getMouseX() in the same loop as above executes at about 35 ms, so this is faster. (~ 791 cycles)
    Then actor.isMouseReleased() uses 81-114 ms (~ 1830 to 2574 cycles) depending on where the mouse is positioned (which means it must be testing the bounding box before it checks the released condition).

    By the way, I noticed that the empty procedure call I had above actually executes slower, so I think i had a mistake for that one. (I've corrected it now.) However the empty procedure call is > 600 cycles, so it can help to do things locally.

    ---
    I found this: http://upshots.org/actionscript/20-tips-to-optimize-your-actionscript
    It has a lot of good AS3 tips.



    « Last Edit: June 19, 2012, 04:58:55 pm by hansbe »

    Kajitii

    • Posts: 184
    A loop containing 'foo=foo+1;' with 10 million iterations took about 1018 ms on average. Indicating that each iteration took on average: (1.018/10M)*2260M = 230 cycles.
    A loop containing 100 'foo=foo+1;' with 100k iterations took about 430 ms on average. Indicating that each iteration took on average: (0.430/0.1M)*2260M = 9718 cycles.
    However each instruction took on average: 9718/100 ~= 9.7 cycles.

    This interests me, and I wouldn't have guessed at this any time soon :o

    hansbe

    • Posts: 262
    There's definetely a calculation error 9718/100 is closer to 97 not 9.7 but ill look at it later.
    My test results might be off a bit since it took so long background tasks might affect the result as well. But I have at least learned that AS3 is horribly slow when compared to properly compiled c++ code.

    Kajitii

    • Posts: 184
    That's why you run the test multiple times.

    AS3 was designed with a different purpose in mind than C family of code.  One of the design goals of Flash was so that it could function as a plugin within browsers (not entirely sure how stuff like Adobe AIR works though).  That way, if for some reason the complex script of Flash causes a crash, it would only take itself out instead of the whole browser.  C, and by extension C++, was designed for system software, which naturally means it will be really fast.  Unfortunately, it also has more potential for abuse and misuse =/

    Kajitii

    • Posts: 184
    It kinda feels like a waste to do some basic tests and not share it to the public.  I recognize that almost no one here will actually care, since these tests are based on millions of iterations and have highly ignorable execution time in practical usage.  But if it helps even one person, I'll be happy xD

    froz

    • Posts: 250
    I'm suprised, I thought that setting game attribute will take the longest time. WHy is it 2 times faster then setting local or scene attribute? Or is this something that appear only in this test conditions?

    hansbe

    • Posts: 262
    According to [ http://upshots.org/actionscript/20-tips-to-optimize-your-actionscript ], the local attribute is faster, but it is the calls to propertyChanged() and setGameAttribute() that take up time in the tests. I know I had 11 local variables and only 3 game attributes, so that might cause the difference.

    Like Kajiti said, most of the time these differences don't have a lot to say. Basic game loops can do with a few hundred operations, but if you are running some code that requires a loop to run through thousands of times, it may be of importance. An easy optimization in many cases would be to grab the relevant code that stencyl generates, put it in a code block, and put propertyChanged() outside the loop (or delete it if it isnt relevant). (If you are handling game attributes, then store them in a local variable until you are ready to update.)

    hansbe

    • Posts: 262
    That's why you run the test multiple times.

    Sorry, I ran the loops lots of times, but the time each loop runs for (the number of iterations) can cause a difference, as the CPU might break off and do other tasks before continuing.