Monday, October 10, 2011

The Spiny Norman Test


Update: Results are in, see Go Spiny Norman, Go.

Previously, in The Amazing Story of AppEngine and the Two Orders Of Magnitude, I wrote about minimizing the cost of instances in the new AppEngine billing regime. But I think I made a mistake, and I think many people are making the same mistake.

Here's one of the graphs that I showed of instance usage from my AppEngine app Syyncc:


My posts were largely about trying to drop the blue line down (that's "Total" instances), and I largely ignored the yellow line, "Active" instances.

Now to get that blue line down, I did two things. I first set Max Idle Instances to 1, from Automatic. That is detailed here, and was successful in dropping the blue line down. Next, I changed my app's task behaviour, from kicking off 50 tasks every 2 mins, to smoothing those out, scheduling one every two seconds.

Once I got my billing results, it was clear that these changes had made a huge impact. But the numbers were puzzling. Firstly, they were too low (which I just accepted happily, as these numbers represent money in my pocket). Secondly, it appeared that all the benefit came from the first change (Max Idle Instances), with no change from the smoothing out of tasks. That's been bugging me.

And then on the AppEngine list, Gerald Tan made this comment:


The reason why your Frontend Instance hours are lower than you expected is because you assumed that you will be billed for the area under the BLUE line in the Instance graph. It's not. You are being billed for the area under the YELLOW line (Active Instance) PLUS your Max Idle Instance setting. So your Active Instances is hovering at around ~0.72, and I assume you have set your application's Max Idle Instance to 1. Therefore ~1.72 * 24 = ~41.28 Instance Hours

Oh really?? That would match the data, very cool. And why were the numbers so high before I set the Max Idle Instances to 1?

This Post-Preview Pricing FAQ (should have been called a Primer for the alliteration) says some unclear things. We have this:

"Instances are charged for their uptime in addition to a 15-minute startup fee, the startup fee covers what it takes for App Engine to bring up and down the instance. So, if you have an on-demand instance only serving traffic for 5 minutes, you will pay for 5+15 minutes, or $0.08 / 60 * 20 = 2.6 cents. Additionally, if the instance stops and then starts again within a 15 minute window, the startup fee will only be charged once and the instance will be considered "up" for the time that passed. For example, if an on-demand instance is serving traffic for 5 min, is then down for 4 minutes and then serving traffic for 3 more minutes, you will pay for (5+4+3)+15 minutes, or $0.08 / 60 * 27 = 3.6 cents."
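The arithmetic in that FAQ passage can be reproduced with a minimal sketch. The function name on_demand_cost_cents and the constant names are my own labels for illustration, not anything from AppEngine; the rate and the 15-minute fee are the figures quoted above.

```python
HOURLY_RATE_DOLLARS = 0.08   # on-demand instance rate quoted in the FAQ
STARTUP_FEE_MINUTES = 15     # flat startup fee quoted in the FAQ

def on_demand_cost_cents(up_minutes):
    """Cost in cents for one on-demand instance considered "up" for up_minutes.

    Hypothetical helper: assumes a single startup fee, i.e. any gaps
    between bursts of traffic are shorter than 15 minutes.
    """
    billed_minutes = up_minutes + STARTUP_FEE_MINUTES
    return HOURLY_RATE_DOLLARS / 60 * billed_minutes * 100

# FAQ example 1: 5 minutes of traffic.
first = on_demand_cost_cents(5)
# FAQ example 2: 5 minutes up, 4 minutes down, 3 minutes up, all counted as "up".
second = on_demand_cost_cents(5 + 4 + 3)

print(first)   # about 2.67
print(second)  # about 3.6
```

The first example works out to roughly 2.67 cents, which the FAQ writes as 2.6; the second matches the FAQ's 3.6 cents.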


On the other hand, this:

Max Idle Instances: Decreasing this value will likely decrease your bill as fewer idle instances will typically be running and we will not charge for any excessive idle instances. In this case the scheduler knob is a suggestion to the scheduler but we will not charge you for excess if the scheduler ignores the suggestion. For instance, if you set Max Idle Instances to 5 and the scheduler leaves 16 instances up for some length of time, you will only be charged for 5 instances.


So, I think this might mean the following:

If you set "Max Idle Instances" to Automatic (the default setting), that means you are letting the scheduler spend your money. It'll keep as many instances running at any time as it thinks you need, and you'll pay for all of them (plus that nasty 15 minute bonus on starting up extras). This means you pay for the area under the blue line.

If you set "Max Idle Instances" to a specific value, you'll pay for your active instance time plus your "Max Idle Instances" setting, or your Total instance time, whichever is less. That is, you pay for the minimum of (area under yellow line + Max Idle Instances) and (area under blue line).

So setting Max Idle Instances to an actual number is a good idea. The lower you set it, the more it might affect the scheduler's decisions, but still, to minimise cost, set it to a finite number.

Great conjectures. But then, the old lady in my head (oh god she's really in there) says this:


TEST IT!

Ok old lady, I'll test it. razza frazza rackkin testin frazza razza....

---

Ok, so first we need a hypothesis. Put a new line on the graph, a green line, which is the yellow line raised up by the setting of Max Idle Instances. If Max Idle Instances is 3, it'll look like this:


The pink area is the intersection of the area under the blue line and the area under the green line.

Hypothesis: Ignoring the 15 minute cost for spinning up new instances, the price we pay should be the pink area on the graph. That is, the moment by moment minimum of (total instances) and (active instances + Max Idle Instances). If Max Idle Instances is Automatic, then there is no green line, and we pay for the area under the blue line.
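To make the hypothesis concrete, here is a minimal sketch of the calculation it implies. It is only my reading of the hypothesis, not AppEngine's actual billing code. The function name billed_instance_hours and the sample numbers are hypothetical; the "total" and "active" lists stand in for the blue and yellow lines sampled at regular intervals.

```python
def billed_instance_hours(total, active, max_idle, hours_per_sample):
    """Instance hours under the hypothesis (startup fees ignored).

    total / active: samples of the blue and yellow lines at equal intervals.
    max_idle: the Max Idle Instances setting, or None for Automatic.
    """
    billed = 0.0
    for t, a in zip(total, active):
        if max_idle is None:
            level = t                     # Automatic: pay for the blue line
        else:
            level = min(t, a + max_idle)  # pink area: min(blue, green)
        billed += level * hours_per_sample
    return billed

# Hypothetical hourly samples, made up purely for illustration.
total = [4, 4, 3, 3]
active = [0.5, 1.0, 0.5, 3.0]

print(billed_instance_hours(total, active, None, 1.0))  # 14.0
print(billed_instance_hours(total, active, 3, 1.0))     # 13.5
print(billed_instance_hours(total, active, 1, 1.0))     # 8.0
```

With these made-up samples, lowering Max Idle Instances from 3 to 1 shrinks the billed area because the green line drops below the blue line more often.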

So how do we test that hypothesis?

1 - First test that we pay for the area under the blue line when Max Idle Instances is Automatic.
2 - Next, test that we pay for the pink area when Max Idle Instances is set to something.

To get a good test here, we want to create an instance usage profile where the blue line and the yellow line are disparate. My best guess for how to do this is to create some spiky usage, that should leave too many instances running most of the time.

Enter Spiny Norman!

Spiny Norman is a Worker class, designed to do one thing: cause AppEngine to experience very bursty load.

import logging

from Worker import Worker
from datetime import timedelta
from google.appengine.ext import db

class SpinyNorman(Worker):
    _minutesBetweenSpines = 12   # gap between successive spines, in minutes
    _spineWidth = 1000000        # busy-loop iterations per task
    _numberOfSpinesRemaining = db.IntegerProperty()  # runs left for this worker

    @classmethod
    def CreateSpines(cls, aSpineLength, aNumberOfSpines):
        # Create aSpineLength workers; each one will run aNumberOfSpines times.
        for _ in range(aSpineLength):
            lnorman = cls()
            lnorman._numberOfSpinesRemaining = aNumberOfSpines
            lnorman.enabled = True
            lnorman.put()

    def doExecute(self):
        self._numberOfSpinesRemaining -= 1
        # Busy loop: burn CPU so that each task does a noticeable amount of work.
        lcount = 0
        while lcount < self._spineWidth:
            lcount += 1

        logging.debug(lcount)

    def doCalculateNextRun(self, aUtcNow, alastDue):
        # Schedule the next spine relative to the last due time if there is one,
        # otherwise relative to now.
        if self._numberOfSpinesRemaining > 0:
            if alastDue:
                return alastDue + timedelta(minutes=self._minutesBetweenSpines)
            else:
                return aUtcNow + timedelta(minutes=self._minutesBetweenSpines)
        else:
            return None  # time to stop
        
Spiny Norman creates a spiny workload, as follows:

Each "Spine" is a set of tasks running (doExecute()) at the same time. The length of the spine is the number of tasks. The width of the spine (in time) is a measure of how much work the spine will do (how long it'll work for). Spines are set apart from each other in time, which is the minutes between spines. There are a fixed number of spines.

You kick off Spiny Norman by calling SpinyNorman.CreateSpines(spineLength, numberOfSpines). That creates a number of instances of Spiny Norman equal to the spineLength, and sets the countdown for how many iterations they should continue for (numberOfSpines). _spineWidth is the number of iterations of the busy loop in doExecute. _minutesBetweenSpines is used to calculate the next run time in doCalculateNextRun.

I'm using a spine length of 250 (that is, 250 tasks), a spine width of 1,000,000 (enough load to notice some work being done, a few seconds' worth), 12 minutes between spines and 100 spines total (i.e. Spiny Norman runs for about 1200 minutes, or 20 hours, total).
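As a sanity check on those parameters, here is a small sketch of the arithmetic. The constant names are my own, chosen to mirror the settings described above; the total task count is simply derived from them.

```python
SPINE_LENGTH = 250            # tasks per spine
MINUTES_BETWEEN_SPINES = 12   # gap between spines
NUMBER_OF_SPINES = 100        # how many spines in the whole run

total_minutes = MINUTES_BETWEEN_SPINES * NUMBER_OF_SPINES  # length of the run
total_hours = total_minutes / 60.0
total_task_runs = SPINE_LENGTH * NUMBER_OF_SPINES          # doExecute() calls overall

print(total_minutes)    # 1200
print(total_hours)      # 20.0
print(total_task_runs)  # 25000
```

So the run fits inside a single 24-hour billing day, with 25,000 task executions arriving in bursts of 250.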

I've set up a new AppEngine app, enabled billing, and kicked off Spiny Norman to run during his own billing day. I've left Max Idle Instances set to Automatic. We should see a huge difference between the blue and yellow instance lines, and the billing should tell us which one I'm paying for, which will test part 1 of the hypothesis.

In the next post I'll publish the result of this test, and I'll kick off the next test. Stay tuned!
