Saturday, October 15, 2011

Multi-threaded Python 2.7 WTFAQ?

No, I said multi-threaded... ah close enough.


I'm just beginning my first experiments with python 2.7 apps, using "threadsafe: true". But I'm a clueless n00b as far as python goes. Well, not a n00b, but still a beginner. And then this multi-threading thing turns up, and I find myself groaning "oh man, really, does it have to get this complex?" I think I hear a lot of similar groans out there ;-)


I'm betting that the whole "multithreaded" thing in python appengine apps is scaring plenty of people. I've done a lot of concurrent programming, but the prospect of dealing with threading in python has daunted me a bit because I'm a beginner with python and appengine as it is - this just makes life harder. But hey, it's being added for a reason; I'd best quit complaining and start figuring it out!

Thinking about threads and python, I realised that I didn't know how I was actually supposed to use multi-threading to make my apps leaner and meaner. I mean, why would I use threads? They're for doing inherently concurrent things. Serving up pages isn't inherently concurrent stuff, at the app development level. What exactly is expected here? Shouldn't the framework be doing that kind of thing for me?

And of course that was the aha moment. The framework *is* doing the work for me.

The situation with python appengine development up until now has been that instances process serially. They take a request, see it through to its end. They take another request. And so on. That's cool, but instances spend a lot of time sitting around waiting when they could be doing more work.

But with the new python 2.7 support, you can tell appengine that it would be ok to give instances more work when they are blocked waiting for something. For example, if an instance is doing a big url fetch, or a long query from datastore, then it's cool to give it another request to begin working on, and come back to the waiting request later when it's ready. You do that by setting "threadsafe: true" in your app.yaml.
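To make that concrete, here's a rough sketch in plain python (no appengine involved) of why overlapping the waiting helps. The function names are made up for illustration, and time.sleep stands in for a slow url fetch or datastore query. Real instances are scheduled by appengine, not by code like this, so treat it as a toy model only.

```python
import threading
import time


def handle_request(wait_seconds):
    """Stand-in for a handler that blocks on a url fetch or datastore query."""
    time.sleep(wait_seconds)  # simulated blocking wait; no CPU work happens here


def serve_serially(n_requests, wait_seconds):
    """One request at a time: each wait has to finish before the next starts."""
    start = time.time()
    for _ in range(n_requests):
        handle_request(wait_seconds)
    return time.time() - start


def serve_concurrently(n_requests, wait_seconds):
    """One thread per request, so the waits overlap instead of adding up."""
    start = time.time()
    threads = [threading.Thread(target=handle_request, args=(wait_seconds,))
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    print("serial:     %.2fs" % serve_serially(4, 0.2))
    print("concurrent: %.2fs" % serve_concurrently(4, 0.2))
```

The serial version takes roughly the sum of the waits, while the concurrent one takes roughly as long as the slowest single wait. That gap is the time an instance would otherwise spend sitting around.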

Being threadsafe sounds scary! But actually it shouldn't be a huge deal. Pretty much it's about what you shouldn't do, and largely you're probably not doing it anyway.
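Here's a minimal sketch of the main thing you shouldn't do, and one way around it. It assumes a hypothetical module-level hit counter (the names are mine, not part of any appengine API). Module-level state is shared by every request running in the instance, so an unguarded read-modify-write can lose updates; wrapping it in a threading.Lock is one common fix.

```python
import threading

# Hypothetical module-level state: shared by every request thread in the process.
_hit_counts = {}
_hit_counts_lock = threading.Lock()


def record_hit_unsafe(path):
    # Read-modify-write with no lock. Two threads can both read the same old
    # value, and then one of the two increments is lost.
    _hit_counts[path] = _hit_counts.get(path, 0) + 1


def record_hit(path):
    # Same update, but only one thread at a time can be inside this block.
    with _hit_counts_lock:
        _hit_counts[path] = _hit_counts.get(path, 0) + 1


def hits(path):
    with _hit_counts_lock:
        return _hit_counts.get(path, 0)


def simulate_requests(path, n_threads, hits_per_thread):
    """Pretend n_threads concurrent requests each record some hits on path."""
    def worker():
        for _ in range(hits_per_thread):
            record_hit(path)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hits(path)


if __name__ == "__main__":
    print(simulate_requests("/home", 8, 1000))
```

If your handlers only touch local variables and appengine services, none of this applies to you, which is the "you're probably not doing it anyway" case.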



The WTFAQ


  • I drove off a cliff and was trapped in my car for the last couple of weeks, surviving on old sauce packets and some pickles that were on the floor. So I'm a bit out of the loop. WTFAQ are you talking about?
  • Threads are when my socks are wearing out and there are dangly bits. Multithreading is when they are really worn out. Right? 
    • Multi-threading means having multiple points of execution on the one codebase in the one address space. You can do some really cool stuff with threads. Or you can safely ignore them.
  • But there is some minimal stuff I should be paying attention to, right?
    • Yup. What you need to know is how to support Concurrent Requests.
  • Concurrent what now?
    • Concurrent Requests means that your instances can serve multiple requests at a time, instead of just one at a time. You pay for instances, and if each one gets through more requests then you need fewer of them. So this should be a bit cheaper.
  • I like money.
    • yes, ok.
  • Ok, so what do I do to get these durn newfangled concurrent whatsits?
    • It's easy. Just follow these steps:
      • You'll be using Python 2.7
      • To use Python 2.7 you have to use the High Replication Datastore. If your app has been around for a while (from before there was a choice of datastore type) then you might be using the Master/Slave datastore. If so, you need to migrate. If you think that's you, then read this:
      • Read this, but don't let it freak you out:
      • Also glance over the new Getting Started sample, it's a bit different.
      • If you got this far and haven't read any of the links above, congratulations. RTFM is for girly men (of all genders). 
      • Figure out if your app is going to be ok:
        • Calls to memcache, datastore, other services of AppEngine, are fine.
        • urlfetch and other httpish stuff (urllib, urllib2?) is fine.
        • Normal code touching local variables is fine.
        • Don't mess with instance memory (unless you know what you're doing). Mostly you can only use it for caching anyway; if you're not already doing that, don't worry about it. Basically, this means staying away from global variables. Multiple requests can come in and fiddle with those globals at the same time, Which Can Be Bad.
        • Libraries included by AppEngine are fine, or else you'll get "don't use this" warnings. So don't worry too much here. But do check this link for changes to libraries with Python 2.7, some of that might be relevant to you.
        • You didn't read that, did you? You are Rock & Roll incarnate.
        • Some of your third party libraries might be messing with global memory, and not be threadsafe. You know that shady date library you scored in a back alley on http://code.google.com/hosting? That might be a problem. Read the code, ask around, or just give it a shot and flag the fact that it might blow up in your face.
      • Rewrite your main.py or equivalent to use WSGI script handlers. That means it should look like this http://code.google.com/appengine/docs/python/gettingstartedpython27/helloworld.html and not like this http://code.google.com/appengine/docs/python/gettingstarted/usingwebapp.html
      • Set up your app.yaml properly; change "runtime: python" to "runtime: python27" and add "threadsafe: true". Like this:
        application: helloworld
        version: 1
        runtime: python27
        api_version: 1
        threadsafe: true
        
        handlers:
        - url: /.*
          script: helloworld.app
      • Make sure to get the latest appengine sdk; v1.5.5 or later. You can't actually run with "threadsafe: true" in the dev appserver yet, but you need at least this version or it'll refuse to upload.
  • So I can't run this stuff on the dev appserver?
    • Nope. Just set "threadsafe: false" when running locally. That's a bit annoying, but I'm sure it'll be sorted out soon.
  • Damn, that list of stuff is tl;dr. Do I have to do this?
    • Nope. In fact, it's early days and you'll be heading into experimental land if you do it. If it's totally weirding you out and you have better things to do with your life, just ignore this whole thing for a bit. Eventually, way later on, it'll become properly supported, and then probably compulsory, but by then there'll be better guides, better understanding in the community, all that. It's totally fair to let the crazy nerds race out and crash on the new features, then skate in past the fallen bodies like Steven Bradbury.
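For a feel of what "script: helloworld.app" in the app.yaml above points at, here's a minimal sketch of a helloworld.py. The official Getting Started sample builds its app object with webapp2; this version uses a bare WSGI callable instead, purely so it runs without the SDK installed. The response text is made up. The point is that "app" is a module-level WSGI application object, and that the handler only touches values local to the call, so concurrent requests can't trip over each other.

```python
# helloworld.py: the module name and the "app" variable are what
# "script: helloworld.app" refers to.


def app(environ, start_response):
    """Plain WSGI callable; everything it uses is local to this one call."""
    body = b"Hello, concurrent world!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The old style, a script that runs top to bottom once per request, is the thing the WSGI step in the list asks you to move away from.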

2 comments:

  1. Thanks for a clear description of the new multi-threading python option. I'm sure it will be a huge help for many developers.

    Please keep posting appengine articles!
  2. Very clear article.
    Some suggestion for the next article?
    Datastore reads hidden costs :). How do you keep that billing voice low?