For some time now I’ve been almost the only one writing Python tools at work, but that has started to change recently. So I’ve become more and more concerned about the lack of formalization in the existing code base. There aren’t really any unit tests, and there hasn’t been any way of querying which other tools depend on the one you’re changing, which of course means it’s easy to break things and not even know it.
So last week I decided to bite the bullet and learn what I could about these sorts of things. I started off with the dependency problem because testing seemed useless without an understanding of what to test.
A quick Google search pointed me to this page. Not quite what I was after, but it was a fantastic piece of code for learning just how all-encompassing the Python standard library is. To save you having to trawl through the code yourself, this is basically what it does.
The Python standard library has a module called modulefinder. It takes a module, finds where it exists on disk, opens the file, and reads it in. This is important: at this point it has not imported the code, i.e. the code has not been executed. It has simply loaded the file and read its contents into memory. It then compiles this source into a code object using the compile built-in function.
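The read-and-compile step can be sketched in a few lines (the file contents here are illustrative). The key point is that compiling produces bytecode without running any of the module’s top-level code:

```python
import os
import tempfile

# Write a tiny throwaway module to disk so we have something to read.
# Its top-level print would fire if the module were actually imported.
src = 'print("side effect!")\nX = 1\n'

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(src)
    path = f.name

# Step 1: load the file's contents into memory -- nothing is executed.
with open(path) as f:
    source = f.read()

# Step 2: compile() parses the source into a code object containing the
# module's bytecode. Still nothing is executed; the print never fires
# unless something later exec's the code object.
code = compile(source, path, "exec")
print(type(code).__name__)

os.remove(path)
```

Only exec-ing that code object would actually run the module, which is exactly what this approach avoids.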
Now the next bit is cool: it takes the code object, which contains the bytecode for the Python code that was just compiled, and iterates over all the instructions looking for particular opcodes, most importantly the import instructions. This makes it pretty robust, because it’s using Python to reverse engineer its own code and walk through the dependencies.
Because the code isn’t being executed, you can query dependencies for Maya tools without having to run the script from inside Maya. So if I make a change to, say, my vectors module (which is a standalone Python module), the dependency query will still be able to list all the Maya scripts that use it. This is obviously important if you’re in a studio where a significant portion of your Python codebase lives outside Maya.
Surprisingly, this is super fast too, because the bytecode is literally just a stream of bytes, so iterating over it boils down to integer comparisons. Scanning the 320 scripts in one of our branches takes 20 seconds.
Now 20 seconds is still a drag, so I wrote a simple caching mechanism. The cache basically stores a CRC for each file, along with that file’s immediate downstream dependencies. When the cache is loaded, all entries are checked to make sure the files still exist on disk and the CRCs still match. Those that have changed get re-scanned and the cache is updated. With the caching, even a huge change that touches many files takes only a second or two to query against. Which is super cool.
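The caching idea can be sketched like this. This is my own illustrative shape, not the author’s actual code: `scan_imports` stands in for the bytecode scan described above, and the cache is just a dict mapping each path to its CRC and dependency list:

```python
import os
import zlib

def file_crc(path):
    """CRC-32 of the file's bytes -- a cheap way to detect changes."""
    with open(path, "rb") as f:
        return zlib.crc32(f.read()) & 0xFFFFFFFF

def refresh_cache(cache, paths, scan_imports):
    """Re-scan only files that are new or whose CRC no longer matches.

    cache: {path: {"crc": int, "deps": [module names]}}
    scan_imports: callable(path) -> list of imported module names
    """
    # Drop entries whose files no longer exist on disk.
    fresh = {p: e for p, e in cache.items() if os.path.exists(p)}
    for path in paths:
        crc = file_crc(path)
        entry = fresh.get(path)
        if entry is None or entry["crc"] != crc:
            # New or changed file: do the (expensive) bytecode scan.
            fresh[path] = {"crc": crc, "deps": scan_imports(path)}
    return fresh
```

On a warm cache, unchanged files cost only a CRC computation and a comparison, which is why even large codebases become near-instant to query.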
Now all I have to do is figure out how to associate a set of unit tests with each script; then the dependency query could run the tests for any downstream scripts potentially affected by a change, to ensure the change is sound. That would be cool.
So anyway, if you’re thinking about writing tools to query dependencies, do it: it’s really easy, and the link above gives you a great place to start.
This post is public domain
This entry was posted on Tuesday, July 27th, 2010 at 20:31 and is filed under main.