2006-06-20

Attack of the Mac Python Zombies!

Nice title, huh?

I choose my titles in the same way I choose what to write about. The title's what I typed in to Google when I faced my most recent problem, and I'm writing this post because Google didn't furnish the answer until after I'd already solved the mystery myself.

Okay, so I didn't actually search for "Attack of the Mac Python Zombies". The first bit's just a bit of sensationalism to hook those of you who, like me, don't care about (or for) Python and might otherwise miss out.

My problem was the error message "fork: resource temporarily unavailable". I'm pretty sure I haven't seen that since university, a decade ago, when someone accidentally or deliberately (always hard to tell) fork-bombed a shared machine. It was a bummer in those days because even if you were the culprit, you couldn't necessarily fix things: your shell might not have a "kill" builtin, and if you tried to fork kill(1), well, "resource temporarily unavailable". These days, the shell wars are over, and 'most everybody runs bash(1), which does have a "kill" builtin.

I tell a lie: my problem was that opening a new window in Terminal.app would say "Completed Command" in the title bar, and produce no output at all in the window. I'd never seen this before, and didn't understand right away what was going on. I tried opening a few more, leaving the existing ones open, because I've had trouble with permissions on pseudo-terminals before, but that didn't help. I also thought trying again might help because every few thousand terminal windows, bash(1) crashes on start-up. (I long thought that was a Mac OS problem, but I've since seen it, albeit much less frequently, on Linux too, so now I'm not sure.) Luckily, I already had a couple of shells, and typing "ps aux" in one of them showed me the real problem. "fork: resource temporarily unavailable".
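When you suspect a per-user limit, the useful number is how many processes each user owns. Here's a small sketch of counting that from ps(1) output; the parsing function and its name are mine, not anything from ps itself:

```python
import collections
import subprocess

def count_processes_by_user(ps_output):
    """Count processes per user from `ps axo user=` style output
    (one username per line, no header)."""
    counts = collections.Counter()
    for line in ps_output.splitlines():
        user = line.strip()
        if user:
            counts[user] += 1
    return counts

if __name__ == "__main__":
    ps = subprocess.run(["ps", "axo", "user="],
                        capture_output=True, text=True, check=True)
    for user, n in count_processes_by_user(ps.stdout).most_common():
        print(f"{n:5} {user}")
```

If someone's count is suspiciously close to a round number like 100, you've probably found your limit.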

If I'd known at this point what I knew by the end of my investigations, I might have logged in as my alter-ego, "Test Monkey", but I didn't realize that the problem wasn't the overall number of processes: it's that Mac OS has a per-user limit. All Unixes I've met do, but I can't remember the last one that didn't have the limit set to "unlimited" by default. Here are Ubuntu 6.06's default limits, for example:

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) unlimited
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

And here's Mac OS 10.4.6:

core file size (blocks, -c) 0
data seg size (kbytes, -d) 6144
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 100
virtual memory (kbytes, -v) unlimited

At this point, if you have anything to do with Mac OS machines, it might be worth committing to memory for future use that not only does Mac OS have a low per-user process limit (100 processes/user), it has a low data seg size (6144 KiB), a low number of open file descriptors per process (256), and a small pipe size (512 B). [I'm not sure the "data seg size" matters in practice because that, as I understand it, is just the limit on the portion of the heap allocated with sbrk(2), and doesn't count heap allocated with mmap(2). Google doesn't seem to know why Apple chose this 6 MiB limit.] Jonathon Arnold points out that Mac OS also has a low default limit on the amount of shared memory (4 MiB); see, for example, here for details.
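You can also query these limits programmatically: Python's resource module wraps getrlimit(2). A minimal sketch, showing the two limits that bit me:

```python
import resource

def show_limit(name, which):
    """Print the soft and hard values of one resource limit."""
    soft, hard = resource.getrlimit(which)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")

# RLIMIT_NPROC is the per-user process limit; RLIMIT_NOFILE is
# the per-process open file descriptor limit.
show_limit("max user processes", resource.RLIMIT_NPROC)
show_limit("open files", resource.RLIMIT_NOFILE)
```

A program that forks a lot could check RLIMIT_NPROC up front and fail with a better diagnostic than Terminal.app managed.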

So: Terminal's broken because we've hit our 100-process limit. I didn't realize this until I'd already killed a few things to give me room to maneuver, sadly. I was now at a point where I could see plainly with ps(1) that my problem was that the system had a couple of hundred zombie processes ("Z" in the "STAT" column) with the name "(python)". So something was running python(1) but not waiting for the child. And that something was still running. And, judging by the fact that my girlfriend was close to hitting her 100-process limit, it was something that we both run.

This, I thought, was handy in that it meant I could rule out all the stuff related to software development that I run. As it turned out, that was a completely false conclusion.

I methodically quit all running GUI applications, checking with ps(1) each time to see if the zombies had been reaped. (Kill the process that the kernel thinks should wait, and init(8) realizes no-one's going to wait and waits for the zombies, putting their souls to rest. So if quitting an application had caused my zombies to disappear, I'd know it was that process' fault.) After eliminating all the GUI applications as possibilities, I started killing the likes of AppleSpell and ATSServer. I got right down to having killed everything but loginwindow, connected from the Ubuntu box and killed loginwindow.

Still my zombies remained.

I was a bit confused at this point, but got my girlfriend to log out, just to see that the same was true for her. It was: logged out, she still had nearly 100 "(python)" zombies.

The good news at this point was that there was little left running. I'd eliminated most of the possibilities, and one of the few processes I didn't recognize was "/System/Library/PrivateFrameworks/DedicatedNetworkBuilds.framework/Versions/A/Resources/bfobserver". I killed that, and all the zombies on the system vanished, and various things that had got stuck for lack of the ability to fork came back to life.

Searching Google for "bfobserver" (which I had read as "b-fob-server", but now realize is "b-f-observer", damn those C programmers!) "python" and "zombies" returned one match: a Google groups thread python at login on macintels that was exactly the same problem.

Basically, the new Distributed Builds stuff in Xcode 2.3, without you having activated it in any way or even run Xcode, runs a Python script "/System/Library/PrivateFrameworks/DedicatedNetworkBuilds.framework/Resources/sctwistd" and forgets to wait for it. It seems to do this every time you log in, and that includes fast-user-switching logins. That took us just under 20 days to reach our 100-process limits, so I expect a rash of problems soon for developers who don't reboot unless forced to, but who share a machine. If I'd installed the latest QuickTime update (which probably just gives me more DRM crap I don't want, and for that it wants me to reboot?) I'd have put it off for another 20 days.
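If a program genuinely doesn't care when its child finishes, the classic fix is the double fork: the intermediate child exits at once, so the grandchild is reparented to init(8) and reaped there, and the original parent only has to wait for the short-lived intermediate. A sketch of how something like bfobserver could fire and forget (spawn_detached is my name for it, not Apple's):

```python
import os

def spawn_detached(argv):
    """Run argv without ever leaving a zombie behind."""
    pid = os.fork()
    if pid == 0:
        # Intermediate child: fork the real worker, then exit
        # immediately so the worker is reparented to init(8).
        if os.fork() == 0:
            os.execvp(argv[0], argv)
        os._exit(0)
    # Reap the intermediate child; it exits straight away.
    os.waitpid(pid, 0)

spawn_detached(["sleep", "1"])
# No zombie: the sleep(1) process is now init's responsibility.
```

It's a few lines of boilerplate that Unix programmers have been writing for decades, which makes forgetting the wait entirely all the more impressive.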

I'm annoyed by Terminal.app's poor diagnostics. I'm annoyed by Activity Monitor's failure to indicate any problem (it doesn't show zombies, even when you've got nearly 200 of them). I'm annoyed by DedicatedNetworkBuilds' bug, and I'm extra annoyed that the Xcode installer means I have that running without having opted in, and without any likelihood that I'll ever be in a position to make use of it. It's probably jolly useful inside Apple, but when am I going to have enough Apple machines to be able to use it? I don't have enough to use distcc(1), which they still recommend for smaller builds. Why do I have to install all the Xcode crap just to get the latest GCC and weird Apple binutils replacements anyway? Why aren't Apple's packages more transparent? "Click OK to run arbitrary code as root."

On the bright side, I'm thankful I'm running some variant of Unix. I told my girlfriend this is what's so great about Unix: you can do better than just throw your hands up in despair and reboot/reinstall/get a new computer.

She was so impressed.

2006-06-13

Printing from Ubuntu to a Mac's USB printer

Don't try this at home until you've read the whole post.

If you've been around for a few years, you probably read Eric Raymond's The Luxury of Ignorance: An Open-Source Horror Story, about setting up printing on Linux. That was years ago, and it seems that nothing has changed.

Here were the steps I thought you had to go through to set up Ubuntu to talk to a Mac's printer:

  1. Set up your printer so it works locally on the Mac.
  2. On the Mac, in "System Preferences", click "Sharing", and check "Printer Sharing".
  3. On the Ubuntu box, choose "System", "Administration", "Printing". Double-click "New Printer". Choose "Network Printer", and leave it on "CUPS Printer". Type "http://mac-dns-name-or-ip-address:631/printers/mac-queue-name" as the URI. (If you can't remember the queue name, visit the /printers URI in your browser.) Click "Forward", choose manufacturer "Raw" and model "Queue", and you're done.
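For what it's worth, step 3 corresponds to a single lpadmin(8) invocation on the command line. A sketch that builds it; the host and queue names here are made up for illustration:

```python
def lpadmin_command(local_name, mac_host, mac_queue):
    """Build the lpadmin(8) argv for a raw queue pointing at a
    shared CUPS printer. Names are examples, not real hosts."""
    uri = f"http://{mac_host}:631/printers/{mac_queue}"
    # "-E" enables the queue; "-m raw" is the raw queue "driver",
    # matching the Raw/Queue choice in the GUI.
    return ["lpadmin", "-p", local_name, "-E", "-v", uri, "-m", "raw"]

print(" ".join(lpadmin_command("mac-printer", "mac.local", "DeskJet")))
```

Not that you should need either; read on.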

For me, much of this was non-obvious. The URI left me Googling. (Mac OS was less helpful than usual because "System Preferences" didn't do its usual job of giving me a hint about how to use the service I've just enabled. Whoever implemented that for the other services deserves a prize; whoever's responsible for the fact that some services have no hint deserves a kick in the nuts.)

The manufacturer and model part was also non-obvious. Mainly because of the "oh no, my model of printer isn't in the list!" moment, and the fairly extensive Googling it required to find that it doesn't actually matter.

You might think it would be nice to have some decent browsing mechanism, and preferably Bonjour auto-discovery. I really shouldn't be asking Google for CUPS URIs to type in.

The funny thing is, Ubuntu can do the right thing. What you need to do is ignore the siren charms of "New Printer", and enable "Detect LAN Printers" on the "Global Settings" menu. Then you ignore the scary warning dialog telling you not to do this, type the root password, and then you sit around and wait for a bit. Because it doesn't work instantly, and it doesn't tell you it's doing anything. So don't go and deselect "Detect LAN Printers" thinking it's not done anything useful. Be patient.

If you are patient, a new printer will appear. It will have the name you gave it on your Mac, and it will be selected as the default printer. You can deselect "Detect LAN Printers" afterwards and the printer won't go away, but it will stop working after the next reboot.

The icon will still be there, and it won't look any different, and as far as I can tell there will be absolutely no indication of any problem, but if you print, nothing will happen. The printer's job count will increase, but the job queue will appear empty, and nothing will be printed. So leave "Detect LAN Printers" selected, and pretend it reads something like "Use Network Printers".

If I hadn't re-read Raymond's article, I wouldn't have known that I'd set up a local queue rather than just connected to a remote one. (Do I understand the scary dialog? No. Why does enabling LAN printer detection open port 631 on my system? Why can't the dialog explain what's really going on?)

(Note that printing to a remote queue is a useful trick if you're running Ubuntu in a VM: letting the host OS do the printing for the guest OS is likely to be much easier to set up. Thanks to Donnie Pennington for pointing this out to me after using this trick to print to his Mac's USB printer from Ubuntu running under Parallels.)

That the situation with Linux printing is this piss-poor in 2006 is bad. That everyone in the Linux world read a long and detailed complaint about exactly this several years ago and it still sucks exactly as it used to... that's hard to believe. Ubuntu's put a lot of time and effort into making the icons more orange and the desktop background more brown, but making printing simple enough for my parents to set up?

I guess printing is one of those things so deadly dull that you have to pay people to work on it.

In the meantime, I see that linuxprinting.org has a list of Suggested Printers for Free Software Users. If I'd known about that beforehand, I'd have bought an HP printer.

[Teresa Van Dusen reports that this is still true for Ubuntu 7.10.]

2006-06-11

Browser security versus virtual autism

I tend to ignore articles on security because I don't have a lot of respect for the security companies. As far as I can tell, most security stories are credulous regurgitations of these companies' misleading press releases. Their vested interest in FUD, their conflict of interests with their own customers, their alarmist and uninformative tendencies: all these things make it hard to take them seriously.

Just this last week there was one or other of this motley crew claiming "Windows more secure than Linux". The numbers were blatant nonsense, counting any Linux vulnerability once per distribution, for example, and I'm not interested in that non-story.

In amongst the usual stream of commercial effluent, I found myself reading a couple of interesting papers on phishing.

If you're anything like me (and I hope you're not) you receive several hundred spam messages a day. For my home account, one of the mod3 Solaris zone hosting dudes set up a greylisting system that pretty much squashed the problem. Work uses a commercial filtering system that doesn't work nearly as well, and doesn't even let me say "drop anything in any non-European language", which would be a very effective work-around for me. I'll admit to having been nervous about the greylisting idea ("but won't it delay genuine mail?"), but I've only been inconvenienced once so far, and that wasn't for long. I waste far more time wading through the obvious spam at work every day than I did on the one occasion I've had to wait for a web site to retry its confirmation mail.
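Greylisting is simple enough to sketch: the first delivery attempt from an unknown (client, sender, recipient) triple gets a temporary rejection; real MTAs retry later, most spamware doesn't bother. A toy version of the policy, where the delay constant and class name are my choices rather than any particular implementation's:

```python
import time

class Greylist:
    """Toy greylisting policy: defer a triple until it's been
    known for at least `delay` seconds."""

    def __init__(self, delay=300, now=time.time):
        self.delay = delay
        self.now = now           # injectable clock, for testing
        self.first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return 'defer' for a new or too-recent triple,
        'accept' once it has aged past the delay."""
        triple = (client_ip, sender, recipient)
        t = self.first_seen.setdefault(triple, self.now())
        if self.now() - t < self.delay:
            return "defer"
        return "accept"
```

The delay is what made me nervous, but since a compliant MTA retries automatically, a human only notices on time-sensitive mail like those confirmation messages.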

Anyway, given the amount of spam that gets through at work, I see quite a lot of phishing attempts. Some would be worryingly convincing if I had any connection with the alleged institutions, many are fairly obviously bogus if you give them more than a second's glance, and some are laughably bad. That last class has always interested me the most. My assumption was always that such mails wouldn't fool anybody, leaving me wondering why the prospective phisher didn't try a bit harder.

Now I'm starting to wonder if the criminals aren't just being clever, expending no more effort than necessary to fool the foolable.

Reading Why Phishing Works, I was shocked by the lack of acumen displayed by the experiment's subjects. The sample size was, I felt, small: only 22 people. I'm also not sure how representative of the general public university staff and students are. All the same...

Even if you don't care about security, if you're a programmer it's worth reading the paper just to see how far out of touch with technology many users are. In particular, they have no idea what's easy to fake and what's hard to fake.

That text and graphics inside the page are more trusted than text and graphics in the browser's own UI shows you just how much the disconnect between the user's model and the system's model can cost.

It's also interesting to see how much of the browser people just ignore. I was thanked for adding a "new" feature to Terminator the other week when all I'd done was add a tool tip to draw attention to a feature that had been there much longer. That was understandable because the feature was otherwise invisible and only enjoyed by people who had just assumed it would be there. This paper, though, suggests that browser features that you and I probably consider highly visible just aren't seen. Or they're seen and misunderstood, which is potentially worse when they're security features.

Not all of the problems identified in the paper are anything to do with technology, though. Except insofar as they suggest that people are bad at transferring real-world common sense to the "virtual" world, or bad at realizing that they're the same world.

I wonder if the woman who "will click on any type of link at work where she has virus protection and system administrators to fix the machine, but never at home" would agree to be beaten by said system administrators with baseball bats in the grounds of a local hospital. Presumably that would be fine, because the hospital can fix things up afterwards? So no harm done, right?

And there's the woman who types in her username and password to see if a site's genuine. Presumably she'd be happy to give me her life savings to see whether I can be trusted to return them?

I do hope those two are now starred out. But I know they aren't, and I know there are millions like them, sharing LANs (or even machines) with us.

I showed the paper to my girlfriend. She didn't know about https: versus http:, didn't know there was a padlock icon anywhere (and I'll admit that I had to look for it in Safari; I'll be switching to Firefox completely as soon as it has spelling checking), or what the padlock means, and definitely didn't know anything about certificates. It had never really occurred to me before that there were millions of people out there typing their financial details in to HTML forms without the vaguest idea of which end of the firestick the boom comes out.

We've accidentally created a whole race of virtual autists, devoid of their usual ability to infer trustworthiness.

If you think that's an over-statement, read the paper and look at the cues the participants were using. In ignorance of the high-tech stuff the browser was offering, they were falling back to tried-and-tested visual cues, despite the fact that it's trivial to copy any image, text, or video on-line.

The authors have a suggestion, if you're not too depressed to keep reading. The Battle Against Phishing: Dynamic Security Skins describes a way of improving the browser's security indicators, but I didn't really get how it's supposed to address what seems to be the more fundamental problem: people just don't know what they're looking for. If Firefox's yellow location bar is as invisible as it appears to be, is that battle not already lost?

2006-06-04

Ubuntu 6.06

I upgraded my Ubuntu box the other night. The process was pretty impressive. For one thing, it was significantly faster than an MS Windows or Mac OS upgrade.

The process started with "Update Manager" saying, instead of the usual list of updates, that there was a new release, and would I like it? I hadn't heard any horror stories in the couple of days since the official release, so why not?

After the packages were downloaded, I was told that I'd modified a couple of files. The NTP configuration, for one. I clicked for the diff and saw that the default configuration now has an NTP server. So that's one of my Ubuntu 5.10 complaints fixed. The other conflicting file was my gdm(1) configuration, but that diff was too complicated for me to bother with, so I took the maintainer's version there too.

The desktop background image changed while I was looking at it, which was a bit weird. I'm used to other OSes taking me out of the normal OS while it's being upgraded. But everything went smoothly enough. The new background image is yet another stinky brown turd lagoon, but this time at least there's a non-brown option. It's a picture of a small green bush called Dawn of Ubuntu, and it's okay to look at. My preference for GNOME's default Clearlooks theme over Ubuntu's festival of orange and brown was remembered.

I don't understand why they don't just use the default GNOME stuff, though.

As for installing Sun's Java 5, it has become a little easier. Choose "Add/Remove..." on the GNOME "Applications" menu, check "Show unsupported applications", and type "sun java" in the search field. You can then select and install Sun's Java 5 JRE (but not the JDK, which is a shame).

The license agreement is presented as a very respectable-looking dialog, but there's nothing equivalent to the update-java-alternatives(1) step, and GCJ remains the default, which is rather unfortunate. (I can understand not changing the default automatically if the package had been installed as a prerequisite, but if it's been installed by direct user intervention, isn't it likely that it's the version they prefer?)

Anyway, the Sun JRE installation process is quite a bit more convenient now, though I'm not sure what mechanism an application is supposed to use to choose an appropriate JVM. It seems to me that the update-java-alternatives(1) approach is fundamentally flawed when compared to something like JNLP that lets you say "I need 1.5 or better". A global default is perhaps sometimes useful, but not nearly as useful as being able to specify your exact requirements. As it is, I guess we're going to have to hard-code the location that Sun's 1.5 JRE gets installed at.
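For comparison, this is roughly what JNLP lets an application say; the j2se element's version attribute expresses "this version or better" (the file fragment below is illustrative, not a complete .jnlp):

```xml
<!-- Fragment of a .jnlp file: request Java 1.5 or later. -->
<resources>
  <j2se version="1.5+"/>
  <jar href="app.jar"/>
</resources>
```

A per-application requirement like that composes; a single system-wide symlink chosen by update-java-alternatives(1) doesn't.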

I've been able to get rid of my built-from-source bzr(1) in favor of the 0.8.2 package, but other than that and the things mentioned above, today is mostly like yesterday. Which is I suppose a recommendation of sorts, if not a particularly glowing one. Seen from the Debian perspective, though, Ubuntu 6.06 is like Debian testing/unstable but without the random bits of breakage every time you upgrade.

Unlike the Mac OS 10.3 to 10.4 transition, I haven't yet noticed much breakage, or that I'm being drowned in fatuous features. And Ubuntu 6.06 didn't cost me $120 or take forever to install, either.

[An earlier version of this post mentioned update-alternatives(1) instead of update-java-alternatives(1), but David Herron rightly pointed out that it's not generally a good idea to change one of the Java alternatives without changing all of them, and update-java-alternatives(1) makes that a bit easier.]