Greetings From Jim Dennis
Those of you who read slashdot (http://www.slashdot.org), the Linux Weekly News (http://www.lwn.net), or other common Linux webazines and forums have undoubtedly tired of reading about the Mindcraft fiasco. If so, maybe you'll skip this and go unto the usual collection of "Answer Guy" questions.
The Mindcraft story has been interesting. As some of my colleagues have pointed out their "attack" on Linux serves more to legitimize Linux as a choice for business servers than to undermine it. In addition it appears that the methodology they used has uncovered some legitimate opportunities for improvement in the Linux process scheduling facilities.
I'm referring to the "thundering herd" issue that results from a large number of processes all doing a select() call on a given socket for file resource -- such as having a 150 Apache servers listening on port 80. However that is not a new issue; Richard Gooch (a significant contributor to the Linux kernel mailing list and code base) discussed similar issues and possible patches almost a year ago:
It looks like some work will go into the Linux kernel and into Apache to resolve some of those issues. In addition I know that Andrew Tridgell and Jeremy Allison (a couple of the principal members of the Samba development team) have been been continuing thier work on Samba.
So the Linux/Apache/Samba combination will show improvement for the general case. Samba 2.0.4 just shipped and already has some of these enhancements. Some of the interesting changes to the Linux kernel might already be present in the 2.3.3 developmental kernel (and might be easily pack ported as a set of 2.2.9 patches). So we could see some of the improvements within a couple of weeks.
Some of these improvements may give Linux a better showing in any "Mindcraft III" or similar benchmark. Maybe they won't. The improvements will be for the general case --- and I don't see much chance that open source developers will sneak in special case code that will only improve "benchmark" performance without being of real benefit.
That's one of the problems with closed source vendors. There's great temptation to put in code that isn't of real value to real customers but will be great for benchmarks and magazine reviewers. This has been detected on several occassions by several vendors; but it would be completely blatant in any open source project.
Frankly, I don't care if we improve our Mindcraft results. I prefer to question the very premises on which the whole discussion is based.
There are three I'd like to mention:
The fallacy of the whole Mindcraft mindset is that we should have "big servers" to provide file and web services. Let's ask about that.
The reason Microsoft wants to push big servers should be relatively obvious. Microsoft's customers are the hardware vendors and VARs. Most end customers, even the IT departments at large corporations, don't install their own OS. They order a system with the OS and major services pre-installed (or order systems and pay contractors and/or consultants to perform the installation and initial configurations).
So, it is in Microsoft's vested interest to encourage the sale of high end and expensive systems. The cost of NT itself is then a tinier fraction of the overall outlay. One or two grand for the OS seems less outrageous when expressed as a percentage of 10 to 20 thousand dollars.
So, how many customers really need 4-way SMP systems? Are 4-way SMP systems EVER really a better choice for web and file services than a set of four or more similar quality separate systems?
Big 4 or 8 CPU SMP servers are probably the best choice for some applications. It's even possible that such systems are optimal for SOME web and file servers. What's really important, however, is whether such systems are appropriate to YOUR situation.
Back when NT was first starting to emerge as a real threat to Netware it was interesting that the press harped on the lack of "scaleable SMP" support in Netware 3.x and 4.x. I'm sure there are analysts today who would continue to argue that this was the primary reason for Netware's loss of marketshare during the early to mid '90s.
Personally I suspect that the bigger factors in Netware's woes were from three other causes:
Of course, I could be wrong. I'm not an industry analyst. However, I do know that the considered opinion of the Netware specialists I knew back around '93 was that Netware didn't need SMP support. It was plenty fast enough without additional processors. NT, on the other hand, has so much overhead that it needs about 4 CPUs to get going.
So, if we're not going to use "big servers" how do we "scale?"
Replication and Distribution.
Look at how the whole Internet scales. We have the DNS system which distributes (and delegates) the management of a huge database over millions of domains. We don't even bat an eye that an average DNS lookup takes less than a second. The SMTP mail system also has proven scalability. It handles untold millions of messages a day (some of which isn't even spam).
Of course some people are already chomping at the bit to write to me and explain what an idiot I am. There are problems with replicating files and HTML across multiple servers. Some applications are very sensitive to concurrency issues and race conditions. There are cases where the accessor of a file must have the absolute latest version and must be able to retain a lock on it. There are cases where we want to lock just portions of files, etc.
However, these are not the most common cases. Going for the "big server" approach is often a sign of laziness. Rather than identify the specific sets of applications that require centralized control and access, they try to toss everything on the "one size stomps all" server.
In the degenerate case of the Mindcraft benchmarks it would be amusing to pit four low cost PCs running Linux against one "big server" running NT. I say "degenerate case" since the benchmarks used there don't seem to have any concurrency or locking issues (at least not for the HTTP portions of the test).
Needless to say we'd also seem some advantages beyond the scalability of our "hoard of cheap servers" approach. For example we could use dynamic DNS and failover scripts to ensure that transparent availability was maintained even through the loss of three of the four servers. There's certainly some robustness to this approach. In addition we can perform tests and upgrades to one or more systems in these loose clusters without any service down time.
Because these use commodity components it's also possible to keep shelf spares in an on site depot. Thus reducing the downtime for individual nodes and providing the flexibility to rapidly increase the clusters capacity in the face of exceptional demands.
All that --- and it's usually CHEAPER, too.
Naturally there are some challenges to this approach. As I mentioned, we have to configure these systems with some sort of replication software (rdist, rsync) and test regularly to ensure that the replication process isn't introducing errors and/or corruption. There are also the problems with writable access and the needs for the nodes in a cluster to communicate about file locking and application (i.e. CGI) state.
The point is not so much to promote the "hoard of thin servers" approach as to question the premise. Do we really need a "big server" for OUR task?
I've talked about the fundamental disconnect between mass marketing and customer requirements before. "Mass marketing" sells features in the hopes that masses will will buy them. Customers must consider the "benefits" of each "feature" before accepting any arguments about the superiority of one product's implementation of a given "feature" over another.
As an example let's consider Linux' much vaunted "multi-user" feature. To many people this is not a benefit. Many people will never have anyone else "logged into" their system. To people like my mom "multi-user" is just an inconvenience that requires her to "login" and means that she sometimes needs to 'su' to get at something she wants. (Granted there are ways around those). In some way Linux' "multi-user" features (and those of NT, for that matter) are actually a detriment to some people. The represent a cost (albeit a small and easily surmounted one) to some users.
This leads us to the other two issues that I would question.
Apache is not necessarily the best package for providing high speed, low-latency, HTTP of simple, static HTML files.
There are lightweight micro web servers that can do this better. I've also heard of people who use a small cluster of Squid proxy servers interposed between their Apache servers and their routers. Thus the end users are transparently access an organizations Squid caches rather than directly accessing it's web servers. This is a strange twist on the usual case where the squid caches are located at the client's network.
By all accounts SMB is a horrid filesharing protocol. The authors of Samba take a certain amount of wretched glee in describing all of the misfeatures of this protocol. Its sole "advantage" is that it's included, preconfigured with 98% of the the client systems that are shipped by hardware vendors today.
Note: I'm NOT saying that NFS is any better. Its main advantage is that almost all UNIX systems support it.
Personally I have high hopes for CODA. Its about time we deployed better filesystems for the more common requirements of a new millennia.
I'm not the first to say it:
"There are lies, damned lies, and benchmarks"
However, the important thing about any statistic or benchmark is to understand the presenter. Look behind the numbers and even the methodology and ask: "Who says?" "What do they want from this?"
Alternatively you can just reject statistics and benchmarks from others, and make your decisions based on your own criteria and as a result of your own tests.
The scientific method should not be used solely by scientists. It has application for each of us.
-- Jim Dennis