Environment Blog Archive
GNU Screen + SSH_AUTH_SOCK; my new approach
Over the years, I've played around with several different methods of keeping my forwarded SSH-Agent authentication socket (referenced via the $SSH_AUTH_SOCK shell variable) up-to-date in long-running screen sessions. The basic issue here is that Screen sees the process environment as it was when it was initially launched, not the one that exists when reattaching in a subsequent login session. This means that the $SSH_AUTH_SOCK variable as screen sees it will refer to a socket which no longer exists (it was removed when you logged out after detaching from the login session in which you started screen).
Some of my previous methods have included a hard-coded variable for the socket itself (downsides: if it's a predictable name you're potentially opening some security issues, plus if you open multiple sessions to the same account, you kill the latest socket); symlinking the latest $SSH_AUTH_SOCK to a hard-coded value on login (similar issues); and dumping $SSH_AUTH_SOCK to a file, then aliasing ssh and scp to first source said file to populate the local window's environment (doesn't work in scripts, too much manual setup when adapting to a new system/environment, won't work with any other subsystem not already explicitly handled, etc).
Recently though, I've come up with a simple approach using screen's -X option to execute a screen command outside of screen and just added the following to my .bashrc:
screen -X setenv SSH_AUTH_SOCK "$SSH_AUTH_SOCK"
While not perfect, in my opinion this is a bit of an improvement for the following reasons:
- It's dirt-simple. No complicated scripts to adjust/maintain, just a command that's almost completely self-explanatory.
- It doesn't kill the environment for existing screen windows, just adjusts the $SSH_AUTH_SOCK variable for new screen windows. This ends up matching my workflow almost every time, as unless a connection dies, I leave the screen window open indefinitely.
- If you have multiple sessions open to the same account (even if not running both in screen), you're not stomping on your existing socket.
- Did I mention it's dirt-simple?
There are presumably a number of other environment variables that would be useful to propagate in this way. Any suggestions or alternate takes on this issue?
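For anyone who wants the .bashrc line to be a little more defensive, here is a minimal sketch. The function name update_screen_auth_sock is made up for illustration, and the two guards are assumptions about what is desirable (skip when no agent is forwarded, skip when the shell is itself a screen window, which screen marks by setting $STY); the core command is unchanged from the one above.

```shell
# Hypothetical helper for ~/.bashrc; only the screen command itself
# comes from the post, the guards are illustrative assumptions.
update_screen_auth_sock() {
    # Nothing to propagate if this login has no forwarded agent socket.
    [ -n "${SSH_AUTH_SOCK:-}" ] || return 0
    # screen sets $STY inside its windows; a shell started there already
    # inherited the session's value, so there is nothing new to push.
    [ -z "${STY:-}" ] || return 0
    # screen complains and exits non-zero when no session is running,
    # so keep ordinary logins quiet.
    screen -X setenv SSH_AUTH_SOCK "$SSH_AUTH_SOCK" >/dev/null 2>&1 || true
}

update_screen_auth_sock
```

As with the one-liner, only windows created after the call see the new value.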
Managing Perl environments with perlbrew
As a Perl hobbyist, I've gotten used to the methodical evolution of Perl 5 over the years. Perl has always been a reliable language, not without its faults, but with a high level of flexibility in syntactical expression and even deployment options. Even neophytes quickly learn how to install their own Perl distribution and CPAN libraries in $HOME. But the process can become unwieldy, particularly if you want to test across a variety of Perl versions.
To contrast, Ruby core development frequently experiences ABI breakages, even between minor releases. In spite of the wide adoption of Ruby as a Web development language (thanks to Ruby on Rails), Ruby developers are able to plod along unconcerned, where these incompatibilities would almost certainly lead to major bickering within the Perl or PHP communities. How do they do it? The Ruby Version Manager.
Ruby Version Manager (RVM) allows users to install Ruby and RubyGems within their own self-contained environment. This allows each user to install all (or only) the software that their particular application requires. Particularly for Ruby developers, this provides them with the flexibility to quickly test upgrades for regressions, ABI changes and enhancements without impacting system-wide stability. Thankfully a lot of the ideas in RVM have made their way over to the Perl landscape, in the form of perlbrew.
Perlbrew offers many of the same features found in RVM for Ruby. It's easy to install. It isolates different Perl versions and CPAN installations in your $HOME and helps you switch between them. It automates your environment setup and teardown. And most importantly, using perlbrew means not having to clutter your default system Perl with application-specific CPAN dependencies.
Getting started with perlbrew couldn't be easier. A quick one-liner is all it takes to install perlbrew in your home directory.
$ curl -L http://xrl.us/perlbrewinstall | bash
If you need to install perlbrew somewhere other than your home directory, just download the installer and pass it the PERLBREW_ROOT environment variable.
$ curl -LO http://xrl.us/perlbrew
$ chmod +x perlbrew
$ PERLBREW_ROOT=/mnt/perlbrew ./perlbrew install
Follow the instructions on screen and you'll be ready to use perlbrew in no time. The perlbrew binary will be installed in ~/perl5/perlbrew/bin, so make sure to adjust your login $PATH accordingly.
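One way to make the $PATH adjustment, sketched for a Bourne-style login file such as ~/.profile or ~/.bashrc. It assumes the default ~/perl5/perlbrew/bin location mentioned above; the variable name perlbrew_bin is only for illustration.

```shell
# Prepend perlbrew's bin directory to PATH unless it is already there.
# Assumes the default install location under $HOME.
perlbrew_bin="$HOME/perl5/perlbrew/bin"
case ":$PATH:" in
    *":$perlbrew_bin:"*) ;;               # already present, nothing to do
    *) PATH="$perlbrew_bin:$PATH" ;;      # put it ahead of the system paths
esac
export PATH
```

The case test keeps the entry from piling up when the file is sourced more than once.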
Once you're done installing perlbrew there are a couple of commands you'll want to run before installing your own Perl versions or CPAN modules. The perlbrew init command is mandatory; this initializes your perlbrew directory. It can also be used later if you need to modify your PERLBREW_ROOT setting. The perlbrew mirror command is optional (but recommended) and helps you select a preferred CPAN mirror.
$ perlbrew init
$ perlbrew mirror
Next comes the fun part. Start off by verifying the Perl version(s) that perlbrew sees.
$ perlbrew list
* /usr/bin/perl (5.10.1)
Install a newer version of Perl.
$ perlbrew install 5.12.3
Now switch to the newer Perl.
$ perlbrew list
* /usr/bin/perl (5.10.1)
  perl-5.12.3
$ perlbrew switch perl-5.12.3
$ perlbrew list
  /usr/bin/perl (5.10.1)
* perl-5.12.3
$ perl -v
This is perl 5, version 12, subversion 3 (v5.12.3) built for x86_64-linux
Copyright 1987-2010, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Alternatively, if you only want to test a different Perl version, try the perlbrew use command (note: this only works in bash and zsh). Unlike the switch command, use is only active for the current shell.
$ perlbrew use system
$ perlbrew list
* /usr/bin/perl (5.10.1)
  perl-5.12.3
A quick peek behind the curtain reveals much of the simplicity behind perlbrew.
$ ls -l ~/perl5/perlbrew/
total 2680
-rw-r--r-- 1 testy users     408 Feb 10 23:58 Conf.pm
drwxr-xr-x 2 testy users     512 Feb 10 23:46 bin
drwxr-xr-x 4 testy users     512 Feb 11 09:59 build
-rw-r--r-- 1 testy users 1333196 Feb 11 10:33 build.log
drwxr-xr-x 2 testy users     512 Feb 11 09:59 dists
drwxr-xr-x 2 testy users     512 Feb 10 23:47 etc
drwxr-xr-x 4 testy users     512 Feb 11 10:32 perls
$ ls -l ~/perl5/perlbrew/perls/
total 8
drwxr-xr-x 5 testy users 512 Feb 11 00:38 perl-5.12.3
drwxr-xr-x 5 testy users 512 Feb 11 10:32 perl-5.13.6
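Because every build lives in its own directory under perls/, a plain shell loop can exercise a script against each installed version without switching. This is only a sketch: the function name run_under_each_perl is made up, and it assumes each build keeps its interpreter at perls/<name>/bin/perl, which is how the listing above suggests things are laid out.

```shell
# Run the same perl arguments under every perl found in a perlbrew root.
# Hypothetical helper; assumes the perls/<name>/bin/perl layout.
run_under_each_perl() {
    root=$1
    shift
    for perl in "$root"/perls/*/bin/perl; do
        [ -x "$perl" ] || continue     # skips the literal glob when nothing matches
        printf '== %s\n' "$perl"
        "$perl" "$@"
    done
}

# Example: syntax-check a (hypothetical) script under every installed perl.
# run_under_each_perl "$HOME/perl5/perlbrew" -c myscript.pl
```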
If you're a Perl developer, the perlbrew project may help alleviate a lot of the pain associated with team development or multi-tenant programming environments. Suddenly it becomes much easier to manage your own software requirements, resulting in faster development and testing cycles for you, and fewer headaches for your System Administrators.
Visit at DistribuTECH
I had the chance to attend DistribuTECH in San Diego, CA this past week. DistribuTECH is billed as the utility industry's leading smart grid conference and exposition. End Point was present at the conference on behalf of Silver Spring Networks. Silver Spring Networks contracted with us to provide a Liquid Galaxy installation for their exhibit.
The Liquid Galaxy did its job from what I could tell. The exhibit was consistently surrounded by conference goers who were interested in listening to and watching the tours being presented, and who wanted to see what the Liquid Galaxy was all about. This was the first time I had seen the Liquid Galaxy and I was quite impressed with how well it worked. I saw many people moving their bodies in sync with what was being displayed on the screen, showing that they felt immersed while within the galaxy. One gentleman knelt down while attempting to look under a graph that was being presented on the screen. This same person returned to the exhibit several times, bringing colleagues back each time to "show off" what he had found.
I spent some time on the conference floor, checking out what was being displayed and seeing how others were getting the attention of the attendees. I could not find anything that compared to the Liquid Galaxy in both wow factor and usability. The fine folks at Silver Spring Networks also seemed impressed with the reaction they were receiving.
One product that I found interesting while walking the floor was a large unit that makes ice at night, when power is less expensive, the draw on the grid is lower, and it is cooler outside. During the day it uses the ice as a coolant pumped into the AC unit, reducing the cost and energy needed to cool a building.
I also took a look at the Silver Spring Networks products on display. One that interested me was their home portal that allowed customers that had Silver Spring smart meters installed by their utility to visit a web portal and view several things including their current usage and compare their usage to that of their neighbors. I can see how someone concerned with the environment could use this information to lessen their power usage.
Keep an eye out here for a few blog posts from Adam on his experience with tour development for the Liquid Galaxy.
Debugging jQuery
A recent reskin project for a client requires that we convert some of their old Prototype code to jQuery as well as create new jQuery code. I have not done much with converting Prototype to jQuery and I felt like my debugging tools for JavaScript were under par. So this morning I set out to find what was available for jQuery and I found this article on the subject.
I've used Firebug for some time now, but was unaware of some of the supporting plugins that would certainly help with debugging JavaScript. Some of the plugins and methods found in the article that I found immediately helpful were:
- FireFinder: Makes it quite easy to verify that the selector values in your code are correct and that the proper elements are returned. I was able to immediately pinpoint problems with my selectors and this brought to light why certain events were not firing.
- Firebug Console: Using the console.log function allowed me to check values without littering my code with alert statements.
- FireQuery: At a glance this plugin for Firebug shows which elements have event handlers bound to them.
- Firebug Breakpoints: Setting breakpoints and watch statements in your code makes it easier to see what is happening in the JavaScript code as it is executed instead of trying to figure out what happened after the code has run its course.
Thanks to the author of the article, Elijah Manor, for the in-depth information on debugging jQuery code.
Guidelines for Interchange site migrations
I'm involved at End Point often with Interchange site migrations. These migrations can be due to a new client coming to us and needing hosting or migrating from one server to another within our own infrastructure.
There are many different ways to do a migration. In the end, though, we need to hit certain points to make sure that the migration goes smoothly. Below you will find steps which you can adapt for your specific migration.
During the start of the migration it might be a good time to introduce git for source control. You can do this by creating the repository and cloning it to /home/account/live, setting up .gitignore files for logs, counter files, gdbm files. Then commit the changes back to the repo and you've now introduced source control without much effort, improving the ability to make changes to the site in the future. This is also helpful to document the changes you make to the code base along the way during the migration in case you need to merge changes from the current production site before completing the migration.
- Export all of the gdbm databases to their text file equivalents on the production server
- Take a backup from production of the database, catalog, interchange server, htdocs
- Set up an account
- Create the database and user
- Restore the database, catalog, interchange server and htdocs
- Update the paths in interchange/bin for each script to point at the new location
- Grep the restored code for hard coded paths and update those paths to the new locations. Better yet move these paths out to a catalog_local.cfg where environment specific information can go.
- Grep the restored code for hard coded urls and use the [area] tag to generate the urls
- Update the urls in products/variable.txt to point at the test domain
- Update the sql settings in products/variable.txt to point at the new database using the new user
- Remove the gdbm databases so they will be recreated on startup from the source text files
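- Install a local Perl if it's not already installed (in the Perl source directory, sh Configure -des with a -Dprefix pointing at the local install location, followed by make and make install, will compile and install Perl locally)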
- Install Bundle::InterchangeKitchenSink
- Install the DBD module for MySQL or PostgreSQL
- Review the code base looking for use statements in custom code and Require module settings in interchange.cfg. Install the Perl modules found into the local Perl.
- Set up a non-SSL and an SSL virtual host using a temporary domain. Configure the temporary domain to use the SSL certificate from the production domain.
- Firewall or password protect the virtual host so it is not accessible to the public
- Generate a vlink using interchange/bin/compile and copy it into the cgi-bin directory and name it properly
- Start up the new Interchange
- Review error messages and resolve until Interchange will start properly
- Test the site thoroughly, resolving issues as they appear. Make sure that checkout, charging credit cards, sending of emails, using the admin, etc all function.
- Migrate any cron jobs running on the current production site, such as session expiration scripts
- Set up log rotation for the new logs that will be created
- Verify that you have access to make DNS changes
- Set the TTL for the domain to a low value such as 5 minutes
- Modify the new production site to respond to the production url, test by updating your hosts file to manually set the IP address of the domain
- Shut down the new Interchange
- Restore a copy of the original backup for Interchange, the catalog and htdocs to /tmp on the production server
- Shut down the production Interchange and put up a maintenance note on the production site.
- Take a backup of the production database and restore on the new server
- Diff the Interchange, catalog and htdocs directory between /tmp and the current production locations, making note of the files that have changed since we took the original copy.
- Copy the files that have changed, making sure to merge with any changes we have made on the new production site. Make sure to copy over all .counter and .autonumber files to the new production site.
- Start Interchange on the new production server
- Test the site thoroughly on the new production server, using the production url. Make sure that checkout with charging the credit card functions properly.
- Resolve any remaining issues found during the testing
- Set up the Interchange daemon to start at boot for this site in /etc/rc.d/rc.local or in cron using @reboot
- Update DNS to point at the new production IP address
- Update the TTL of the domain to a longer value
- Open the site to the public by opening the firewall or removing the password protection
- Keep an eye on the error logs for any issues that might crop up
This will hopefully give you a solid guide for performing an Interchange site migration from one server to another and some of the things to watch out for that might cause issues during the migrations.
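Two of the steps above, grepping for hard-coded paths and diffing the /tmp copy against production, are easy to wrap in small shell helpers. The following is a rough sketch, not a finished migration tool: both function names and the paths in the usage comments are hypothetical, and the helpers only report files, leaving the actual edits and merges to you.

```shell
# List files under a restored tree that still mention the old server's path.
# $1 = directory to search, $2 = old path (matched as a fixed string, not a regex)
find_hardcoded_paths() {
    grep -rlF -- "$2" "$1"
}

# List files that differ between the original backup copy and the live tree.
# $1 = backup copy restored to /tmp, $2 = current production directory
changed_since_backup() {
    # diff exits 1 when it finds differences; that is the expected case
    # here, so the exit status is deliberately ignored.
    diff -rq "$1" "$2" || true
}

# Hypothetical usage:
# find_hardcoded_paths /home/account/live /var/www/oldsite
# changed_since_backup /tmp/catalog-backup /home/account/live
```

The output of changed_since_backup is the list to work through when merging changes into the new production site.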
Why is my load average so high?
One of the most common ways people notice there's a problem with their server is when Nagios, or some other monitoring tool, starts complaining about a high load average. Unfortunately this complaint carries with it very little information about what might be causing the problem. But there are ways around that. On Linux, where I spend most of my time, the load average represents the average number of processes in either the "run" or "uninterruptible sleep" states. This code snippet will display all such processes, including their process ID and parent process ID, current state, and the process command line:
#!/bin/sh
ps -eo pid,ppid,state,cmd |\
awk '$3 ~ /[RD]/ { print $0 }'
Most of the time, this script has simply confirmed what I already anticipated, such as, "PostgreSQL is trying to service 20 times as many simultaneous queries as normal." On occasion, however, it's very useful, such as when it points out that a backup job is running far longer than normal, or when it finds lots of "[pdflush]" operations in process, indicating that the system was working overtime to write dirty pages to disk. I hope it can be similarly useful to others.
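When the list is long, a per-command count can be easier to read than the raw lines. The variation below is a sketch along the same lines as the script above; the function name summarize_busy is made up, and it assumes a Linux procps ps that accepts state= and comm= as header-less output columns.

```shell
#!/bin/sh
# Count processes in the R (running) or D (uninterruptible sleep) state,
# grouped by state and command name, busiest first.
summarize_busy() {
    # Reads "STATE COMMAND" lines on stdin. Only the first word of the
    # command name is used, so names containing spaces are truncated.
    awk '$1 ~ /[RD]/ { n[$1 " " $2]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

ps -eo state=,comm= | summarize_busy
```

Each output line is a count followed by the state and command name, which makes a pile-up of one program stand out at a glance.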
dstat: better system resource monitoring
I recently came across a useful tool I hadn't heard of before: dstat, by Dag Wieers (of DAG RPM-building fame). He describes it as "a versatile replacement for vmstat, iostat, netstat, nfsstat and ifstat."
The most immediate benefit I found is the collation of system resource monitoring output at each point in time, removing the need to look at output from multiple monitors. The coloring helps readability too:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
4 1 92 3 0 0| 56k 84k| 0 0 | 94B 188B|1264 1369
3 7 43 44 1 1| 368k 11M| 151B 222B| 0 260k|1453 1565
3 2 46 48 1 0| 432k 5784k| 0 0 | 0 0 |1421 1584
2 2 47 49 0 0| 592k 0 | 0 0 | 0 0 |1513 1763
6 2 44 49 1 0| 448k 248k| 0 0 | 0 0 |1398 1640
8 4 41 45 3 0| 456k 0 | 135B 222B| 0 0 |1530 2102
18 4 38 41 0 0| 408k 128k| 0 47B| 0 0 |1261 1977
10 4 44 43 0 0| 728k 208k| 0 0 | 0 0 |1445 2203
6 3 39 51 0 0| 648k 256k|3607B 4124B| 0 0 |1496 2180
7 7 34 53 0 0|1088k 0 |1234B 582B| 0 0 |1465 2057
14 8 28 49 0 0|2856k 104k| 0 0 | 0 52k|1610 2995
6 6 43 45 0 0|1992k 0 |5964B 4836B| 0 0 |1493 2391
9 14 34 44 0 0|2432k 112k|7854B 726B| 0 0 |1527 2190
9 11 40 41 1 0|2680k 0 |1382B 972B| 0 0 |1550 2298
5 4 68 22 0 0| 576k 1096k| 12k 4628B| 0 0 |1522 1731 ^C
(Textual screenshot by script of util-linux and Perl module HTML::FromANSI.)
Its default one-line-per-timeslice output makes it good for collecting data samples over time, as opposed to full-screen top-like utilities such as atop, which give much more detailed information at each snapshot, but don't show history.
Since dstat is a standard package available in RHEL/CentOS and Debian/Ubuntu, it is a reasonably easy add-on to get on various systems.
dstat also allows plugins, and the most recent release, just last month, added new plugins "for showing NTP time, power usage, fan speed, remaining battery time, memcache hits and misses, process count, top process total and average latency, top process total and average CPU timeslice, and per disk utilization rates."
It sounds like it'll grow even more useful over time and is worth keeping an eye on.
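Since the one-line-per-timeslice output lends itself to collecting samples over time, here is a small sketch of a wrapper that records a fixed number of samples to a CSV file. The function name collect_samples, its defaults, and the output filename are all assumptions for illustration; the dstat options used (--time, --cpu, --disk, --net, --output, then delay and count) are as I understand them from the dstat man page, so check them against your installed version.

```shell
# Record <count> samples, one every <interval> seconds, to a CSV file
# while still printing them to the terminal.
# Hypothetical wrapper; defaults are arbitrary.
collect_samples() {
    interval=${1:-5}
    count=${2:-12}
    outfile=${3:-dstat-samples.csv}
    dstat --time --cpu --disk --net --output "$outfile" "$interval" "$count"
}

# Example: one sample every 10 seconds for five minutes.
# collect_samples 10 30 /tmp/load-spike.csv
```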
GNU Screen: follow the leader
First of all, if you're not using GNU Screen, start now :).
Years ago, Jon and I spoke of submitting patches to implement some form of "follow the leader" (like the children's game, but with a work-specific purpose) in GNU Screen. This was around the time he was patching screen to raise the hard-coded limit of windows allowed within a given session, which might give an idea of how much screen gets used around here (a lot).
The basic idea was that sometimes we just want to "watch" a co-worker's process as they're working on something within a shared screen session. Of course, they're going to be switching between screen windows and if they forget to announce "I've switched to screen 13!" on the phone, then one might quickly become lost. What if the cooperative work session doesn't include a phone call at all?
To the rescue, Screen within Screen.
Accidentally arriving at one screen session within another screen session is a pretty common "problem" for new screen users. However, creative use of two (or more) levels of nested screen during a shared session allows for a "poor man's" follow the leader.
If the escape sequence of the outermost screen is changed to something other than the default, then the default escape sequence will pass through and take effect on the inner screen. In this way, anyone attached to the outermost screen will be following whoever is controlling the inner screen session as they flip between windows, grep logs, launch editors and save my vegan bacon! To "break away" from the co-working session, a user would simply use the chosen non-default escape sequence of the outermost screen to create a new window or disconnect entirely.
Sound confusing? Give some of the following commands a try. You can always just close out all the windows of a screen session and eventually you'll make it back to your original shell.
Steps:
- start the outermost screen session (called "followme") with a non-default escape sequence (pick one that suits you):
screen -S followme -e ^ee
- from within the "followme" session, start the inner screen where actual work will be performed:
screen -S work
- get friends and co-workers (logged-in as the same user) to connect to your "followme" screen:
screen -x followme
- work as normal using the default: <CTRL> <a> sequences (which ought to affect the inner "work" session).
- to "break away" from the "work" session, use: <CTRL> <e> sequences (which ought to affect the outer "followme" session). For example, to disconnect from the shared session, one would type: <CTRL> <e> <d>
Note: If those sharing the screen session are already acclimated to screen-within-screen, you can skip the non-default escape sequences entirely and use <CTRL> <a> <a> as the escape sequence (another <a> for every level of screen-within-screen). This also happens to be your evasion route for accidental screen-within-screen moments.
Remember that, by default, everyone who wants to share the screen must already be logged-in as the same user (without the use of sudo or su). There are methods of allowing shared screen access between users, but those are outside the scope of this post.
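For the followers, the join step can be wrapped in a tiny helper that checks the session exists before attaching. This is only a sketch: the function name join_shared is made up, and the grep pattern assumes the usual "PID.name" entries that screen -ls prints, so adjust it if your session names contain characters that are special in regular expressions.

```shell
# Attach to a shared screen session by name, if it exists.
# Hypothetical helper; defaults to the "followme" session from the steps above.
join_shared() {
    session=${1:-followme}
    # screen -ls lists sessions as "<pid>.<name>" followed by whitespace.
    if screen -ls | grep -q "[0-9]\.$session[[:space:]]"; then
        screen -x "$session"
    else
        echo "no screen session named '$session' found" >&2
        return 1
    fi
}
```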
Have fun!
Rejecting SSLv2 politely or brusquely
Once upon a time there were still people using browsers that only supported SSLv2. It's been a long time since those browsers were current, but when running an ecommerce site you typically want to support as many users as you possibly can, so you support old stuff much longer than most people still need it.
At least 4 years ago, people began to discuss disabling SSLv2 entirely due to fundamental security flaws. See the Debian and GnuTLS discussions, and this blog post about PCI's stance on SSLv2, for example.
To politely alert people using those older browsers, yet still refusing to transport confidential information over the insecure SSLv2 and with ciphers weaker than 128 bits, we used an Apache configuration such as this:
# Require SSLv3 or TLSv1 with at least 128-bit cipher
<Directory "/">
SSLRequireSSL
# Make an exception for the error document itself
SSLRequire (%{SSL_PROTOCOL} != "SSLv2" and %{SSL_CIPHER_USEKEYSIZE} >= 128) or %{REQUEST_URI} =~ m:^/errors/:
ErrorDocument 403 /errors/403-weak-ssl.html
</Directory>
That accepts their SSLv2 connection, but displays an error page explaining the problem and suggesting some links to free modern browsers they can upgrade to in order to use the secure part of the website in question.
Recently we've decided to drop that extra fuss and block SSLv2 entirely with Apache configuration such as this:
SSLProtocol all -SSLv2
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP
The downside of that is that the SSL connection won't be allowed at all, and the browser doesn't give any indication of why or what the user should do. They would simply stare at a blank screen and presumably go away frustrated. Because of that we long considered the more polite handling shown above to be superior.
But recently, after having completely disabled SSLv2 on several sites we manage, we have gotten zero complaints from customers. Doing this also makes PCI and other security audits much simpler because SSLv2 and weak ciphers are simply not allowed at all and don't raise audit warnings.
So at long last I think we can consider SSLv2 dead, at least in our corner of the Internet!
MTU tweak: a fix for upload pain
While traveling and staying at Hostel Tyn in Prague's city center, I ran into a strange problem with my laptop on their wireless network.
When many people were using the network (either on the hostel's public computers or on the wireless network), sometimes things bogged down a bit. That wasn't a big deal and required merely a little patience.
But after a while I noticed that absolutely no "uploads" worked. Not via ssh, not via browser POST, nothing. They always hung. Even when only a file upload of 10 KB or so was involved. So I started to wonder what was going on.
As I considered trying some kind of rate limiting via iptables, I remembered somewhere hearing that occasionally you can run into mismatched MTU settings between the Ethernet LAN you're on and your operating system's network settings.
I checked my setup and saw something like this:
ifconfig wlan0
wlan0 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
inet addr:10.x.x.x Bcast:10.x.x.x Mask:255.255.255.0
inet6 addr: fe80::xxx:xxxx:xxxx:xxxx/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1239 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:191529 (191.5 KB) TX bytes:4543 (4.5 KB)
The MTU 1500 stood out as being worthy of tweaking. So I tried a completely unscientific change:
sudo ifconfig wlan0 mtu 1400
Then tried the same HTTP POST that had been consistently failing, and poof! It worked fine. Every time.
I think most likely something more than 1400 bytes would've been possible, perhaps just a few short of 1500. The number 1492 rings familiar. I'll be old-fashioned and not look it up on the web. But this 1400-byte MTU worked fine and solved the problem. To my delight.
As an interesting aside, before making the change, I found one web application where uploads did work fine anyway: Google's Picasa. I'm not sure why, but maybe it sliced & diced the upload stream into smaller chunks on its own? A mystery for another day.
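A slightly less unscientific way to pick the number is to probe with pings that are forbidden from fragmenting and step the size down until one gets through. The sketch below assumes Linux iputils ping (the -M do option sets "don't fragment") and IPv4; the function name probe_mtu, the step of 4 bytes, and the 1200-byte floor are arbitrary choices for illustration. Whether it would have found the right value on that particular hostel network is anyone's guess.

```shell
# Print the largest MTU (from a starting value downward) at which a
# non-fragmentable ping reaches the host. Returns 1 if none does.
# $1 = host, $2 = starting MTU (default 1500), $3 = lowest MTU to try
probe_mtu() {
    host=$1
    mtu=${2:-1500}
    floor=${3:-1200}
    while [ "$mtu" -ge "$floor" ]; do
        # ICMP payload = MTU - 20 bytes IPv4 header - 8 bytes ICMP header
        payload=$((mtu - 28))
        if ping -c 1 -W 2 -M do -s "$payload" "$host" >/dev/null 2>&1; then
            echo "$mtu"
            return 0
        fi
        mtu=$((mtu - 4))
    done
    return 1
}

# Hypothetical usage: probe_mtu www.example.com
```

The value it prints can then be given to ifconfig in place of the guessed 1400.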
Operating system upgrades
This won't be earth-shattering news to anyone, I hope, but I'm pleased to report that two recent operating system upgrades went very well.
I upgraded a laptop from Ubuntu 8.10 to 9.04, and it's the smoothest I've ever had the process go. The only problem of any kind was that the package download process stalled on the last of 1700+ files downloaded, and I had to restart the upgrade, but all the cached files were still there and on reboot everything worked including my two-monitor setup, goofy laptop audio chipset, wireless networking, crypto filesystem, and everything else.
I also upgraded an OpenBSD 4.3 server that is a firewall, NAT router, DHCP server, and DNS server, to OpenBSD 4.5. It was the first time I used the in-place upgrade with no special boot media and fetching packages over the network, as per the bsd.rd instructions, and it went fine. Then the extra packages that were there before had to be upgraded separately as per the FAQ on pkg updates. I initially scripted some munging of pkg_info's output, not realizing I could simply run pkg_add -u and it updates all packages.
There was one hangup upgrading zsh, which I just removed and reinstalled. Everything else went fine, and all services worked fine after reboot.
How pleasant.
TLS Server Name Indication
I came across a few discussions of the TLS extension "Server Name Indication", which would allow hosting more than one "secure" (https) website per IP address/TCP port combination. The best summary of the state of things is (surprise, surprise) the Wikipedia Server Name Indication article. There are more details about client and server software support for SNI in Zachary Schneider's blog post and Daniel Lange's blog post.
I don't recall hearing about this before, but if I did, I probably dismissed it as irrelevant at the time because there would've been almost no support in either clients or servers. But now that all major browsers on all operating systems support SNI, except some on Windows XP, it may be worth keeping an eye on this.
Yes, IE on Windows XP is still a huge contingent and thus a huge hurdle. But maybe Microsoft will backport SNI support to XP. Even if just for IE 7 and later. Or maybe we'll have to wait a few more years till the next Windows operating system (hopefully) displaces XP. Here's a case where the low popularity of Vista (which supports SNI) is hurting the rest of us.
I'm really looking forward to the flexibility of name-based virtual hosting for https that we've had for 10+ years with plain http. It could really change the setup and ongoing infrastructure costs for secure websites, such as ecommerce sites.
Passenger and SELinux
We recently ran into an issue when launching a client's site using Phusion Passenger where it would not function with SELinux enabled. It ended up being an issue with Apache not having the ability to read/write the Passenger sockets. In researching the issue we found another engineer had reported the problem and there was discussion about having the ability to configure where the sockets could be placed. This solution would allow someone to place the sockets in a directory other than /tmp, set the context on the directory so that sockets created within it have the same context, and then grant httpd the ability to read/write to sockets with that specific context. This is a win over granting httpd the ability to read/write to all sockets in /tmp, since many other services place their sockets there and you may not want httpd to be able to read/write to those sockets.
End Point had planned to take on the task of patching passenger and submitting the patch. While collecting information about the issue this morning to pass to Max I found this in the issue tracker for Passenger:
Comment 4 by honglilai, Feb 21, 2009
Implemented.
Status: Fixed
Labels: Milestone-2.1.0
Excellent! We'll be testing this internally soon and will post a new blog entry with our solution for Passenger + SELinux. Thanks to the Passenger engineers for taking the request seriously and working on an update with the PassengerTempDir configuration directive included.
Slow Xen virtualization of RHEL 3 i386 guest on RHEL 5 x86_64
It seems somehow appropriate that this post so closely follows Ethan's recent note about patches vs. complaints in free software. Here's the situation and the complaint (no patch, I'm sorry to say):
We're migrating an old server into a virtual machine on a new server, because our client needs to get rid of the old server very soon. Then afterwards we will migrate the services piecemeal to run natively on RHEL 5 x86_64 with current versions of each piece of the software stack, so we have time to test compatibility and make adjustments without being in a big hurry.
The old server is running RHEL 3 i386 on 2 Xeon @ 2.8 GHz CPUs (hyperthreaded), 4 GB RAM, 2 SCSI hard disks in RAID 1 on MegaRAID, running Red Hat's old 2.4.21-4.0.1.ELsmp kernel.
The new server is running RHEL 5 x86_64 on 2 Xeon quad-core L5410 @ 2.33GHz CPUs, 16 GB RAM, 6 SAS hard disks in RAID 10 on LSI MegaRAID, running Red Hat's recent 2.6.18-92.1.22.el5xen kernel.
The virtual machine is using Xen full virtualization, with 4 virtual CPUs and 4 GB RAM allocated, with a nearly identical copy of the operating system and applications from the old server. And it is bog-slow. Agonizingly slow.
Under the load of even a single repeated web request to the web server (Apache) + app server (Interchange) + database server (MySQL), it breathes heavily and takes 1-2 seconds per request (wildly varying). The old physical machine takes 0.5-0.7 seconds per request with 2 concurrent users. Under heavier load (just a boring day of regular web traffic) the new VM groans and plods along.
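For anyone wanting to collect similar per-request timings, here is a minimal sketch, assuming curl and awk are available. The function names time_requests and summarize_times, the URL, and the request count are hypothetical illustrations, not our actual test harness:

```shell
#!/bin/sh
# Print the total time (in seconds) of each of N sequential requests to a URL.
# Relies on curl's --write-out variable time_total; the response body is discarded.
time_requests() {
    url=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        curl -s -o /dev/null -w '%{time_total}\n' "$url"
        i=$((i + 1))
    done
}

# Read one timing per line on stdin and print the minimum, average and maximum.
summarize_times() {
    awk '
        NR == 1 { min = $1; max = $1 }
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END { if (NR > 0) printf "min=%.2f avg=%.2f max=%.2f\n", min, sum / NR, max }
    '
}

# Example (hypothetical URL):
#   time_requests http://localhost/ 20 | summarize_times
```

Running the same loop against both the old physical machine and the VM gives a rough like-for-like comparison, though sequential requests say nothing about behavior under concurrency.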
The most noticeable metric is that the CPUs get pegged, with 50%-90% or more going to system usage and under 40% to user usage. This is nearly the opposite of the physical machine, where system usage was always in the low teens and user usage was around 50% per CPU. In both cases there's almost no I/O wait.
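The user/system split can be read from tools like top or vmstat, but it can also be computed directly. The following is a rough sketch, assuming a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle and (on newer kernels) iowait in that order; the function name cpu_split is hypothetical, and older kernels that lack the iowait field will simply report it as 0:

```shell
#!/bin/sh
# Given two aggregate "cpu ..." lines from /proc/stat (taken before and after
# an interval), print the share of CPU time spent in user (including nice),
# system and iowait. Only the fields user..steal (2-9) are summed for the total.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            for (i = 2; i <= NF && i <= 9; i++) total[NR] += $i
            usr[NR] = $2 + $3
            sy[NR] = $4
            io[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "no elapsed CPU time"; exit 1 }
            printf "user=%.0f%% system=%.0f%% iowait=%.0f%%\n",
                100 * (usr[2] - usr[1]) / dt,
                100 * (sy[2] - sy[1]) / dt,
                100 * (io[2] - io[1]) / dt
        }'
}

# Example: sample the aggregate CPU line five seconds apart.
#   before=$(grep '^cpu ' /proc/stat); sleep 5; after=$(grep '^cpu ' /proc/stat)
#   cpu_split "$before" "$after"
```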
First, I'm really surprised it's this bad. We've done Xen full virtualization of RHEL 5 x86_64 and i386 guests on RHEL 5 x86_64, with no special handling, and it's always worked quite well with little performance degradation.
So, we know there are paravirtualized drivers you can use to speed up network and disk devices even in otherwise fully virtualized guests. However, apparently you can't use the paravirtualized drivers in a 32-bit RHEL 3 guest on a 64-bit RHEL 5 host. That's really painful, since in my mind a very common use case for virtualization is loading a bunch of old 32-bit machines on a big 64-bit machine with a lot of RAM. But ... not if you want it to even match the speed of the old servers!
We increased the number of virtual CPUs to 8. That took the edge off the worst slowdowns a bit, but only barely.
We tried upgrading the RHEL 3 guest to the very latest versions of everything from Red Hat Network (Update 9 IIRC) and upgrading the RHEL 5 host to the very latest RHEL 5.3, and saw the wild variation in performance from request to request moderate a lot. Also, performance under heavier concurrency was stable: 2.4-2.6 seconds per request in that scenario.
But that's still really slow. I hope we're just missing something obvious here. I'd love to know what the really stupid mistake we're making is. So far, the search has been fruitless and this seemingly ideal use case for Xen virtualization is barely usable.
Machine virtualization on the Linux desktop
In the past I've used virtualization mostly in server environments: Xen as a sysadmin, and VMware and Virtuozzo as a user. They have worked well enough. When there've been problems they've mostly been traceable to network configuration trouble.
Lately I've been playing with virtualization on the desktop, specifically on Ubuntu desktops, using Xen, kvm, and VirtualBox. Here are a few notes.
Xen: Requires hardware virtualization support for full virtualization, and paravirtualization is of course only for certain types of guests. It feels a little heavier on resource usage, but I haven't tried to move beyond lame anecdote to confirm that.
kvm: Rumored to have been not ready for prime time, but when used from libvirt with virt-manager, has been very nice for me. It requires hardware virtualization support. One major problem in kvm on Ubuntu 8.04 is with the CD/DVD driver when using RHEL/CentOS guests. To work around that, I used the net install and it worked fine.
VirtualBox: This was for me the simplest of all for desktop stuff. I've used both the OSE (Open Source Edition) in Ubuntu and Sun's cost-free but proprietary package on Windows Vista. The current release of VirtualBox only emulates 32-bit i386 machines, though! (No 64-bit guests, though a 64-bit host is fine.) It's also been a little buggy at times -- I've had a few machine crashes when running both an OpenBSD 4.3 and a RHEL 5 guest, though I wasn't able to reproduce the problem and it's possible it wasn't a VirtualBox issue.
I should note that some manufacturers include a BIOS option to disable hardware virtualization, and that it is sometimes disabled by default. When booting a new machine, check for that, especially on servers you won't necessarily want to take down later.
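On a running Linux system, a quick first check is to look for the relevant CPU flags. This is a minimal sketch, assuming a Linux /proc/cpuinfo with "flags" lines; the function name hw_virt_flag is hypothetical. It only shows whether the CPU advertises the extension (vmx for Intel VT-x, svm for AMD-V), and a BIOS setting that disables the feature may not be reflected here, so the BIOS setup screen is still the place to confirm:

```shell
#!/bin/sh
# Report which hardware virtualization flag, if any, the CPU advertises.
# The optional argument exists only so the function can be pointed at a
# saved copy of /proc/cpuinfo instead of the live file.
hw_virt_flag() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep '^flags' "$cpuinfo" | grep -qw 'vmx'; then
        echo "vmx"
    elif grep '^flags' "$cpuinfo" | grep -qw 'svm'; then
        echo "svm"
    else
        echo "none"
    fi
}

# Example:
#   hw_virt_flag
```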
A final note about RHEL 5's net install: Why, oh why, does the installer ask for an HTTP install location as separate web site and directory entries, instead of a universally used and easy URL? And further, when the install source I'm using goes down (as download mirrors occasionally do), why are my only options to reboot or retry? Would it have been so hard to allow me the option of entering a new download URL? Yes, I know, I need to send in a patch.
nginx and lighttpd deployments growing
Apache httpd is great. But it's good to see Netcraft report that nginx and lighttpd continue to grow in popularity as well. Having active competition in the free software web server space is really beneficial to everyone, and these very lightweight and fast servers fill an important niche for dedicated static file serving, homegrown CDNs, etc. Thanks to all the developers involved!
