jesscoburn.com

Tidbits and thoughts on webhosting, web applications and just general cool geek crap.


A follow up to Smartertools answers the cry on the fight against spam with smartermail 4.0.

A lot of clients have been asking how we’re handling spamassassin with Smartermail 4.0.  It’s no secret that spamassassin on a Windows server runs horribly slowly. If more than a handful of domains are involved, I have no doubt that spamassassin would cripple the server, if not fail completely.  However, I also believe that greylisting is the more effective component in the smartertools anti-spam arsenal and will reduce spam to a fraction of what it would be with spamassassin alone.

So there’s a ton of interest in farming out spamassassin to a Linux VPS. Why, you ask?  Well, quite simply, spamassassin runs like a mad cow on steroids on a Linux server. Okay, maybe I’m exaggerating, but it’s a ton faster. Plus, as hard as it is to admit it, being a die-hard Windows geek, it was developed on Linux and the community support for it is still very much Linux, so it just runs better.  Fortunately, smartertools (under the leadership of Tim Uzzanti, formerly of Crystaltech, and my two superhero-style developer home-boys Grady W and Bryon G) saw ahead and knew this could be a problem. What did they do? They designed smartermail to support not only a remote spamassassin processing server on Linux but, if need be, a farm of spamassassin processing servers. By going with a Linux install of spamassassin you’ll gain the added support of the spamassassin community (also Linux geeks er um developers .. ehh Linux developer, geek … same thing ;) ).
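To make the "farm of processing servers" idea a bit more concrete, here is a rough Python sketch of a client that rotates through several remote spamassassin nodes and asks each one for a verdict. It assumes the nodes run spamd and speak its plain-text protocol on the default port 783. The hostnames and function names are made up for illustration, and this is not how smartermail itself is implemented.

```python
import itertools
import socket

# Hypothetical farm nodes (placeholder hostnames). 783 is spamd's default port.
SPAMD_NODES = [("sa1.example.net", 783), ("sa2.example.net", 783)]
_node_cycle = itertools.cycle(SPAMD_NODES)


def build_check_request(message: bytes) -> bytes:
    """Build a spamd CHECK request: command line, Content-length header, blank line, message."""
    header = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_response(response: bytes):
    """Return (is_spam, score, threshold) from a reply header like 'Spam: True ; 15.0 / 5.0'."""
    for line in response.decode("ascii", errors="replace").splitlines():
        if line.lower().startswith("spam:"):
            verdict, _, scores = line.split(":", 1)[1].partition(";")
            score, _, threshold = scores.partition("/")
            return verdict.strip().lower() in ("true", "yes"), float(score), float(threshold)
    raise ValueError("no Spam: header in spamd response")


def check_message(message: bytes, timeout: float = 10.0):
    """Try each node once, starting with the next one in rotation; return the first verdict."""
    last_error = None
    for _ in range(len(SPAMD_NODES)):
        host, port = next(_node_cycle)
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(build_check_request(message))
                conn.shutdown(socket.SHUT_WR)  # tell spamd the request is complete
                chunks = []
                while chunk := conn.recv(4096):
                    chunks.append(chunk)
            return parse_check_response(b"".join(chunks))
        except (OSError, ValueError) as exc:
            last_error = exc  # node down or odd reply: move on to the next node
    raise RuntimeError(f"no spamd node answered: {last_error}")
```

Adding capacity in this sketch is just a matter of appending another entry to SPAMD_NODES, which is the appeal of a farm in the first place.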

What’s so great about Spamassassin on Linux?

Out of the box spamassassin isn’t very effective. Okay, it’s good but not nearly as good as it should be. To really take advantage of spamassassin you’ll want to add a few functions:

  • DCC: the Distributed Checksum Clearinghouse. Your server creates a checksum from each message you receive, compares it to a distributed database of checksums to decide whether the message is spam, and then scores it accordingly. Basically you and a bunch of other mail server operators are teaming together to create a distributed, constantly updated database of spam and non-spam messages. Very cool.
  • Vipul’s Razor: similar to DCC but uses the Cloudmark Spamnet network (my understanding is it’s the same database that backs their commercial services).
  • Pyzor: similar to Razor, Pyzor is a completely free database and client written in .. you guessed it .. Python. It was developed out of fear that the Razor database, being commercial, may be ripped away from the open source community at some point.
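Here is a toy Python sketch of the checksum clearinghouse idea behind all three tools. It is only an illustration: the real services use their own checksum schemes and network protocols, and the class name, the SHA-256 hash and the threshold of 3 reports are all my own assumptions.

```python
import hashlib
from collections import Counter


class ChecksumClearinghouse:
    """Toy stand-in for a shared checksum database (not the real DCC, Razor or Pyzor protocol)."""

    def __init__(self, bulk_threshold: int = 3):
        self.reports = Counter()              # checksum -> number of sightings
        self.bulk_threshold = bulk_threshold  # assumed cut-off for "seen too often"

    @staticmethod
    def checksum(body: str) -> str:
        # Normalise case and whitespace so trivially altered copies hash the same.
        normalised = " ".join(body.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def report(self, body: str) -> int:
        """Record one sighting of this message and return the total seen so far."""
        digest = self.checksum(body)
        self.reports[digest] += 1
        return self.reports[digest]

    def is_bulk(self, body: str) -> bool:
        """True once enough mail servers have reported the same checksum."""
        return self.reports[self.checksum(body)] >= self.bulk_threshold
```

Each participating mail server would call report() as mail arrives and is_bulk() before scoring, so a message that lands on many servers at once quickly stands out.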

Now, these three tools will slow down your message processing (around 2-10 seconds generally and you should set a timeout so that they don’t hold up email too long) but they really add some power behind Spamassassin.
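The timeout point is worth a quick sketch. This is one way to cap how long a slow network lookup can hold up a message, written in Python with a worker thread. The function names and the default score of 0.0 are assumptions for illustration; spamassassin has its own timeout settings for these plugins, which this does not try to reproduce.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time


def run_with_timeout(check, message, timeout_seconds, default_score=0.0):
    """Run check(message) in a worker thread; return default_score if it takes too long."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check, message)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return default_score  # the lookup was too slow, so it contributes nothing
    finally:
        pool.shutdown(wait=False)  # don't block the mail flow waiting for the straggler


def slow_lookup(message):
    """Hypothetical stand-in for a clearinghouse query that answers too slowly."""
    time.sleep(0.5)
    return 3.0
```

Calling run_with_timeout(slow_lookup, "some message", 0.1) gives up after a tenth of a second and falls back to the default score, so one sluggish clearinghouse can't stall the whole queue.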

You have now evolved from the rules-only processing of spamassassin into a rules processing system combined with a series of independent distributed message clearinghouses. I should note that if you have any volume whatsoever, DCC is going to want you to set up your own DCCD (which we have set up already, but we are still beta testing smartermail 4.0 before rolling out completely).

Why Rules? Don’t the Spammers Know These Rules too?

So now you have the default rules (around 91 I believe) and the clearinghouses. But what good are the rules, right? I mean, after all, if I have them the spammers have them too. Enter the SpamAssassin Rules Emporium (SARE), a series of frequently updated rule sets that you can download on a schedule with a tool like sa-update. This means your rules are constantly evolving just like the spammers are.  Now we’ve got kerosene on the fire. We have a set of constantly changing rules (which you’ll want to pick from carefully; remember, these can be touchy and some rules may flag good mail as bad) and a series of independent distributed message clearinghouses.

A note about rules from SARE: There are different levels of rules. Some, when tested against a mail test database, picked up only spam messages but not all of them; some picked up more spam but flagged a few good emails as spam too; and finally some picked up all the spam but flagged more ham (good mail) as spam. It’s really up to you to decide what’s safe and what’s not.
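If you haven’t looked at how rule-based scoring works, here’s a minimal Python sketch of the idea: every rule that matches adds its score, and the message is flagged once the total reaches a threshold. The three rules and their scores are invented for illustration and are not real spamassassin or SARE rules. The 5.0 threshold mirrors spamassassin’s usual default required score.

```python
import re

# Hypothetical rules as (name, pattern, score). Real rule sets are far larger.
RULES = [
    ("MENTIONS_VIAGRA", re.compile(r"\bviagra\b", re.I), 2.5),
    ("STOCK_TIP", re.compile(r"\bhot stock\b", re.I), 1.5),
    ("SHOUTING_SUBJECT", re.compile(r"^Subject: [A-Z0-9 !]+$", re.M), 1.0),
]
SPAM_THRESHOLD = 5.0


def score_message(raw_message: str):
    """Return (total_score, names_of_rules_that_hit) for a raw message."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(raw_message)]
    return sum(score for _, score in hits), [name for name, _ in hits]


def is_spam(raw_message: str) -> bool:
    """Flag the message once the summed rule scores reach the threshold."""
    total, _ = score_message(raw_message)
    return total >= SPAM_THRESHOLD
```

A "safe" rule set in this picture is one whose rules almost never match good mail, so you can give them real weight. A more aggressive set catches more spam but needs lower scores (or a higher threshold) to keep ham from tipping over.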

Which rules do you deploy? Our own testing has shown that greylisting filters 90% of the spam and that spamassassin, with just the safe level of rules employed, does a good job of flagging almost all of what gets through greylisting. We currently run each message through about 501 tests. That takes between 1.2 and 5 seconds without the distributed database checks and between 1.2 and 20 seconds with them. Our system hasn’t been fully optimized and tweaked yet, but it’s getting there.
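For anyone who hasn’t met greylisting, the general technique goes like this: the first time an unknown sending server, sender and recipient combination shows up, the mail server answers with a temporary failure, and it only accepts the message when the sender retries after a delay. Real mail servers retry. A lot of spam software doesn’t bother. Below is a minimal Python sketch of that logic. The 300 second delay and the class name are assumptions, and this says nothing about how smartermail implements it internally.

```python
import time


class Greylist:
    """Minimal greylisting sketch keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay_seconds: float = 300.0, clock=time.time):
        self.min_delay = min_delay_seconds  # assumed retry delay before acceptance
        self.clock = clock                  # injectable so the logic is easy to test
        self.first_seen = {}                # triplet -> time of first delivery attempt

    def should_accept(self, client_ip: str, sender: str, recipient: str) -> bool:
        """Return False (temporary reject) until the same triplet retries after the delay."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        return now - first >= self.min_delay
```

A real implementation would also expire old entries and whitelist known-good senders, which I’ve left out to keep the sketch short.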

Rules and DCC, What Else Does Spamassassin Give Me?

So now we have a constantly updating database of rules, a way to compare our messages to a distributed database of email signatures to see if others have flagged them as spam and… here’s the coolest part. You know those annoying image emails you get selling viagra or stocks? The ones you can’t for the life of you figure out how to filter? Well, spamassassin has OCR (optical character recognition) plugins available that will read these images and then review the text to see if it’s truly spam. This is VERY cool!  But as the cat and mouse game goes, have you noticed that your image spam is becoming colorful now? Strange backgrounds? Multi-colored text? You know all those tricks we perform with CAPTCHA to keep bots from registering on our forms? Yeah, the spammers are using those techniques in spam messages now (the rat bast*rds).

The Spam Fighting Duo becomes a powerful Dynamic Trio!

Spamassassin is very cool and Smartermail has gotten even cooler. Now enters the final member of our Team of Superhero Techno-tools, SWSoft’s Virtuozzo.  Virtuozzo is an OS virtualization VPS engine. What’s this mean? Hardware virtualization systems like Microsoft Virtual Server and VMWare have an overhead (reported on the order of 20%) due to virtualizing the hardware. This means 4 VPSs on a single server will together deliver only about 80% of the processing power of the box. With hardware virtualization you gain a great deal of flexibility in being able to run mixed guest operating systems on a host system (i.e., running Linux and Windows VPSs on a Windows host machine) but you pay for that with a performance loss (most argue that with today’s processing power it’s an acceptable loss, but you decide for yourself).

With OS virtualization you are still very much virtualized, but you run the same guest OS as the host OS, so you can’t run Linux on Windows. But guess what? You aren’t getting bottlenecked as you are in HW virtualization.  Now Virtuozzo gets even cooler. You get all the raw power, plus, now that you’re using the same OS on the host and across all of your guests, they can actually share common memory and diskspace. So the 2GB of diskspace you’d normally lose to the OS in a 10GB VPS partition isn’t lost at all. You only give up diskspace for files that differ from the host machine’s version (for instance, if you created your own bind binary, it and its necessary libraries would be unique to your VPS and count against your VPS’s diskspace and memory allotment). I believe this is around 100 to 200MB on average.

Next you get something called Virtuozzo templates. These are ready-made application, operating system and in some cases full VPS machine templates that are shared across multiple virtual environments (VEs, or VPSs if you will). So now you can have a series of very similar VEs (VPSs) running on a single hardware node, all sharing resources. This means that although your apps and virtual machine are very much separated and secure, you’re not running all of the overhead of the guest operating system on your virtual machine, and you’ll gain performance over a HW virtualized system. Our own informal testing showed this to be a great benefit and very much worth the tradeoffs between HW and OS virtualization for a hosted application and webhosting platform.

So why Virtuozzo for our spamassassin VEs?

  • The performance difference between HW virtualization and OS virtualization. HW virtualization is great, adds a lot of functionality that you may or may not need and will get the job done, but OS virtualization is the only way to go in a production hosting environment that demands maximum performance, reliability and scalability.
  • Shared OS resources reduce the need for redundant processes and wasted diskspace, allowing for more VPSs per HW node and thus lower cost.
  • The ability to create templates of a working VPS design and then replicate it across hundreds of VPS’s within a matter of minutes (I didn’t really get into that but it’s extremely cool)
  • The ability to patch a single VPS and then create a template for this patch and replicate it automatically across all VPSes.
  • The ability to move a VPS from one HW node to another HW node with near zero downtime (again extremely cool)
  • Finally, it’s a platform we’ve already adopted and have been using for about 3 years now and are extremely familiar with it and find it quite popular in the hosting industry.

I know there’s already been a ton of work on a VMWare image in the smartertools community and this is without question a trailblazing effort. For many servers the ready-built solution is a clear winner. I mean, after all, how many admins are going to have a Virtuozzo Linux HW node sitting around? Please don’t think I’m downplaying this solution or the great benefit this donation to the community has been; it’s a very, very clever solution.  But I honestly believe the more practical solution is a dedicated Linux VPS. Under high loads any mail server is going to slow down and require maximum disk I/O. Dedicating some of this disk I/O to a VPS engine on the same machine (using HW virtualization no less) is going to come at a cost and potentially not provide the performance required.

Side Note: Early on, our shared mail servers were using SATA RAID arrays.  SATA drive I/O is known to burst to SCSI levels but won’t sustain those levels. As a result we had no choice but to move from SATA to SCSI, and that was the only difference between the two configurations. Disk I/O is king in a mail server, and fast drives, and plenty of them, in a RAID array are the only way to go. Giving up some of this disk I/O to a co-located VPS scares me in our own environment. Your environment is probably much different and may or may not have the same issue, but that’s for you to decide.

We’re creating these VPS engines so that we can offer not only a farm of Spamassassin servers for our shared hosting mail servers that we’re able to dynamically add additional nodes to quickly, but provide dedicated managed Spamassassin VPSs to our dedicated hosting clients and potentially mailserver admins worldwide regardless of where their mail servers reside.

Think about it: a plug and play spam fighting solution. This may not be an original Applied Innovations “Innovation” (that distinction goes to: someone_else) but it’s definitely one we’ve taken to the next level, and that, my friend, is just why our company is named Applied Innovations. It’s not just a name, it’s what we do.

 

The Applied Innovations Spamassassin VPS solution is currently available in beta mode. It will be fully available following the completion of our beta testing. If you’re an Applied Innovations dedicated hosting client and need a spamassassin managed VPS online today, let us know and we’ll quote you a price.


I recently installed the Wordpress Category Visibility Plug-in which allows you to select categories you don’t want to show up in different places on your blog. I have delicious set to upload my delicious links every day to a special category and I removed it from the frontpage to keep it from junking up my blog.  I also set the default category so that all of these entries would be entered in their own category and this is the category I don’t display on the homepage.  Well, shortly after this change I updated my blog template and when I went to update my Windows Live Writer so it would show new posts using the new template it failed, and failed and failed. I really couldn’t figure it out.

Well today I went to check my delicious links and found all the temp posts that WLW uses when it’s trying to determine your blog’s style (very cool actually). Turns out the problem was that these posts weren’t displaying on the homepage but instead in my delicious category and also weren’t getting deleted (I have like 50 temp posts in there). I turned the default category back to “Uncategorized” and set it to display on the homepage and presto! WLW works again. I suspect I’m not the only one having this issue so I hope this helps someone else.

SQL Injection Attacks


If you write any kind of script on the Internet, be it ASP, ASP.net, PHP, Perl, Ruby, Python or anything else that accesses a database, then you should be aware of SQL Injection attacks.

This posting is going to reference two other blogs, one is the great Scott Guthrie’s blog (best damn blog on ASP.net on the Internet) and his post on Guarding Against SQL Injection attacks.

The second blog we’ll reference is Scott’s inspiration for his blog article, Michael Sutton’s blog and his work to see just how bad SQL injection is on the Internet. Michael did a quick Google search, sampled something like 1000 websites and found that 11% of them were vulnerable to SQL injection.

Both blogs do an excellent job detailing SQL injection and providing links and references on how to fix your code and where to get more information on good coding security.
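If you’ve never seen the problem in action, here’s a small self-contained Python sketch using the built-in sqlite3 module and an in-memory database. The table and function names are made up for the demo. The idea is the same in any language: never paste user input into the SQL text, pass it as a parameter instead.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input is pasted straight into the SQL text.
    return conn.execute(
        f"SELECT name, secret FROM users WHERE name = '{name}'"
    ).fetchall()


def find_user_safe(conn, name):
    # Parameterised: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Feed both functions the input x' OR '1'='1 and the unsafe one hands back every row in the table, while the safe one just looks for a user with that odd name and finds nothing.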

My addition to all this is that I’m going to add Secunia.com. Secunia.com provides a database of open and closed vulnerabilities for various applications and operating systems. Everything from Cisco to Windows is included here. 

I get a constant stream of email updates from secunia.com and each day I get at least one email listing either a SQL Injection or a Cross Site Scripting vulnerability, so I know firsthand just how widespread the problem really is. I did a quick search of their database for SQL Injection and it found 1288 applications that either had or have a SQL injection vulnerability.  Folks, SQL Injection is a huge issue.

If you’re going to purchase a web application or install any sort of web application (PHPBB, OSCommerce, Storefront, aspdotnetstorefront, you name it) I recommend you search Secunia’s database first.


I installed a new theme last night at home and was really happy with it. The new theme is called Lush and was originally created by: Marco van Hylckama Vlieg. (say that 5 times real fast, I dare you!) It was ported to Wordpress by: Christoph Boecken. I like the theme because it’s got the ultra cool (and useful too) AJAX powered live search at the top, the ability to adjust font sizes from within the homepage (helps for those late at night reads) built in support for widgets and support for some of my more favorite plugins without any modifications from me. The theme is still very new and quickly becoming a huge favorite with the wordpress community. You can find it here: WordPress Theme: Lush.

After uploading the new theme last night and dinking around with it a bit, I was all happy and proud and went to bed excited to show my friends and coworkers my hot new look in the morning. Well, there was a problem: the text was all jagged and busted. I spent a fair bit of time making sure the drivers for my video card and monitor were updated and that it wasn’t any setting on my desktop. I even used Microsoft’s Online ClearType Optimizer (a very cool tool you should check out) to make sure my ClearType settings were tweaked. They were.

Turns out it was the Font!  Below is what my jaggies looked like:

[Image: screenshot of the jagged text]

Within the CSS file for the theme the font-family was set and the first 3 entries were: “Lucida Grande”, Lucida, “Lucida Sans”. Turns out I’m missing “Lucida Grande” in my family of fonts. I’m a big fan of Tahoma (which was next on the list), so I moved it to the front and suddenly no more jaggies! Hooray!

But I was still missing Lucida Grande and actually enjoyed it more than Tahoma. So I did a quick google for the font and found it hosted at gordeonbleu.  I downloaded it from there, extracted it to my c:\windows\fonts folder and reloaded my page… No more jaggies, HOORAY! I then moved the other Lucida fonts further back in the line of fonts so it would be less likely anyone else would see a busted page.

So if your wordpress site looks jaggied, check your fonts and consider changing to a more common font family.


 

Tutorial on creating your first Wordpress Plug-in.  Mark has created a fantastic video tutorial on creating your first wordpress plug-in. Definitely worth checking out.