In February of last year we were approached by Microsoft to try out this new managed hosting platform that was still in development. You see, prior to approaching us Microsoft had been working with another hoster (MaximumASP) and wanted to make sure this was a product that would have traction in the hosting industry.
We were uniquely positioned for this task: we had successfully built a new business unit in VPS hosting in 2007 and could apply that experience and expertise to this new platform.
This new platform, later known as the Dynamic Datacenter Toolkit (or DDC, as I like to refer to it), used the Windows Server clustering service and Hyper-V to provide a highly available VPS platform. It then went a step further and integrated the System Center suite to provide the basic framework for a fully managed, highly available cloud infrastructure. It also provided basic guidance on sizing, security, performance and automation.
Our participation in this program did have one caveat: we’d have to build a client-ready offering and have it in production in just two months. We did it within 30 days, and in this post I hope to share what we learned in the process.
Our first cluster and how we built it.
The first cluster we built used Dell PowerEdge 2950 servers and a Dell MD3000i iSCSI SAN. We opted for this solution because it was at least $80,000 less than the alternatives being proposed. The solution is also profiled in a Dell Case Study on the Dell website. The great thing about this SAN device, besides the lower price, was that it was fully redundant and could grow by just adding MD1000 storage arrays. This would allow us to scale out as demand grew.
To get the service online quickly we deployed the system components all virtualized! This included:
- System Center Virtual Machine Manager – The brains behind the entire setup. It provides a centralized management interface for our team, and the DDC programs against it to provision machines quickly.
- System Center Data Protection Manager 2007 – Provides very reliable backups with that click, click, go simplicity that we love about Microsoft products.
- System Center Operations Manager – Provides us with very in-depth, integrated monitoring and alerting.
- System Center Configuration Manager – For managing updates and mass configuration changes with ease. What used to require you to log into hundreds of machines one at a time or set complex group policies now just takes a few mouse clicks.
So we spun up a couple of servers and split the above services across them. Then we spun up a couple more servers, connected all of them to our iSCSI SAN and had our new service online and ready to go. This new offering was really built as more of a proof of concept, because we were unsure if the market would accept a premium VPS offering. We also wanted a solution that could scale out affordably from the initial configuration. The Dell solution had all of those features.
Our first cluster and what we learned.
As hosters we quickly realized that this new offering provided us several new benefits that weren’t available before:
- We were able to differentiate our offering from commodity VPS hosting because we offered a managed solution and the other hosters were focused (and still are) on being the lowest price in the market.
- We were able to offer functionality they couldn’t offer and thus compete on: fail-over clustering, an iSCSI storage architecture that was both fully redundant and multipath, and a level of management that couldn’t be beat.
- We had all of the APIs and guidance we needed to build out a customer self-service portal if we opted to.
- Cloud! We had everything we needed to begin marketing a cloud offering and not just VPS hosting. Cloud = premium, VPS = commodity.
Of course, it wasn’t all sunshine and rainbows
I’d love to sit here and say it was all sunshine and rainbows. But we’re all in I.T. so we all know I.T. is ½ of S.H.*.*. Here are some of the challenges we ran into with this first cluster:
- The DDC relies heavily on Active Directory, and if a client wanted to run their own AD infrastructure, we couldn’t support it. Fortunately, there’s been work done since then so that you no longer have to rely on Active Directory if you don’t want to; you can learn more about that on the DDC Dudes blog. That’s really the beauty of the DDC: it’s a framework and guidance that you take and build into something unique that’s yours! There’s also talk of Domain Federation being possible in the future, so I think that will help as well.
- System Center Data Protection Manager 2007 – Backup is a pain. We all know it. The only people who think backup is set-it-and-forget-it are the people who sell backup software! DPM was no exception. A single DPM server also had a limit on the number of servers it could back up. I believe that limit was around 350 (but I am told it was increased to 400 with DPM 2010).
- We had to deploy one customer per LUN – Because our first build was based on Windows Server 2008 and not R2, each customer needed their own LUN. This posed a problem because hosters in our industry rely on the fact that most customers don’t use all the resources they purchase, which lets us oversubscribe resources such as bandwidth and storage. With Windows Server 2008 we weren’t able to do this. Fortunately, Windows Server 2008 R2 and Cluster Shared Volumes (CSV) made it possible, and we have had great success with CSV.
- We were limited to 65 VMs per node in a cluster – Although Hyper-V will support hundreds of virtual servers on a single node, when clustered it would only support a maximum of 65. Recently this number was increased, and with the same hardware we could potentially push the number up to 128 (although we don’t, and still keep it around 40 VMs per node today).
- Hyper-V was still a 1.0 solution – Let’s face it, we were on the cutting edge of technology and often found ourselves on the bleeding edge. But many of those pains went away with R2, we’ve been able to work around the rest, and Microsoft is quick to listen and help.
But we had great success with Hyper-V and that first cluster, and we realized our business focus would soon be shifting in this direction. We just didn’t realize how quickly.
I want 500 servers and I want them yesterday.
Towards the end of 2009 (just 9 months after deploying our first cluster on the Dynamic Datacenter Toolkit) we were presented with a unique RFQ. Our customer wanted us to bid on 500 servers, either dedicated or virtual, on a 12-month contract. The specifications were pretty basic: a 2.8GHz processor, 1GB of memory and 80GB of storage, all running Windows Server 2008 R2. And they needed all of these servers online pretty much YESTERDAY! We also knew we were up against other hosting companies, and some of those would no doubt be dedicated hosting giants. Our senior management team quickly got together, got in touch with our vendors (Dell for all of our servers and storage, Juniper for our switching and Terremark for our datacenter space) and started crunching numbers.
What we soon realized was that to deploy 500 servers and remain competitive for this client, we’d be deploying 500 commodity dedicated servers that, in 12 months, we’d be lucky to offer at cost in the marketplace. Now, the thought of deploying 500 physical servers was certainly intoxicating; we would have extended our datacenter footprint significantly, after all.
But the reality was that it wouldn’t be profitable. We had made a decision the year before not to be in the discount dedicated server business; it was a commodity business and not where we wanted to be positioned.
So we started crunching the numbers on our new DDC-powered Hyper-V cloud solution. We got with our vendors and started talking server hardware and storage, networking requirements, power, cooling and datacenter space. Eventually we came up with a very viable business plan, presented our proposal to our customer and... we won it! Then we realized: be careful what you wish for!
It was now the high point of ecommerce season: businesses were starting to go on vacation, build times were getting extended and delivery dates pushed back. But when all was said and done, our vendors delivered on time, our team made some super-hero maneuvers, and we deployed this new solution within the time frame our customer requested, going live within just 3 weeks.
Let’s talk numbers and specifics
We decided to deploy these 500 virtual servers on 24 Dell PowerEdge R610s, two Dell EqualLogic PS6500E storage arrays and a small army of Juniper switching gear. We’d deploy on top of Windows Server 2008 R2, utilize Cluster Shared Volumes and make sure everything was fully redundant and highly available.
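If you want to sanity-check that density yourself, here is a rough sketch of the arithmetic in Python. The VM count, host count and per-VM specs come from this post. Spreading the VMs evenly across all 24 hosts and planning for one failed host are my own simplifying assumptions, not a description of how the cluster was actually laid out.

```python
import math

# Figures taken from the post.
VM_COUNT = 500        # virtual servers in the RFQ
HOST_COUNT = 24       # Dell PowerEdge R610 nodes
VM_MEMORY_GB = 1      # memory per VM in the RFQ
VM_DISK_GB = 80       # storage per VM in the RFQ
TARGET_MAX_PER_NODE = 40  # the density the post says is kept per node today


def vms_per_host(vm_count: int, hosts: int, failed_hosts: int = 0) -> int:
    """Worst-case VM count on one host, assuming an even spread (an assumption)."""
    surviving = hosts - failed_hosts
    if surviving <= 0:
        raise ValueError("no surviving hosts")
    return math.ceil(vm_count / surviving)


normal = vms_per_host(VM_COUNT, HOST_COUNT)       # all hosts healthy
one_down = vms_per_host(VM_COUNT, HOST_COUNT, 1)  # one host failed over
total_memory_gb = VM_COUNT * VM_MEMORY_GB         # memory promised to customers
total_disk_gb = VM_COUNT * VM_DISK_GB             # storage promised to customers
within_target = one_down <= TARGET_MAX_PER_NODE

print(f"VMs per host (healthy):       {normal}")
print(f"VMs per host (one host down): {one_down}")
print(f"Memory sold: {total_memory_gb} GB, storage sold: {total_disk_gb} GB")
print(f"Within {TARGET_MAX_PER_NODE} VMs per node: {within_target}")
```

Swap in your own counts and failure assumptions to see how much headroom a given host count leaves.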
Here are some of the numbers we came up with when comparing the physical deployment against the virtual deployment:
- Hardware costs would have been 2¼ times higher had we gone with physical servers over virtual.
- Had we deployed physical servers, our software costs would have been nearly 21 times higher to provide the same level of management and server monitoring. This did not include the operating system costs.
- Colocation costs (space) would have been almost 8 times higher for the physical deployment.
- Power costs (datacenter/colocation power circuits) would have been nearly 19 times higher for the physical deployment.
- Total costs over a 36-month period would have been more than 5 times higher had we gone with physical servers.
Great Job, How about 700 more servers?
Just as they were finishing up the case study in March, our customer came back to us and asked us to do it all over again, this time with nearly 700 more servers. Since our previous deployment had now been online for about three months, we were able to deploy this new batch even faster than the first round and had the new servers online within about two weeks. We were also able to drive our costs even lower because we achieved better efficiency than we did with our previous deployment.
It’s all about helping the customer and at the same time driving up profitability.
Well, that’s my story. That’s how we were able to leverage the Dynamic Datacenter Toolkit and deliver a solution that both met the needs of our clients and helped us reduce our costs and increase our profitability per square foot.
I think it’s important to touch on that profitability per square foot. Today if you look at the colocation business you’ll see all of the major players becoming managed services providers. I think the reasoning is that they’ve come to realize they can only make so much profit per square foot by offering just colocation, so they are looking at how to increase their profitability. By offering managed services they’re able to increase it slightly. By offering managed cloud services, though, they are able to increase it significantly. I’d encourage you to go back and look at your costs and profit for one cabinet of equipment. Then take that same cabinet, add managed services on top of it and look at the profitability. Then finally, virtualize those servers and assume a ratio of maybe 40 to 1 for virtual machines to physical machines. I think you’ll soon find revenues increase significantly, and hopefully you’ll realize just what Microsoft is trying to drive with the Dynamic Datacenter Toolkit.
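To make that exercise concrete, here is a minimal Python sketch of the cabinet comparison. Only the 40 to 1 ratio comes from this post. Every dollar figure, the server count per cabinet and the cabinet footprint below are hypothetical placeholders I made up so the code runs, so replace them with your own numbers before drawing any conclusions.

```python
from dataclasses import dataclass


@dataclass
class CabinetScenario:
    name: str
    physical_servers: int             # servers racked in one cabinet
    sellable_units_per_server: float  # 1 for physical, about 40 for VMs
    monthly_price_per_unit: float     # what one unit sells for
    monthly_cost_per_cabinet: float   # space, power, hardware, software, staff

    def monthly_profit(self) -> float:
        units = self.physical_servers * self.sellable_units_per_server
        return units * self.monthly_price_per_unit - self.monthly_cost_per_cabinet


def profit_per_sq_ft(scenario: CabinetScenario, cabinet_sq_ft: float) -> float:
    """Monthly profit divided by the floor space the cabinet occupies."""
    return scenario.monthly_profit() / cabinet_sq_ft


CABINET_SQ_FT = 25.0  # hypothetical footprint, including aisle share

# All prices and costs are placeholders, not figures from the post.
scenarios = [
    CabinetScenario("colocation only", 20, 1, 100.0, 1500.0),
    CabinetScenario("managed services", 20, 1, 250.0, 3500.0),
    CabinetScenario("managed cloud (40:1)", 20, 40, 50.0, 15000.0),
]

results = {s.name: profit_per_sq_ft(s, CABINET_SQ_FT) for s in scenarios}
for name, value in results.items():
    print(f"{name:22s} {value:10.2f} per sq ft per month")
```

The point of structuring it this way is that the cabinet and its floor space stay fixed while the number of sellable units and the cost to run them change, which is exactly the comparison described above.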
Oh and BTW, how we won that first bid.
I do want to add one thing. I mentioned we won the bid, but what I left out is why we won it. You’re probably thinking we won because going virtual let us offer the lowest price. After all, everyone knows virtual is cheaper than physical.
Well, we didn’t have the lowest bid, and virtual isn’t always cheaper than physical. The reason we won the bid was that we offered a solution our competition could not match: a highly reliable, scalable, managed solution that the customer believed in and believed would best meet their needs.
So at the end of the day we won because we weren’t offering a commodity dedicated server service but rather a customer-focused, managed cloud hosting solution tailored to their needs.