Clouds to Trenches
Cloud computing
See our previous talk: http://book.xen.prgmr.com/mediawiki/index.php/Xen:_a_view_from_the_trenches
Quantity has a quality all its own
There is nothing new under the sun; everything that could be called 'cloud computing' is really just an evolutionary advance on older technology: bigger, stronger, and faster.
When someone speaks of 'cloud computing' they can mean several quite different things:
'application level cloud'
Google App Engine is an example of this type of cloud, as are Engine Yard and Heroku. This is quite similar in concept to old-style shared hosting: you upload an application in a particular format (usually a scripting language; in the case of old-style shared hosting, PHP) and the hosting company bills you based on usage. (It used to be billed on total bandwidth usage, but a 'per hit' charge is becoming more common.)
You really do outsource your SysAdmin here. Just like with the old-style shared hosting platforms, don't do anything unusual and you should be fine. The new 'per hit' billing accounts better for CPU and IO time, so these new systems tend to perform better than the (often massively oversubscribed) older shared-hosting setups.
fast provisioning
Then there is the 'cloud' as envisioned by Amazon EC2, sometimes referred to as 'utility computing' or 'grid computing'. This is quite a lot like an old-school pxeboot-with-systemimager setup, in that you can quickly spin up a new server. Now, Amazon does save you (mostly) from mucking with hardware, and they do have a pretty nice programmatic interface to spin up and shut down nodes, but I think the biggest breakthrough is the 'rent servers by the hour' concept, which is really useful for some things (but not so useful for others).
Both kinds of cloud are super expensive compared to owning your own hardware.
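To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. Every figure in it (a 10-cent hourly rate, a $1080 server, $20 a month for hosting, a 36-month lifetime) is a made-up assumption for illustration, not a quote from any provider, and the function names are ours. It estimates how many hours a month a box has to run before owning it beats renting it by the hour.

```python
# All prices are in cents so the arithmetic stays exact.
# Every number used below is a hypothetical placeholder, not real pricing.

def rental_cost_cents(hours, hourly_rate_cents):
    """Cost of renting one server for the given number of hours."""
    return hours * hourly_rate_cents


def owned_cost_per_month_cents(purchase_cents, hosting_cents, lifetime_months):
    """Monthly cost of an owned server: purchase price spread over its
    lifetime, plus monthly hosting (power, rack space, bandwidth)."""
    return purchase_cents / lifetime_months + hosting_cents


def break_even_hours_per_month(hourly_rate_cents, purchase_cents,
                               hosting_cents, lifetime_months):
    """Hours of use per month above which owning is cheaper than renting."""
    monthly = owned_cost_per_month_cents(purchase_cents, hosting_cents,
                                         lifetime_months)
    return monthly / hourly_rate_cents


if __name__ == "__main__":
    hours = break_even_hours_per_month(
        hourly_rate_cents=10,     # assumed rental rate
        purchase_cents=108000,    # assumed server price
        hosting_cents=2000,       # assumed monthly hosting
        lifetime_months=36,       # assumed useful life
    )
    print("break-even hours per month:", hours)
    print("always-on rental, cents per month:", rental_cost_cents(730, 10))
```

With these invented numbers the break-even is 500 hours a month, so a server that runs around the clock (about 730 hours) is cheaper to own, while one you only need for a few hours a week is cheaper to rent.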
real implementation underpinnings
fast provisioning
pay as you go billing
Traditionally, buying compute resources required committing for long periods of time. This was in part due to 'slow provisioning'.
quote amazon paper here
Some terminology and notes
we're talking about xen because it's what we know -- other virtualization products generally have similar features.
paravirtualization
domain
distinction between sysadmins and programmers
so why do you care?
Current situation
very little automation
disjoint billing and provisioning systems
no provision for migration
each customer has a fixed allocation on a particular server
manual resource controls
(still a useful vps service, but not "cloud")
dedicated/vps/cloud
  dedicated servers can be just as cloud as amazon (apply the same model to hardware.)
Addressing each point
automation
  better scripting would solve this
  but we also want a self-service api
  need to have machines communicate available resources
  ip address allocation
link billing and provisioning
  enables utility pricing
  also link billing to resource usage (disk, net)
hardware dependence
  simple: each machine has a console whose 'location' can be updated through dns
  migration is harder
    need local storage for speed, reliability, cheapness
    fortunately xen offers migration hooks that can freeze the domain, migrate storage, and then move the domain.
    so, we do that
    cite oracle paper
    talk about opensolaris
    don't forget to update dns, notify billing machine.
resource controls
  currently manual
    "Luke is a very good sysadmin, but that doesn't scale."
  cpu
    <bits from previous talk>
  memory
    fixed allocation -- not our problem
    balloon driver
  need for monitoring daemons that can automatically adjust
  overview of linux net qos
    <insert bits from old talk>
    "You can tell we've been learning the business as we go along"
    free month debacle
  overview of disk qos
    ionice
    map between dom0 processes and domU
    only addresses priorities -- arrange like cpu use.
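The migration sequence from the notes above (freeze the domain, migrate storage, move the domain, then update dns and notify the billing machine) can be sketched as a small Python driver. This is only an illustration of the ordering, not our actual scripts: `DryRunOps`, its method names, and the host and domain names are all hypothetical, and the stand-in class just records what would be done instead of calling any real xen hook.

```python
class DryRunOps:
    """Hypothetical stand-in for the real migration hooks.

    Each method only records what it would do, so the ordering can be
    checked without touching a hypervisor, dns, or billing.
    """

    def __init__(self):
        self.log = []

    def freeze(self, domain, host):
        self.log.append(f"freeze {domain} on {host}")

    def copy_storage(self, domain, source, target):
        self.log.append(f"copy storage of {domain} from {source} to {target}")

    def move_domain(self, domain, source, target):
        self.log.append(f"move {domain} from {source} to {target}")

    def update_dns(self, domain, target):
        self.log.append(f"point console dns for {domain} at {target}")

    def notify_billing(self, domain, target):
        self.log.append(f"tell billing {domain} now lives on {target}")


def migrate(domain, source, target, ops):
    """Run the migration steps in the order given in the notes."""
    ops.freeze(domain, source)                # stop writes before copying disk
    ops.copy_storage(domain, source, target)  # local storage has to move too
    ops.move_domain(domain, source, target)   # then move the domain itself
    ops.update_dns(domain, target)            # console 'location' lives in dns
    ops.notify_billing(domain, target)        # keep billing and provisioning linked


if __name__ == "__main__":
    ops = DryRunOps()
    migrate("guest1", "hostA", "hostB", ops)
    for line in ops.log:
        print(line)
```

Keeping the steps behind one object makes it easy to swap the dry-run recorder for something that actually shells out, and to test that nobody reorders the dns and billing updates ahead of the move.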
pieces of prgmr.com api
"the more centralized you make the system, the harder you need to fortify it."
two separate machines reduce the odds of compromise
  successful attack against either won't result in immediate disruption (still a serious problem, of course.)
machine a handles kernel/image selection
machine b handles rebooter
auth via x.509 certs
avoid handling credit cards, etc. at all costs.
  dreamhost fiasco
avoid passwords -- too insecure. (most users pick bad passwords or reuse only a few.)
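As a rough idea of what certificate-based auth looks like in practice, here is a minimal sketch using Python's standard `ssl` module. It is not the prgmr.com api code: the function names and the file paths you would pass in are assumptions for illustration. The first function builds a server-side TLS context that refuses clients without a certificate signed by our CA; the second pulls the common name out of the dictionary that `SSLSocket.getpeercert()` returns, which is what you would map to a customer account.

```python
import ssl


def make_server_context(cert_file, key_file, ca_file):
    """Build a TLS server context that requires a client certificate.

    cert_file, key_file, and ca_file are paths supplied by the caller;
    nothing here assumes where they live.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)  # CA that signs customer certs
    ctx.verify_mode = ssl.CERT_REQUIRED        # no cert, no handshake
    return ctx


def peer_common_name(cert):
    """Return the commonName from a getpeercert() dict, or None.

    The 'subject' entry is a tuple of RDNs, each of which is a tuple of
    (field, value) pairs.
    """
    for rdn in cert.get("subject", ()):
        for field, value in rdn:
            if field == "commonName":
                return value
    return None


if __name__ == "__main__":
    # A hand-written dict in the shape getpeercert() returns.
    example = {
        "subject": (
            (("countryName", "US"),),
            (("commonName", "customer42"),),
        ),
    }
    print(peer_common_name(example))
```

No password ever crosses the wire this way; revoking a customer's access means dealing with one certificate rather than hoping they picked a decent password.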
conclusion -- our plans
drive cost of computing down to minimum
xenoservers -- machines join the network, paid for use
  "See, in the winter, you can turn on that old space heater you've got lying around, and get something back for it."
  (hey, i have a sun e4500 in my kitchen.)
highly speculative -- vps business is okay too.
wins of virtualization
http://wiki.oracle.com/page/Oracle+VM+Live+Storage+Migration
"we are talking about cloud computing. some vapor is obligatory."