Disk Soup: AWS, EBS, RAID, MarkLogic, and Pinch of Salt!
At the 2013 MarkLogic User Conference , I learned all kinds of interesting and valuable information about running MarkLogic on AWS (Amazon Web Services) EC2 servers. Most specifically, it was mentioned that I wasn't necessarily going to get a huge performance gain over regular EBS storage via the RAID 10 configuration that I cooked up. That was good news to me because it costs me quite a bit to have all that extra EBS storage for RAID10.
I just finally got around to testing all of this out with live data. I trimmed down my data, merged all my forests, and cleaned up the disk to ensure I knew exactly how much storage I needed. I finally got it all down to about 148GB of on-disk data for about 3+ months of weather data.
My current configuration is eight 200GB volumes arranged in a RAID 10 configuration. That is 1.6TB of storage that yields about 750GB of usable disk space.
To consolidate this onto one volume, I created a 600GB EBS volume, created an ext4 filesystem, and copied all the data across while everything was shutdown. And then I waited, and waited, copying is sure slow, and waited...
When I was finally ready, I started up MarkLogic and all my Web applications to test the throughput. The result: it was twice as slow! I get at least a 2 times increase in performance by having RAID10 via mdadm.
Fortunately, the data hadn't changed and so I could easily switch back to the old filesystem. I restarted MarkLogic and verified my measurements: yes, RAID10/mdadm is better by at least twice.
I then looked into Provisioned IOPS and whether I could test that. Unfortunately, it isn't available for the instance type I'm using ( m2.xlarge ) and I would have to move to the next level up (m2.2xlarge). The additional cost of Provisioned IOPS for EBS and the m2.2xlarge removes any cost savings I might have had.
Here's the takeaway:
- RAID10 via mdadm is a good middle ground for AWS. It will give you better performance, possibly twice as fast as regular EBS storage.
- RAID10 will cost you less for overall EBS storage than Provisioned IOPS.
- Provisioned IOPS will give you better performance guarantees and you may find you want/need to pay for that.
- I don't have a measurement of that as of yet.
I wish I had an easy way to test out Provisioned IOPS for EBS storage with my system. It would be great to compare everything all at once. Unfortunately, I would first have to upgrade to a different instance type and then re-run all the tests I've done so far. For my current work, that isn't necessary.
Yet, when I want more performance, I now know what to do next.