|
Session Title: Big Data
Convener: Eric Heydrick
Participants: Kyle Bader, Grant Rodgers, Kate Leroux, Nathaniel Eliot, Sean Horn
Summary of discussions:
We discussed challenges and solutions for managing big data clusters with Chef. Points covered included
- There are good cookbooks for managing Cassandra. https://github.com/riptano/chef/tree/master/cookbooks/cassandra is one of them
- Work on the voldemort cookbook will be shared.
- Cluster Chef has cookbooks for Hadoop but mostly specific to Cluster Chef
- Talked about storage on EC2. Some find ephemeral storage to work well. EBS + RAID0 also works well.
- EBS volumes can be attached w/ IAM restricted keys. Can also have the node make a call to a broker webservice that talks to the EC2 API.
- Cassandra Stress Test, part of the Cassandra distribution, can be used to benchmark performance
- Rhino is an ORM for hbase.
- Talked about monitoring. Ganglia in EC2 requires use of unicast and can be an issue when lots of nodes are in flux.
- Graphite + statsd is useful for trending. Also good reading: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
What will we do now? What needs to happen next?
- Investigate tools mentioned during the discussion
- Automate cluster deloyments
|
|