

Session Title: Big Data

Convener: Eric Heydrick

Participants: Kyle Bader, Grant Rodgers, Kate Leroux, Nathaniel Eliot, Sean Horn

Summary of discussions:

We discussed challenges and solutions for managing big data clusters with Chef. Points covered included:

  • There are good cookbooks for managing Cassandra; https://github.com/riptano/chef/tree/master/cookbooks/cassandra is one of them.
  • Work on the Voldemort cookbook will be shared.
  • Cluster Chef has cookbooks for Hadoop, but they are mostly specific to Cluster Chef.
  • Talked about storage on EC2. Some find ephemeral storage to work well. EBS + RAID0 also works well.
  • EBS volumes can be attached with IAM-restricted keys. Alternatively, the node can call a broker web service that talks to the EC2 API.
  • Cassandra Stress Test, part of the Cassandra distribution, can be used to benchmark performance
  • Rhino is an ORM for HBase.
  • Talked about monitoring. Ganglia in EC2 requires unicast, which can be a problem when many nodes are in flux.
  • Graphite + statsd is useful for trending. Also good reading: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
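
As an illustration of the Graphite + statsd point above, here is a minimal Python sketch of how a node could emit metrics to statsd over UDP. It assumes a statsd daemon listening on the default port 8125 on localhost; the metric names and the helper functions `format_metric` and `send_metric` are hypothetical and were not part of the session.

```python
import socket

# statsd metric types: "c" = counter, "ms" = timer, "g" = gauge
VALID_TYPES = ("c", "ms", "g")


def format_metric(name, value, metric_type):
    """Build a statsd wire-format line, e.g. 'cluster.writes:1|c'."""
    if metric_type not in VALID_TYPES:
        raise ValueError("unsupported metric type: %r" % (metric_type,))
    return "%s:%s|%s" % (name, value, metric_type)


def send_metric(name, value, metric_type, host="127.0.0.1", port=8125):
    """Send one metric to statsd as a fire-and-forget UDP datagram.

    The host and port are assumptions; adjust them for your environment.
    """
    payload = format_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names for a Cassandra node.
    send_metric("cluster.cassandra.writes", 1, "c")
    send_metric("cluster.cassandra.write_latency", 42, "ms")
```

Because statsd uses UDP, the sender does not block or fail when the daemon is unreachable, which suits clusters where nodes come and go frequently.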

What will we do now? What needs to happen next?

  • Investigate tools mentioned during the discussion
  • Automate cluster deployments


