Spark is an in-memory platform for fast computation (eBay uses Spark); Hazelcast is a data grid that is more about storage than computing; Cascading is a tool for building data processing pipelines.
Spark setup script for AWS EMR, provided by Amazon on S3: http://shrub.appspot.com/elasticmapreduce/samples/spark/1.0.0/
Best Practice for Testing Spark
Report Analysis Using Spark: Spark Report Patterns
Have you tried the Spark API yet, all of it? Examples for all the API functions.
Experience with Spark: The author has extensive hands-on experience with Spark and appears to have run into many issues. One of the comments is from a Spark contributor, who points readers to ways of improving application performance with Spark.
AWS and Hadoop related
hdfs dfs (or hadoop fs) is a client-side tool for working with a Hadoop cluster. It needs the cluster's client configuration to be in place. In a typical installation the hdfs user is the HDFS superuser.
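A minimal sketch of how the client tool and the superuser fit together, assuming the hdfs binary is on the PATH, the client configuration is already in place, and sudo is allowed to switch to the hdfs user. The helper hdfs_command, the user alice, and the path /user/alice are hypothetical names used only for illustration. The script only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List, Optional


def hdfs_command(args: List[str], run_as: Optional[str] = None) -> List[str]:
    """Build the argv for an `hdfs dfs` call (hypothetical helper).

    If run_as is given, the call is wrapped in `sudo -u <user>` so that it
    executes as that OS user, e.g. the hdfs superuser.
    """
    cmd = ["hdfs", "dfs", *args]
    if run_as is not None:
        cmd = ["sudo", "-u", run_as, *cmd]
    return cmd


# List a directory as the current user.
list_cmd = hdfs_command(["-ls", "/user/alice"])

# Create a home directory and hand it over, acting as the hdfs superuser.
mkdir_cmd = hdfs_command(["-mkdir", "-p", "/user/alice"], run_as="hdfs")
chown_cmd = hdfs_command(["-chown", "alice:alice", "/user/alice"], run_as="hdfs")

# Print the commands in a form that can be pasted into a shell.
for cmd in (list_cmd, mkdir_cmd, chown_cmd):
    print(shlex.join(cmd))
```

To actually execute one of these on a machine with a configured Hadoop client, the list can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a string avoids shell quoting problems with paths.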