SSH Tunnel on AWS : Using native Hadoop shell and UI on Amazon EMR

Socks Proxy is quite handy for browsing web content from EC2 instance and all the similar cloud machines. This is because hadoop, spark and many useful clustering solutions provide a portal page for users to monitor the cluster or jobs. The content is almost always only accessible from localhost.

For AWS EMR, to configure a socks proxy is as simple as it gets:

./elastic-mapreduce --describe -j <jobflow-id>

Then simply configure SOCKS proxy in your browser as 127.0.0.1 port 8157. FoxyProxy is such a plugin for browsers.

If vivid figures are needed,

YHemanth's Blog

Amazon’s Elastic MapReduce (EMR) is a popular Hadoop on the cloud service. Using EMR, users can provision a Hadoop cluster on Amazon AWS resources and run jobs on them. EMR defines an abstraction called the ‘jobflow’ to submit jobs to a provisioned cluster. A jobflow contains a set of ‘steps’. Each step runs a MapReduce job, a Hive script, a shell executable, and so on. Users can track the status of the jobs from the Amazon EMR console.

Users who have used a static Hadoop cluster are used to the Hadoop CLI for submitting jobs and also viewing the Hadoop JobTracker and NameNode user interfaces for tracking activity on the cluster. It is possible to use EMR in this mode, and is documented in the extensive EMR documentation. This blog collates the information for using these interfaces into one place for such a usage mode, along with some…

View original post 1,207 more words