Apache Hadoop
Establishing a precise map of the target infrastructure is crucial, because Hadoop environments expose a large number of services. The main goal is to find out:
Which server holds which role: DataNode, NameNode or edge node
Which technologies and which third-party modules are deployed: for instance Apache HBase, Apache Hive, Apache Spark, Apache Kafka, Cloudera HUE, Apache Ranger, etc.
Many modules, along with their purpose, are listed in the Hadoop Ecosystem Table
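A good way to map the infrastructure quickly is to locate the WebUIs, which listen on the following default ports (Hadoop 2.x and earlier; Hadoop 3 moved the HDFS ones, for example the NameNode WebUI to HTTP/9870):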
HDFS NameNode WebUI, on port HTTP/50070 or HTTPS/50470
HDFS DataNode WebUI, on port HTTP/50075 or HTTPS/50475
Secondary NameNode WebUI, on port HTTP/50090
YARN ResourceManager WebUI, on port HTTP/8088 or HTTPS/8090
YARN NodeManager WebUI, on port HTTP/8042 or HTTPS/8044
MapReduce v2 JobHistory Server WebUI, on port HTTP/19888 or HTTPS/19890
MapReduce v1 JobTracker WebUI, on port HTTP/50030
MapReduce v1 TaskTracker WebUI, on port HTTP/50060
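The port list above can be turned into a quick reachability check. The following is a minimal sketch, not a full scanner: it only tests whether a TCP connection succeeds on each default port, so an open port is a candidate WebUI rather than a confirmed one. The names `WEBUI_PORTS`, `is_open` and `probe_webuis` are hypothetical helpers introduced for illustration, and the sketch assumes the cluster still uses the default ports listed above.

```python
import socket

# Default WebUI ports copied from the list above (service name -> candidate ports).
WEBUI_PORTS = {
    "HDFS NameNode": [50070, 50470],
    "HDFS DataNode": [50075, 50475],
    "Secondary NameNode": [50090],
    "YARN ResourceManager": [8088, 8090],
    "YARN NodeManager": [8042, 8044],
    "MapReduce v2 JobHistory Server": [19888, 19890],
    "MapReduce v1 JobTracker": [50030],
    "MapReduce v1 TaskTracker": [50060],
}


def is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable: treat as closed.
        return False


def probe_webuis(host, ports=WEBUI_PORTS, timeout=1.0):
    """Return {service: [open ports]} for services with at least one open port."""
    found = {}
    for service, candidates in ports.items():
        open_ports = [p for p in candidates if is_open(host, p, timeout)]
        if open_ports:
            found[service] = open_ports
    return found
```

Calling `probe_webuis("10.0.0.5")` (a placeholder address) returns a dictionary of the services whose default ports accept connections on that host. Running it against each server in scope gives a first draft of the role map, which should then be confirmed by opening the corresponding WebUI in a browser.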