Apache Hadoop

Establishing a precise map of the target infrastructure is crucial, because Hadoop environments expose a large number of services. The main goal is to find out:

  • Which server holds which role: NameNode, DataNode or edge node

  • Which technologies and which third-party modules are deployed: for instance Apache HBase, Apache Hive, Apache Spark, Apache Kafka, Cloudera HUE, Apache Ranger, etc.

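A good way to map the infrastructure quickly is to locate the WebUIs exposed by the core services. The ports below are the Hadoop 2.x defaults; Hadoop 3.x moved the HDFS ones (for example, the NameNode WebUI to HTTP/9870 and HTTPS/9871), and administrators can change any of them in the cluster configuration: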

  • HDFS NameNode WebUI, on port HTTP/50070 or HTTPS/50470

  • HDFS DataNode WebUI, on port HTTP/50075 or HTTPS/50475

  • Secondary NameNode WebUI, on port HTTP/50090

  • YARN ResourceManager WebUI, on port HTTP/8088 or HTTPS/8090

  • YARN NodeManager WebUI, on port HTTP/8042 or HTTPS/8044

  • MapReduce v2 JobHistory Server WebUI, on port HTTP/19888 or HTTPS/19890

  • MapReduce v1 JobTracker WebUI, on port HTTP/50030

  • MapReduce v1 TaskTracker WebUI, on port HTTP/50060
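
The port list above can be turned into a quick reachability check. The following is a minimal sketch, not a full scanner: it only attempts a TCP connection to each default port and reports which ones accept it. The names `HADOOP_WEBUI_PORTS`, `probe_port` and `map_hadoop_webuis` are hypothetical helpers introduced for illustration. An open port is only a hint that the matching service is present, since ports can be reassigned.

```python
import socket

# Default WebUI ports, copied from the list above (Hadoop 2.x defaults).
HADOOP_WEBUI_PORTS = {
    50070: "HDFS NameNode (HTTP)",
    50470: "HDFS NameNode (HTTPS)",
    50075: "HDFS DataNode (HTTP)",
    50475: "HDFS DataNode (HTTPS)",
    50090: "Secondary NameNode (HTTP)",
    8088: "YARN ResourceManager (HTTP)",
    8090: "YARN ResourceManager (HTTPS)",
    8042: "YARN NodeManager (HTTP)",
    8044: "YARN NodeManager (HTTPS)",
    19888: "MapReduce v2 JobHistory Server (HTTP)",
    19890: "MapReduce v2 JobHistory Server (HTTPS)",
    50030: "MapReduce v1 JobTracker (HTTP)",
    50060: "MapReduce v1 TaskTracker (HTTP)",
}


def probe_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def map_hadoop_webuis(host, ports=HADOOP_WEBUI_PORTS, timeout=2.0):
    """Return {port: label} for every listed port that accepts a connection."""
    return {
        port: label
        for port, label in ports.items()
        if probe_port(host, port, timeout)
    }


# Example call (the address is a placeholder, not a real target):
#   for port, label in sorted(map_hadoop_webuis("10.0.0.10").items()):
#       print(port, label)
```

Running `map_hadoop_webuis` against each in-scope server gives a first guess at its role: a host answering on the NameNode ports is likely a NameNode, one answering on the DataNode and NodeManager ports is likely a worker, and so on. Confirm each hit by opening the WebUI itself.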
