Note: All of the following steps were performed on a Hadoop cluster managed by Ambari.

1. Change to MAPREDUCE_HOME

$ cd /usr/hdp/2.3.4.0-3485/hadoop-mapreduce

The jar we need is hadoop-mapreduce-client-jobclient-tests.jar.

Running the jar without any arguments lists all of the available test programs (Hadoop runs on YARN here, so the jar is launched with yarn jar):

[hdfs@base1 hadoop-mapreduce]$ yarn jar hadoop-mapreduce-client-jobclient-tests.jar 
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
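
The path used in step 1 contains the HDP build number, so it changes after every upgrade. A minimal sketch of a helper that finds the tests jar in a given directory is shown below. The function name find_tests_jar is ours, and the default directory /usr/hdp/current/hadoop-mapreduce-client is an assumption about the usual HDP symlink layout; pass your own MAPREDUCE_HOME if your cluster differs.

```shell
# Sketch: locate the job-client tests jar without hard-coding the HDP build number.
# find_tests_jar is a hypothetical helper; the default directory is an assumption.
find_tests_jar() {
  local home="${1:-${MAPREDUCE_HOME:-/usr/hdp/current/hadoop-mapreduce-client}}"
  local jar
  # Prefer the unversioned name, then fall back to a versioned *-tests.jar.
  for jar in "$home"/hadoop-mapreduce-client-jobclient-tests.jar \
             "$home"/hadoop-mapreduce-client-jobclient-*-tests.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "tests jar not found under $home" >&2
  return 1
}
```

It could then be used as: yarn jar "$(find_tests_jar)" TestDFSIO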

2. TestDFSIO

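TestDFSIO measures the I/O performance of HDFS. It runs a MapReduce job that performs reads or writes concurrently: each map task reads or writes one file, the map output carries the statistics collected for that file, and the reduce step accumulates those statistics and produces a summary. Running TestDFSIO without arguments prints its usage: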

[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO
17/01/11 14:23:22 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

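Writing data

Write 10 files of 1000 MB (roughly 1 GB) each to HDFS. A size given without a unit is interpreted as megabytes, and the -fileSize spelling used here stands in for the -size option shown in the usage message: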

[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

The test results are written to TestDFSIO_results.log in the current directory:

17/01/11 14:50:42 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/01/11 14:50:42 INFO fs.TestDFSIO:            Date & time: Wed Jan 11 14:50:42 CST 2017
17/01/11 14:50:42 INFO fs.TestDFSIO:        Number of files: 10
17/01/11 14:50:42 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
17/01/11 14:50:42 INFO fs.TestDFSIO:      Throughput mb/sec: 35.80251333643622
17/01/11 14:50:42 INFO fs.TestDFSIO: Average IO rate mb/sec: 43.56146240234375
17/01/11 14:50:42 INFO fs.TestDFSIO:  IO rate std deviation: 21.150555899398174
17/01/11 14:50:42 INFO fs.TestDFSIO:     Test exec time sec: 42.347
17/01/11 14:50:42 INFO fs.TestDFSIO:
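
The summary above reports both "Throughput" and "Average IO rate", and the two differ because they are aggregated differently. As we understand the TestDFSIO reducer, throughput is the total data volume divided by the sum of the per-task I/O times, while the average IO rate is the mean of the individual per-file rates. The sketch below reproduces that arithmetic with awk. The function name dfsio_stats and its input format (one "<megabytes> <seconds>" line per map task) are hypothetical and only serve the illustration.

```shell
# Sketch: recompute TestDFSIO-style summary figures from per-task measurements.
# Input (stdin): one line per map task, "<megabytes> <seconds>" (hypothetical format).
# Output: "<throughput> <average IO rate> <IO rate std deviation>", all in MB/s.
dfsio_stats() {
  awk '
    NF >= 2 && $2 > 0 {
      mb  += $1                 # total data volume
      sec += $2                 # sum of per-task I/O times
      rate = $1 / $2            # rate of this single task
      sum   += rate
      sumsq += rate * rate
      n++
    }
    END {
      if (n == 0) exit 1
      avg = sum / n
      var = sumsq / n - avg * avg
      if (var < 0) var = -var   # guard against tiny negative rounding errors
      printf "%.3f %.3f %.3f\n", mb / sec, avg, sqrt(var)
    }
  '
}
```

For two tasks that each move 100 MB, one in 2 s and one in 4 s, printf '100 2\n100 4\n' | dfsio_stats prints 33.333 37.500 12.500: the slow task pulls the throughput below the average rate, which is the same pattern seen in the write results.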

Reading data

Read the 10 files of 1000 MB each back from HDFS:

[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

The test results are written to TestDFSIO_results.log in the current directory:

17/01/11 15:04:38 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/01/11 15:04:38 INFO fs.TestDFSIO:            Date & time: Wed Jan 11 15:04:38 CST 2017
17/01/11 15:04:38 INFO fs.TestDFSIO:        Number of files: 10
17/01/11 15:04:38 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
17/01/11 15:04:38 INFO fs.TestDFSIO:      Throughput mb/sec: 681.1989100817439
17/01/11 15:04:38 INFO fs.TestDFSIO: Average IO rate mb/sec: 707.2659301757812
17/01/11 15:04:38 INFO fs.TestDFSIO:  IO rate std deviation: 137.7001870463033
17/01/11 15:04:38 INFO fs.TestDFSIO:     Test exec time sec: 20.445
17/01/11 15:04:38 INFO fs.TestDFSIO:
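
To compare the write and read runs side by side, the throughput lines can be pulled out of the log. The following is a minimal sketch; the function name dfsio_throughput is ours, and it assumes only the line layout shown in the summaries above (a "----- TestDFSIO ----- : <operation>" header followed by a "Throughput mb/sec:" line).

```shell
# Sketch: list "<operation> <throughput>" pairs from a TestDFSIO log.
# The file defaults to TestDFSIO_results.log in the current directory.
dfsio_throughput() {
  awk '
    /----- TestDFSIO ----- :/ { op = $NF }        # remember the operation (write, read, ...)
    /Throughput mb\/sec:/     { print op, $NF }   # last field is the MB/s value
  ' "${1:-TestDFSIO_results.log}"
}
```

Because it only looks at the last field of each matching line, it works on both the results file and console output that carries the timestamp and logger prefix.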

Cleaning up the test data

[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean
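
The three steps (write, read, clean) are usually run together, so they can be wrapped in a small script. The sketch below is one possible way to do it. The names DFSIO_JAR, dfsio_cmd, dfsio_cycle and DRY_RUN are our own and not part of Hadoop; the options -nrFiles, -size and -resFile are taken from the usage message shown earlier. Setting DRY_RUN=1 only prints the commands, which is handy for checking them before loading the cluster.

```shell
# Sketch: run a full TestDFSIO write/read/clean cycle.
# DFSIO_JAR, dfsio_cmd, dfsio_cycle and DRY_RUN are illustrative names.
DFSIO_JAR="${DFSIO_JAR:-/usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar}"

# Print the yarn command line for one phase (write, read or clean).
dfsio_cmd() {
  local phase="$1" files="$2" size="$3" resfile="$4"
  if [ "$phase" = "clean" ]; then
    printf 'yarn jar %s TestDFSIO -clean\n' "$DFSIO_JAR"
  else
    printf 'yarn jar %s TestDFSIO -%s -nrFiles %s -size %s -resFile %s\n' \
      "$DFSIO_JAR" "$phase" "$files" "$size" "$resfile"
  fi
}

# Run the phases in order and stop at the first failure.
# Arguments: number of files, size per file, results file.
dfsio_cycle() {
  local files="${1:-10}" size="${2:-1GB}" resfile="${3:-TestDFSIO_results.log}"
  local phase cmd
  for phase in write read clean; do
    cmd=$(dfsio_cmd "$phase" "$files" "$size" "$resfile")
    echo "$cmd"
    if [ "${DRY_RUN:-0}" != "1" ]; then
      # Relies on word splitting, so paths must not contain spaces.
      $cmd || return 1
    fi
  done
}
```

For example, DRY_RUN=1 dfsio_cycle 10 1GB prints the three commands without submitting any job; dropping DRY_RUN runs them for real.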
