Note: all operations below were performed on a Hadoop cluster managed by Ambari.
1. Change to MAPREDUCE_HOME
$ cd /usr/hdp/2.3.4.0-3485/hadoop-mapreduce
The jar we need is hadoop-mapreduce-client-jobclient-tests.jar.
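The version directory 2.3.4.0-3485 is specific to this cluster. To avoid hard-coding it, a small helper can locate the jar. This is only a sketch: the function name find_tests_jar is hypothetical, and the fallback path /usr/hdp/current/hadoop-mapreduce-client assumes the version-independent symlinks that HDP usually creates under /usr/hdp/current.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the jobclient tests jar under a
# MapReduce home directory, or fail with a message on stderr.
# Lookup order: explicit argument, then $MAPREDUCE_HOME, then an assumed
# HDP "current" symlink.
find_tests_jar() {
  local home=${1:-${MAPREDUCE_HOME:-/usr/hdp/current/hadoop-mapreduce-client}}
  local jar="$home/hadoop-mapreduce-client-jobclient-tests.jar"
  if [[ -f "$jar" ]]; then
    echo "$jar"
  else
    echo "jar not found under $home" >&2
    return 1
  fi
}
```

For example: yarn jar "$(find_tests_jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce)" lists the test programs without first changing directory.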
Running the jar without any arguments lists all available test programs (Hadoop runs on YARN here, so the jar is launched with yarn jar):
[hdfs@base1 hadoop-mapreduce]$ yarn jar hadoop-mapreduce-client-jobclient-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
2. TestDFSIO
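TestDFSIO benchmarks HDFS I/O performance. It runs a MapReduce job that performs reads or writes concurrently: each map task reads or writes one file, the map output collects statistics about the file it processed, and the reduce task accumulates these statistics and produces a summary. TestDFSIO usage: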
[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO
17/01/11 14:23:22 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
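The -size argument takes an optional unit. As far as I can tell from the TestDFSIO source, a number without a unit is read as MB and the units are binary multiples (1 KB = 1024 B), so a size of 1000 means 1000 MB, slightly less than 1 GB. The function below, size_to_bytes (a hypothetical helper, not part of Hadoop), sketches that conversion, assuming units are matched case-insensitively:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a TestDFSIO-style size such as 1000, 512KB or
# 1GB into bytes. A bare number is treated as MB; units are binary multiples.
size_to_bytes() {
  local arg=$1
  local num=${arg%%[!0-9]*}        # leading digits
  local unit=${arg#"$num"}         # whatever follows the digits
  unit=${unit^^}                   # upper-case the unit (bash 4+)
  if [[ -z "$num" ]]; then
    echo "no number in: $arg" >&2
    return 1
  fi
  num=$((10#$num))                 # force base 10 even with leading zeros
  case "$unit" in
    "" | MB) echo $((num * 1024 ** 2)) ;;
    B)       echo "$num" ;;
    KB)      echo $((num * 1024)) ;;
    GB)      echo $((num * 1024 ** 3)) ;;
    TB)      echo $((num * 1024 ** 4)) ;;
    *)       echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}
```

For example, size_to_bytes 1000 prints 1048576000, while size_to_bytes 1GB prints 1073741824.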
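Write test
Write 10 files of 1000 MB (roughly 1 GB) each to HDFS. The command uses -fileSize, which TestDFSIO accepts as an alternative spelling of -size; a size without a unit is interpreted as MB.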
[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
The results are printed to the console and also appended to TestDFSIO_results.log in the current local directory:
17/01/11 14:50:42 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
17/01/11 14:50:42 INFO fs.TestDFSIO: Date & time: Wed Jan 11 14:50:42 CST 2017
17/01/11 14:50:42 INFO fs.TestDFSIO: Number of files: 10
17/01/11 14:50:42 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
17/01/11 14:50:42 INFO fs.TestDFSIO: Throughput mb/sec: 35.80251333643622
17/01/11 14:50:42 INFO fs.TestDFSIO: Average IO rate mb/sec: 43.56146240234375
17/01/11 14:50:42 INFO fs.TestDFSIO: IO rate std deviation: 21.150555899398174
17/01/11 14:50:42 INFO fs.TestDFSIO: Test exec time sec: 42.347
17/01/11 14:50:42 INFO fs.TestDFSIO:
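Throughput and Average IO rate differ because they aggregate the per-task numbers differently. As I read the TestDFSIO source, Throughput is the total MB divided by the sum of all map tasks' execution times, Average IO rate is the mean of each task's own MB/s, and the std deviation is taken over those per-task rates. Slow tasks therefore pull Throughput down more, which is consistent with Average IO rate (43.56) exceeding Throughput (35.80) in the write run above. The sketch below reproduces this arithmetic on hypothetical per-task "MB seconds" pairs; dfsio_summary is an illustrative helper, since TestDFSIO does not print per-task figures in this form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "MB seconds" pairs (one map task per line) on
# stdin and print the three summary metrics the way TestDFSIO aggregates them.
dfsio_summary() {
  awk '
    NF >= 2 && $2 > 0 {
      mb += $1; sec += $2              # totals across all tasks
      rate = $1 / $2                   # this task'"'"'s own MB/s
      sum += rate; sumsq += rate * rate; n++
    }
    END {
      if (n == 0) exit 1               # no usable samples
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = -var          # guard against tiny negative rounding error
      printf "Throughput mb/sec: %.2f\n", mb / sec
      printf "Average IO rate mb/sec: %.2f\n", mean
      printf "IO rate std deviation: %.2f\n", sqrt(var)
    }'
}
```

With two tasks of 100 MB taking 2 s and 4 s, printf '100 2\n100 4\n' | dfsio_summary reports a throughput of 33.33 (200 MB over 6 task-seconds) but an average IO rate of 37.50 (the mean of 50 and 25 MB/s), with a std deviation of 12.50.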
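Read test
Read the 10 files back from HDFS. The read test reads the files created by the write test, so run -write first.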
[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
The results are printed to the console and also appended to TestDFSIO_results.log in the current local directory:
17/01/11 15:04:38 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
17/01/11 15:04:38 INFO fs.TestDFSIO: Date & time: Wed Jan 11 15:04:38 CST 2017
17/01/11 15:04:38 INFO fs.TestDFSIO: Number of files: 10
17/01/11 15:04:38 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
17/01/11 15:04:38 INFO fs.TestDFSIO: Throughput mb/sec: 681.1989100817439
17/01/11 15:04:38 INFO fs.TestDFSIO: Average IO rate mb/sec: 707.2659301757812
17/01/11 15:04:38 INFO fs.TestDFSIO: IO rate std deviation: 137.7001870463033
17/01/11 15:04:38 INFO fs.TestDFSIO: Test exec time sec: 20.445
17/01/11 15:04:38 INFO fs.TestDFSIO:
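Because every run is appended to the same TestDFSIO_results.log, a short awk filter can pull out the throughput of each run for side-by-side comparison. A minimal sketch: extract_throughput is a hypothetical helper, and it assumes the results file uses the same labels as the console output shown above (it also works on the console lines, since it only looks at the last field).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<mode> <throughput>" for each TestDFSIO run
# recorded in a results file (default: ./TestDFSIO_results.log).
extract_throughput() {
  awk '
    /----- TestDFSIO ----- :/ { mode = $NF }        # e.g. write, read
    /Throughput mb\/sec:/     { print mode, $NF }   # value is the last field
  ' "${1:-TestDFSIO_results.log}"
}
```

The output has one line per run: the mode (write or read) followed by its Throughput mb/sec value.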
Clean up the test data (this deletes the benchmark directory in HDFS, /benchmarks/TestDFSIO by default):
[hdfs@base1 ~]$ yarn jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean
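The three steps have to run in order: -read needs the files written by -write, and -clean removes them afterwards. The sketch below chains them and stops at the first failing step so its output can be inspected. run_dfsio_suite and the RUN variable are hypothetical; setting RUN=echo prints the commands instead of executing them. It uses -size, the spelling shown in the usage message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run TestDFSIO write, read and clean in sequence.
# Arguments: path to the tests jar, number of files, size per file.
# Set RUN=echo for a dry run that only prints the commands.
run_dfsio_suite() {
  local jar=$1 nfiles=$2 size=$3
  local run=${RUN:-}    # empty by default, so the commands really execute
  $run yarn jar "$jar" TestDFSIO -write -nrFiles "$nfiles" -size "$size" || return 1
  $run yarn jar "$jar" TestDFSIO -read -nrFiles "$nfiles" -size "$size" || return 1
  $run yarn jar "$jar" TestDFSIO -clean
}
```

For a dry run against this cluster: RUN=echo run_dfsio_suite /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar 10 1000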