Apache-Hadoop-Developer Free Dumps Study Materials
Question 2: You are developing a MapReduce job for sales reporting. The mapper will process input keys
representing the year (IntWritable) and input values representing product identifiers (Text).
Identify what determines the data types used by the Mapper for a given job.
A. The key and value types specified in the JobConf.setMapInputKeyClass and
JobConf.setMapInputValuesClass methods
B. The data types specified in HADOOP_MAP_DATATYPES environment variable
C. The mapper-specification.xml file submitted with the job determines the mapper's input key and
value types.
D. The InputFormat used by the job determines the mapper's input key and value types.
Correct Answer: D
Explanation:
The input types fed to the mapper are controlled by the InputFormat used.
The default input format, "TextInputFormat," will load data in as (LongWritable, Text) pairs.
The long value is the byte offset of the line in the file. The Text object holds the string
contents of the line of the file.
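The (byte offset, line) pairing described above can be illustrated without Hadoop. The following is a minimal plain-Java sketch, not Hadoop's implementation: the class name LineRecords and its read method are hypothetical, plain Long and String stand in for LongWritable and Text, and only "\n" line endings in UTF-8 text are assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a line-oriented record reader; not a Hadoop class.
class LineRecords {
    // Returns one (byte offset, line text) pair per line, mirroring the
    // (LongWritable, Text) pairs that a line-based input format hands to a mapper.
    static List<Map.Entry<Long, String>> read(String fileContents) {
        byte[] bytes = fileContents.getBytes(StandardCharsets.UTF_8);
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int lineStart = 0;
        for (int i = 0; i <= bytes.length; i++) {
            boolean atEnd = (i == bytes.length);
            // A record ends at a newline, or at end of input if a partial line remains.
            if (atEnd ? i > lineStart : bytes[i] == '\n') {
                String line = new String(bytes, lineStart, i - lineStart, StandardCharsets.UTF_8);
                records.add(new SimpleEntry<>((long) lineStart, line));
                lineStart = i + 1; // the next line starts after the newline byte
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Each key is the byte position where the line begins, not a line number.
        for (Map.Entry<Long, String> r : read("2011\tP-100\n2012\tP-200\n")) {
            System.out.println(r.getKey() + " -> " + r.getValue());
        }
    }
}
```

The point of the sketch is that the key and value types come from whatever reads the input, which is why a mapper expecting (IntWritable, Text) needs an InputFormat that produces those types.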
Note: The data types emitted by the reducer are identified by setOutputKeyClass()
and setOutputValueClass().
By default, it is assumed that these are the output types of the mapper as well. If this is not
the case, the setMapOutputKeyClass() and setMapOutputValueClass() methods
of the JobConf class will override these.
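As a rough illustration of how these settings fit together in a driver, here is a configuration sketch using the old org.apache.hadoop.mapred API. It assumes Hadoop is on the classpath and that the sales data is stored as a SequenceFile with IntWritable keys and Text values. The class name SalesReportDriver, the job name, and the chosen output types are hypothetical, and the mapper and reducer classes are deliberately left out.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

// Hypothetical driver; only the type-related configuration is shown.
public class SalesReportDriver {
    static JobConf configure(String inputDir, String outputDir) {
        JobConf conf = new JobConf(SalesReportDriver.class);
        conf.setJobName("sales-report");

        // The InputFormat decides the mapper's input types. Assumption: the
        // input is a SequenceFile written with (IntWritable, Text) records.
        conf.setInputFormat(SequenceFileInputFormat.class);

        // Types emitted by the reducer (hypothetical choice for this sketch).
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // The mapper in this sketch is assumed to emit different types than
        // the reducer, so the map output types are set explicitly.
        conf.setMapOutputKeyClass(IntWritable.class);
        conf.setMapOutputValueClass(Text.class);

        // conf.setMapperClass(...) and conf.setReducerClass(...) would be set
        // here to classes whose type parameters match the settings above.

        FileInputFormat.setInputPaths(conf, new Path(inputDir));
        FileOutputFormat.setOutputPath(conf, new Path(outputDir));
        return conf;
    }
}
```

Note that nothing in this configuration names the mapper's input types directly; they follow from the InputFormat, which is the reasoning behind answer D.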
Reference: Yahoo! Hadoop Tutorial, THE DRIVER METHOD