Eclipse Configuration


Steps to Configure and Run the WordCount Program

    Step: Download Eclipse, choosing the 32-bit or 64-bit build to match your system.

    (Screenshot: Download Eclipse)

    Step 1: Extract the Eclipse archive and click the eclipse icon.

    (Screenshot: Extracting the zip file)

    Step 2: Accept the default workspace (/home/user/workspace), or browse to a different location if you want to change it.

    (Screenshot: Create workspace)

    Step 3: Create a project: File -> New -> Other -> Java -> Java Project.

    (Screenshot: Create project)

    Step 4: Give the project the name WordCount.

    (Screenshot: WordCount project)

    Step 5: Change the default view from Project Explorer to Navigator: Window -> Show View -> Navigator.

    (Screenshot: Window menu)

    Step 7: Copy the three jar files below into the project's lib folder and create three Java classes. Jar file names and locations:

      (a) /hadoop-2.6.0/share/hadoop/common/lib : commons-cli-1.2.jar
      (b) /hadoop-2.6.0/share/hadoop/common : hadoop-common-2.6.0.jar
      (c) /hadoop-2.6.0/share/hadoop/mapreduce : hadoop-mapreduce-client-core-2.6.0.jar

    Three Java files, for the driver code, mapper code, and reducer code:
      NewWordCount.java : main (driver) class
      NewWordMapper.java
      NewWordReducer.java

    (Screenshot: Navigator view)

    Step 8: Set the Java build path (classpath): right-click the WordCount project -> Properties -> Java Build Path -> Libraries -> Add JARs, then select the three jar files in the lib folder of the WordCount project.

    (Screenshot: Java build path)
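For readers who prefer the command line over Eclipse, Steps 8 and 9 can be sketched with javac and jar directly. This is an illustrative alternative, not part of the original walkthrough; it assumes the three jars from Step 7 have been copied into a local lib folder:

```shell
# Compile the three classes against the Hadoop jars (the classpath of Step 8)
javac -cp "lib/commons-cli-1.2.jar:lib/hadoop-common-2.6.0.jar:lib/hadoop-mapreduce-client-core-2.6.0.jar" \
      NewWordCount.java NewWordMapper.java NewWordReducer.java

# Package the classes into a jar (the Eclipse export of Step 9 produces the same artifact)
jar cf WordCount.jar NewWordCount.class NewWordMapper.class NewWordReducer.class
```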

NewWordCount.java

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class NewWordCount
    {
        /* Usage: $ hadoop jar <jar file name> <driver class> <input path> <output path>
         * e.g.:  $ hadoop jar WordCount.jar NewWordCount i/p o/p
         */
        public static void main(String[] args) throws Exception
        {
            // Create a Configuration object, which loads the configuration parameters
            Configuration conf = new Configuration();
            // Create the Job object, passing the conf object and the job name as arguments.
            // The Job class allows the user to configure the job, submit it, and control its execution.
            Job job = new Job(conf, "wordcount");
            // Set the jar by finding where a given class came from
            job.setJarByClass(NewWordCount.class);
            // Set the key class for the job output data
            job.setOutputKeyClass(Text.class);
            // Set the value class for the job output data
            job.setOutputValueClass(IntWritable.class);
            // Set the mapper for the job
            job.setMapperClass(NewWordMapper.class);
            // Set the reducer for the job
            job.setReducerClass(NewWordReducer.class);
            // Set the input format for the job
            job.setInputFormatClass(TextInputFormat.class);
            // Set the output format for the job
            job.setOutputFormatClass(TextOutputFormat.class);
            // Add the input path for the MR job; args[0] is the first argument given on the terminal
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // Set the directory where the MR job dumps its output; args[1] is the second argument given on the terminal
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Submit the job to the cluster and wait for its completion
            job.waitForCompletion(true);
        }
    }

NewWordMapper.java

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class NewWordMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // Emit (word, 1) for every whitespace-separated token in the line
            while (tokenizer.hasMoreTokens())
            {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

NewWordReducer.java

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class NewWordReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int count = 0;
            // Sum the 1s emitted by the mapper for this word
            for (IntWritable val : values)
            {
                count += val.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    Step 9: Create the jar file: right-click the WordCount project -> Export -> Java -> JAR file -> Browse -> enter the filename WordCount.jar -> OK -> Finish.

    (Screenshots: how to export)

    Step 10: Create a text file named inputfile.txt with the following content:

      hi how are you
      how is your job
      how is your family
      how is your brother
      how is your sister
      what is the time now
      what is the strength of hadoop

    Step 11: Create a directory named /home/user/input inside HDFS.

    (Screenshot: Create directory)

    Step 12: Move inputfile.txt into the HDFS directory /home/user/input.

    (Screenshot: Input directory)

    Step 13: Run the WordCount.jar file on Hadoop.

    (Screenshot: File location)
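Steps 11 through 13 (and inspecting the output of Step 16) can also be done from a terminal. A minimal sketch, assuming the Hadoop 2.6.0 binaries are on the PATH and using the directory names from the steps above:

```shell
# Step 11: create the input directory inside HDFS
hdfs dfs -mkdir -p /home/user/input

# Step 12: copy the local input file into HDFS
hdfs dfs -put inputfile.txt /home/user/input

# Step 13: run the job; usage: hadoop jar <jar> <driver class> <input> <output>
hadoop jar WordCount.jar NewWordCount /home/user/input /home/user/output

# Inspect the result (the part-r-00000 file of Step 16)
hdfs dfs -cat /home/user/output/part-r-00000
```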

    Step 14: Final output on the console.

    (Screenshot: Final output on console)

    Step 15: Browser output at http://localhost:50070 (the NameNode web UI).

    (Screenshot: Browser output)

    Step 16: The output file part-r-00000.

    (Screenshot: Output file)
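For reference, the contents of part-r-00000 can be sanity-checked with a plain local Java version of the same tokenize-and-count logic. This is an illustrative helper (the class name WordCountCheck is made up here), not part of the MapReduce job; it needs no Hadoop at all:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountCheck {
    // Tokenize on whitespace and count occurrences, mirroring the mapper/reducer pair.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample input from Step 10
        String input = "hi how are you\n"
                + "how is your job\n"
                + "how is your family\n"
                + "how is your brother\n"
                + "how is your sister\n"
                + "what is the time now\n"
                + "what is the strength of hadoop\n";
        // Print key<TAB>count pairs sorted by key, like a part-r-00000 file
        for (Map.Entry<String, Integer> e : count(input).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the Step 10 input this prints 17 lines, including `how 5`, `is 6`, and `your 4`, which is what the MapReduce output file should contain.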