A typical error in writing reducer code

Suppose the input is:

00403D91436B76D22DDD88ACEAA41FB4
00403D91436B76D22DDD88ACEAA41FB4        s1
00403DA1239B66B92BD91FFF0EC2DC3F        s1|s3
00403DC1F314463D904A0C03C9714743

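The reducer should output every key (first column) that is not unique, together with its tag (second column, tab-separated). For the input above, the expected output is the first key with the tag s1. A first attempt, which contains the error: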

#!/usr/bin/perl -w

use strict;

# loop vars
my $key     = "";
my $cur_key = "";

# key and vals
my $uid;
my $tag;
my $count = 0;

sub onBeginKey( ) {
    $cur_key = $key;
    $count   = 0;
}

sub onSameKey( ) {
    $count++;
}

sub onEndKey( ) {
    if ( $count == 2 ) {
        printf STDOUT "%s\t%s\n", $cur_key, $tag;
    }
}

while ( my $line = <STDIN> ) {
    chomp($line);

    my @fields      = split( /\t/, $line );
    my $fields_size = scalar @fields;

    $key = $fields[0];
    if ( $fields_size == 2 && $fields[1] ) {
        $tag = $fields[1];
    }

    if ($cur_key) {
        if ( $key ne $cur_key ) {
            &onEndKey();
            &onBeginKey();
        }
        &onSameKey();
    }
    else {
        &onBeginKey();
        &onSameKey();
    }
}
if ($cur_key) {
    &onEndKey();
}

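The error: $tag is overwritten as soon as the first line of the next key is read, which happens before onEndKey() runs for the previous key. With the input above, the first key is therefore printed with the tag s1|s3 instead of s1. A tag is also never cleared, so a line without a tag silently inherits the last one seen.

The corrected version clears $tag on lines that have none and keeps a separate per-key copy, $o_tag, which is set in onSameKey(), that is, only after the previous key has been flushed: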

#!/usr/bin/perl -w

use strict;

# loop vars
my $key     = "";
my $cur_key = "";
my $o_tag = "";

# key and vals
my $uid;
my $tag;
my $count = 0;

sub onBeginKey( ) {
    $cur_key = $key;
    $count   = 0;
    $o_tag   = "";    # reset so a tag from the previous key cannot leak into this one
}

sub onSameKey( ) {
    $count++;
    # remember the tag that belongs to the current key
    if ($tag) { $o_tag = $tag; }
}

sub onEndKey( ) {
    if ( $count == 2 && $o_tag ) {
        printf STDOUT "%s\t%s\n", $cur_key, $o_tag;
    }
}

while ( my $line = <STDIN> ) {
    chomp($line);

    my @fields      = split( /\t/, $line );
    my $fields_size = scalar @fields;

    $key = $fields[0];
    if ( $fields_size == 2 && $fields[1] ) {
        $tag = $fields[1];
    }
    else {
        # no tag on this line: do not keep the one from an earlier line
        $tag = "";
    }

    if ($cur_key) {
        if ( $key ne $cur_key ) {
            # flush the previous key before touching per-key state
            &onEndKey();
            &onBeginKey();
        }
        &onSameKey();
    }
    else {
        &onBeginKey();
        &onSameKey();
    }
}
# flush the last key
if ($cur_key) {
    &onEndKey();
}

Note: the code is Perl, but the same logic applies to a reducer written in Java.
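To try the fix without feeding a script through STDIN, the same grouping logic can be sketched as a plain function. This is only an illustration, not the original reducer: reduce_lines and $flush are hypothetical names, the input is assumed to be sorted by key and tab-separated, and tags are compared against the empty string rather than tested for truthiness.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes sorted, tab-separated lines and returns
# "key\ttag" for every key that occurs exactly twice and has a tag.
sub reduce_lines {
    my @lines = @_;
    my @out;
    my ( $cur_key, $count, $o_tag ) = ( undef, 0, "" );

    # Emit the finished group. It only looks at per-key state,
    # never at the line that is currently being read.
    my $flush = sub {
        push @out, "$cur_key\t$o_tag"
            if defined $cur_key && $count == 2 && $o_tag ne "";
    };

    for my $line (@lines) {
        chomp $line;
        next unless length $line;    # skip empty lines

        my ( $key, $tag ) = split /\t/, $line;
        $tag = "" unless defined $tag;

        if ( !defined $cur_key || $key ne $cur_key ) {
            $flush->();                                      # previous key first
            ( $cur_key, $count, $o_tag ) = ( $key, 0, "" );  # then reset state
        }
        $count++;
        $o_tag = $tag if $tag ne "";
    }
    $flush->();    # last group
    return @out;
}

my @input = (
    "00403D91436B76D22DDD88ACEAA41FB4",
    "00403D91436B76D22DDD88ACEAA41FB4\ts1",
    "00403DA1239B66B92BD91FFF0EC2DC3F\ts1|s3",
    "00403DC1F314463D904A0C03C9714743",
);
print "$_\n" for reduce_lines(@input);
```

For the sample input this prints the first key followed by a tab and s1. The point of the design is the order of operations: the previous key is flushed before any state belonging to the new key is stored.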

Perl execute shell command line – Quick Tutorial

There are several ways to do this. The following code shows how they differ:

#!/usr/bin/perl

use strict;

print "*** backtick ***\n";
my $result = `ls`;
print "$result\n";

print "*** system ***\n";
my $result2 = system ("ls");
print "system result: $result2\n";

print "\n*** readpipe ***\n";
my $result3 = readpipe ("ls");
print "$result3\n";

print "*** exec ***\n";
exec ("ls");

In a nutshell:

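  • exec: replaces the running Perl process with the command, so it never returns on success (it returns false only if the command cannot be started) and any code after it is not executed
  • system: forks a child process, waits for the command to finish, and returns its wait status (0 on success; the exit code is that value shifted right by 8 bits); the command's output is not captured
  • backtick and readpipe: run the command and return its standard output as a string (or a list of lines in list context)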
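In practice it is usually the exit code and the captured output that matter. A minimal sketch of both follows, assuming a POSIX-like system where sh and echo are available; exit_code_of and capture are hypothetical helper names, not part of Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a command with system() and return its exit code.
sub exit_code_of {
    my @cmd    = @_;
    my $status = system(@cmd);     # list form runs the program without a shell
    return -1 if $status == -1;    # the command could not be started
    return $status >> 8;           # the exit code is in the high byte
}

# Hypothetical helper: capture standard output with backticks.
# Returns the output and the exit code of the command.
sub capture {
    my ($cmd) = @_;
    my $out   = `$cmd`;            # $? is set just as it is after system()
    my $code  = $? == -1 ? -1 : $? >> 8;
    return ( $out, $code );
}

my ( $out, $code ) = capture("echo hello");
print "captured: $out";
print "exit code: $code\n";
print "sh exit code: ", exit_code_of( "sh", "-c", "exit 3" ), "\n";
```

Under those assumptions the script prints "captured: hello", "exit code: 0" and "sh exit code: 3". Passing system() a list instead of a single string avoids shell interpretation of the arguments, which is the safer choice when any part of the command comes from user input.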