DogDogFish

Data Science, amongst other things.

Installing Hadoop 2.4 on Ubuntu 14.04

Hey all,

Another of my ‘getting my new operating system set up with all the bits of kit I use’ posts – this time we’ll be looking at Hadoop (and HDFS). There’s a very strong chance that this post will end up a lot like Sean’s post – Hadoop from spare-change. If there are any differences it’ll be for these reasons three:
1.) He was using Ubuntu Server 13.04 not Ubuntu Desktop 14.04
2.) He was using Hadoop 2.2 not Hadoop 2.4
3.) He was setting up a whole bunch of nodes – I’m stuck with this oft-abused laptop

Anywho – on with the show.

Step 1:

Download Hadoop from Apache: I’ll be using this mirror but I trust that if you’re not in England, you can likely find a more suitable one:
http://mirror.ox.ac.uk/sites/rsync.apache.org/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz

If you’re trying to stick to the terminal/don’t have a GUI then go with this:

wget http://mirror.ox.ac.uk/sites/rsync.apache.org/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz

Find your way to wherever you downloaded the tar.gz file and untar it using the following command:

tar -xzf hadoop-2.4.0.tar.gz

Sorry if I’m teaching you to suck eggs – everybody has to start somewhere right?
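While we're on tar: the -t flag lists an archive's contents without unpacking anything, which is a handy sanity check before you extract. Here's the idea demonstrated on a small throwaway archive (the /tmp paths are just for illustration):

```shell
# Build a tiny throwaway archive, then list its contents with -t
mkdir -p /tmp/tar-demo/hadoop-2.4.0
echo "placeholder" > /tmp/tar-demo/hadoop-2.4.0/README.txt
tar -czf /tmp/tar-demo/demo.tar.gz -C /tmp/tar-demo hadoop-2.4.0
tar -tzf /tmp/tar-demo/demo.tar.gz
# For the real download: tar -tzf hadoop-2.4.0.tar.gz | head
```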

Has it worked up till here?

Run the following command in the same directory you ran the above tar command:

ls | grep hadoop | grep -v '\.gz$'

If there’s at least one line returned (ideally hadoop-2.4.0) then you’re good up till here.

Step 2:

Let’s move everything into a more appropriate directory:

sudo mv hadoop-2.4.0/ /usr/local
cd /usr/local
sudo ln -s hadoop-2.4.0/ hadoop

We create that link to allow us to write scripts/programs that interact with Hadoop that won’t need changing if we upgrade our Hadoop version. All we’ll do is install the new version and point the Hadoop folder to the new version instead. Ace.
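To make that upgrade story concrete, here's the re-pointing trick run on throwaway directories under /tmp (version 2.5.0 is purely an example):

```shell
# Simulate an upgrade: re-point the 'hadoop' symlink at a newer version
mkdir -p /tmp/link-demo/hadoop-2.4.0 /tmp/link-demo/hadoop-2.5.0
cd /tmp/link-demo
ln -sfn hadoop-2.4.0 hadoop   # initial install: 'hadoop' points at 2.4.0
readlink hadoop               # prints: hadoop-2.4.0
ln -sfn hadoop-2.5.0 hadoop   # upgrade: -f replaces the link, -n stops it descending into the old target
readlink hadoop               # prints: hadoop-2.5.0
```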

Has it worked up to here?

Run this command anywhere:

whereis hadoop

If the output is:
hadoop: /usr/local/hadoop
you may proceed.

Step 3:

Righty, now we’ll be setting up a new user and permissions and all that guff. I’ll steal directly from Michael Noll’s tutorial here and go with:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
sudo chown -R hduser:hadoop /usr/local/hadoop/

Has it worked up to here?

Type:

ls -l /home/ | grep hadoop

If you see a line then you’re in the money.
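getent and id are handy for this kind of check too – once step 3 has run, `getent group hadoop` and `id hduser` should each return a line. Since hduser won't exist until you've run the commands above, here's what the output shape looks like against a user and group that exist on every system (root, purely for illustration):

```shell
# getent queries the system's user/group databases; id shows a user's memberships
getent group root    # e.g. root:x:0:
id root              # e.g. uid=0(root) gid=0(root) groups=0(root)
# After step 3, try:  getent group hadoop  and  id hduser
```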

Step 4:

SSH is a biggy – possibly not so much for the single node tutorial but when we were setting up our first cluster, SSH problems probably accounted for about 90% of all head-scratching with the remaining 10% being nits.


su - hduser
sudo apt-get install ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

So we switch to our newly created user, generate an SSH key and get it added to our authorized keys. Unfortunately, Hadoop and ipv6 don’t play nice so we’ll have to disable it – to do this you’ll need to open up /etc/sysctl.conf and add the following lines to the end:


net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Fair warning – you’ll need sudo privileges to modify the file so you might want to open up your file editor like this:

sudo apt-get install gksu
gksu gedit /etc/sysctl.conf

If you’re set on using terminal then this’ll do it:

echo "net.ipv6.conf.all.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.lo.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
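If you want to check your handiwork, grep will count how many of those lines made it into the file, and `sudo sysctl -p` should apply the file without a reboot. Here's the append-and-count flow run against a throwaway file so nothing real gets touched:

```shell
# Same tee -a mechanics as above, but against a scratch file
conf=/tmp/sysctl-demo.conf
: > "$conf"    # start from an empty file
echo "net.ipv6.conf.all.disable_ipv6 = 1"     | tee -a "$conf" > /dev/null
echo "net.ipv6.conf.default.disable_ipv6 = 1" | tee -a "$conf" > /dev/null
echo "net.ipv6.conf.lo.disable_ipv6 = 1"      | tee -a "$conf" > /dev/null
grep -c 'disable_ipv6 = 1' "$conf"   # prints 3
# Against the real file: grep -c 'disable_ipv6 = 1' /etc/sysctl.conf
# Apply without a reboot: sudo sysctl -p
```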

Rumour has it that at this point you can run
sudo service networking restart
and capisce – ipv6 is gone. However, Atheros and Ubuntu seem to have a strange sort of ‘not working’ thing going on and so that command doesn’t work with my wireless driver. If the restart fails, just restart the computer and you should be good.

(if you’re terminal only : sudo shutdown -r now )

Has it worked up to here?

If you’re stout of heart, attempt the following:

su - hduser
ssh localhost

If that’s worked you’ll be greeted with a message along the lines of ‘Are you sure you want to continue connecting?’ The answer you’re looking for at this point is ‘yes’.

If it hasn’t worked at this point run the following command:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6

If the value returned is 0 then you’ve still not got ipv6 disabled – have a re-read of that section and see if you’ve missed anything.

Step 5:
I’m going to assume a clean install of Ubuntu on your machine (because that’s what I’ve got) – if this isn’t the case, it’s entirely likely you’ll already have Java installed. If so, find your JAVA_HOME (lots of tutorials on this online) and use that for the upcoming instructions. I’m going to be installing Java from scratch:

sudo apt-get update
sudo apt-get install default-jdk

Given a bit of luck, you’ll now have Java on your computer (I do on mine) and you’ll be able to set your environment variables. Open up your bashrc file:

su - hduser
gedit ~/.bashrc

and add the following lines:

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr

and follow up with this command:
source ~/.bashrc

If you’ve deviated from any of the instructions above, those lines are likely to be different. You can find what your java home should be by running the following command:
which java | sed 's|/bin/java||'

Your Hadoop home will be wherever you put it in step 2.
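To see what that sed is meant to do, here it is run over a couple of example paths (the OpenJDK path is illustrative – yours may well differ):

```shell
# Strip the trailing /bin/java to turn a java binary's path into a JAVA_HOME
echo /usr/bin/java | sed 's|/bin/java||'
# prints: /usr
echo /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java | sed 's|/bin/java||'
# prints: /usr/lib/jvm/java-7-openjdk-amd64/jre
```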

Has it worked up to here?

So many different ways to test – let’s run our first Hadoop command:

/usr/local/hadoop/bin/hadoop version

If that worked with no error (and gave you your Hadoop version) then you’re laughing.

Step 6:

Configuration of Hadoop (and associated bits and bobs) – we’re going to be editing a bunch of files so pick your favourite file editor and get to work. First things first though, you’re going to want some place for HDFS to save your files. If you’re going to be storing anything big/have bought external storage for this purpose, now is the time to deviate from this tutorial. Otherwise, this should do it:


su - hduser
mkdir /usr/local/hadoop/data

Now for the file editing:

(only necessary when running a multi-node cluster, but let’s do it in case we ever get more nodes to add)
1.) /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME} to match the JAVA_HOME you set in your bashrc (for us JAVA_HOME=/usr).
Also, change this line:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
to be

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"

And finally, add the following line:
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native

2.) /usr/local/hadoop/etc/hadoop/yarn-env.sh
Add the following lines:

export HADOOP_CONF_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

3.) /usr/local/hadoop/etc/hadoop/core-site.xml
Change the whole file so it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/data</value>
</property>
</configuration>

4.) /usr/local/hadoop/etc/hadoop/mapred-site.xml
Change the whole file so it looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5.) /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Change the whole file so it looks like this (a word of warning: 3 is the Hadoop default for dfs.replication, but with only one node every block will be reported as under-replicated – set it to 1 if that bothers you):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

6.) /usr/local/hadoop/etc/hadoop/yarn-site.xml
Change the whole file so it looks like this:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8050</value>
    </property>
</configuration>

Annnd we’re done 🙂 Sorry about that – if I could guarantee that you’d be using the same file paths and OS as me then I’d let you wget those files from a Github somewhere but alas, I think that’s likely to cause more headaches than it solves. Don’t worry, we’re nearly there now 🙂
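One stray character in any of those XML files will stop Hadoop starting, so a quick well-formedness check is cheap insurance. A sketch using python3 (assuming it's on your path – xmllint does the same job if you have libxml2 installed):

```shell
# Write a sample config and check it parses; run the same one-liner over your real files
cat > /tmp/core-site-check.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
python3 -c "import xml.dom.minidom as m, sys; m.parse(sys.argv[1]); print('well-formed')" /tmp/core-site-check.xml
# For the real thing, loop over /usr/local/hadoop/etc/hadoop/*-site.xml
```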

Has it worked up to here?

Run the following command:

/usr/local/hadoop/bin/hdfs namenode -format

(the older ‘hadoop namenode -format’ form still works, but will warn you that it’s deprecated in favour of hdfs)

If that works, you’re 20% of the way there.

Then, run:

/usr/local/hadoop/sbin/start-dfs.sh

If that seems to work without throwing up a bunch of errors:

/usr/local/hadoop/sbin/start-yarn.sh

If that’s worked, you can safely say you’ve got Hadoop running on your computer 🙂 Get it on the LinkedIn as a strength as soon as possible 😉
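jps (it ships with the JDK) is the quickest way to see what's actually running – on a healthy single-node setup you'd expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and Jps itself in the list. A tolerant sketch:

```shell
# List running JVMs; fall back gracefully if the JDK's jps isn't on the PATH
if command -v jps > /dev/null; then
    jps
    # expect: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, Jps
else
    echo "jps not found - is the JDK installed and on your PATH?"
fi
```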

Conclusion
Now you’ve got Hadoop up and running on your computer, what can you do? Well, unfortunately with that single node and single hard disk, not much you couldn’t have done without it. However, if you’re just getting started with Linux and Hadoop you’ll have hopefully learnt a bit on the way to setting up your cluster.
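If you want a first job to run, the distribution ships with example jars and wordcount is the traditional smoke test. A sketch, assuming the paths from this tutorial and that the dfs/yarn daemons above are running (the /wordcount-in and /wordcount-out HDFS paths are arbitrary):

```shell
# Copy some text into HDFS, run the bundled wordcount example, peek at the output
HADOOP=/usr/local/hadoop/bin/hadoop
JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar
if [ -x "$HADOOP" ]; then
    $HADOOP fs -mkdir -p /wordcount-in
    $HADOOP fs -put /usr/local/hadoop/etc/hadoop/*.xml /wordcount-in
    $HADOOP jar "$JAR" wordcount /wordcount-in /wordcount-out
    $HADOOP fs -cat '/wordcount-out/part-r-*' | head
else
    echo "hadoop not found at $HADOOP - adjust the paths to match your install"
fi
```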

57 Comments

  1. The exclamation marks in core-site.xml is fuxing things up a bit. I removed them and it ‘seems’ to work.

    /usr/local/hadooop/bin/hadoop namenode -format <– one 'o' too long

    Very copy-paste friendly. Thanks 😀

    • Thanks for the comments – glad it’s helped you and I’ll make the changes you suggested. You’re absolutely right; one too many o’s in Hadoop and the exclamation marks shouldn’t be there.

  2. Denny Abraham Cheriyan

    May 17, 2014 at 4:47 am

    I know this is really basic, but you might want to add “cd /usr/local” in step 2 (Probably for newbies).

    sudo mv hadoop-2.4.0/ /usr/local
    cd /usr/local
    sudo ln -s hadoop-2.4.0/ hadoop

    Thanks for the tutorial!

  3. Denny Abraham Cheriyan

    May 17, 2014 at 6:17 am

    It would be really helpful, if you could add an example of how to run a MapReduce job (Probably Word Count). It would be great if you could demonstrate how to configure Eclipse for Hadoop as well.

  4. Denny Abraham Cheriyan

    May 20, 2014 at 12:36 am

    Hi, I referred to the “Hadoop from spare-change” blog entry, and tried to execute the word count example. It gives me the below mentioned output, and does not create any map/reduce tasks. I’m not able to figure out whats wrong. I would really appreciate it if you could help me out.

    Command –
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount /test /testout

    Output –
    14/05/19 12:07:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    14/05/19 12:07:51 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8040
    14/05/19 12:07:53 INFO input.FileInputFormat: Total input paths to process : 1
    14/05/19 12:07:53 INFO mapreduce.JobSubmitter: number of splits:1
    14/05/19 12:07:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1400525302476_0001
    14/05/19 12:07:53 INFO impl.YarnClientImpl: Submitted application application_1400525302476_0001
    14/05/19 12:07:53 INFO mapreduce.Job: The url to track the job: http://dennyac-HP-ENVY-TS-15-Notebook-PC:8088/proxy/application_1400525302476_0001/
    14/05/19 12:07:53 INFO mapreduce.Job: Running job: job_1400525302476_0001

    • Denny Abraham Cheriyan

      May 20, 2014 at 12:37 am

      It just gets stuck at the line “14/05/19 12:07:53 INFO mapreduce.Job: Running job: job_1400525302476_0001”

      • Hi,

        Sorry about that – have just checked it over. Would you be able to change this value in your yarn-site.xml:

        yarn.resourcemanager.address
        localhost:8040

        to

        yarn.resourcemanager.address
        localhost:8050

        Then restart Yarn (/usr/local/hadoop/sbin/stop-yarn.sh followed by /usr/local/hadoop/sbin/start-yarn.sh) and let me know if the problem persists?

      • OK – I’ve managed to replicate the problem and have modified the tutorial above to take this into account. The yarn resource manager uses port 8040 by default and so we were unable to run a mapreduce using that port as well – changing this port value to something different (and not in use) should fix things. Let me know if it doesn’t 🙂

  5. Denny Abraham Cheriyan

    May 21, 2014 at 7:07 am

    Its working now. Thanks a lot!

  6. Reblogged this on Qruize Labs and commented:
    An excellent tutorial on installing Hadoop on Ubuntu 14.04 LTS – Trusty Tahr. You can install Oracle JDK instead of OpenJDK if you wish to do so.
    Goto: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html , agree to the terms and download the tar.gz file corresponding to your architecture.

    Extract the compressed archive and move it to /usr/lib/jvm
    $ sudo mv jdk1.8.0_05 /usr/lib/jvm/
    Let’s install the binaries in this directory as the defaults for java.
    $ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_05/bin/java" 1
    $ sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.8.0_05/bin/javac" 1
    $ sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.8.0_05/bin/javaws" 1
    $ sudo update-alternatives --config java
    $ sudo update-alternatives --config javac
    Append the following line in .bashrc or .zshrc (You don’t use ZSH??)
    export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_05"
    $ source .bashrc
    So there. Now you can skip the part that deals with default-jdk in this article.

    Good luck.

  7. Absolutely, excellent. And I never comment on tutorials.
    Installed on AWS micro instance, Ubuntu 14.04

  8. Thanks a lot finaly installed hadoop.

  9. Im Scared:
    hduser@DEVCOINAP1:~$ /usr/local/hadoop/bin/hadoop namenode -format
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    Unrecognized option: –
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.

    • Ok, the command now is:
      /usr/local/hadoop/bin/hadoop fs -namenode -format
      (NOTE THE FS BEFORE -NAMENODE)

      But im still getting the issue:
      Unrecognized option: –
      Error: Could not create the Java Virtual Machine.
      Error: A fatal exception has occurred. Program will exit.

      • Hi Joe,

        You have good reason to be scared. That sounds terrifying. Fortunately, I think I might have the solution 🙂 Are you running the format command as hduser or as root? If you’re running it as root then that’s the sort of error message I’d expect to see. Before running the command if you switch to hduser:

        su hduser

        I think you’ll have more success. If that doesn’t work, give me a shout.

  10. I am getting this warning whenver i run start-dfs.sh(probably because i’m using ubuntu 13.04)
    WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

    Can u tell me a solution

  11. Joshua Lickteig

    August 27, 2014 at 11:14 pm

    This is simply excellent, many thanks. Also, nice style-

  12. Niladri Kumar Saha Roy

    August 30, 2014 at 5:34 pm

    Awesome tutorial for beginners for me. Shall keep an eye for more from you. Keep up the good work

  13. Venkatesh Kadiri

    September 3, 2014 at 6:41 pm

    I have followed the tutorial .At the end I wanted to check if hadoop is running by typing command jps… It lists out only jps entry but no entries for name node ,data node and secondary nodes…

    • Yeah, even I got a similar problem, but I got ResourceManager and NodeManager too in addition to Jps when I run the above command.

      Please help!

      Thank you

  14. Great job, Matthew: after reading at least other 10 tutorials without any luck, following your allowed me to have now my Hadoop up and running…

    Only a couple of comments:
    – in step 6, for hadoop-env.sh I had to add “/native” to the value of java.library.path for the HADOOP_OPTS variable to avoid the warning message about missing native-hadoop library when testing the following commands (I had previously compiled Hadoop for my 64bit machine and so the correct library version was in place):
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib/native"

    – when I tried to format the namenode, instead of:
    /usr/local/hadoop/bin/hadoop namenode -format
    I had to use the following to avoid warning messages of depracated commands:
    /usr/local/hadoop/bin/hdfs namenode -format

    Anyway, apart from these details, I have to say well done!
    Thank you.

  15. Hi Great blog. I was able to install successfully following your instructions. Can you please show how to setup a master and slave node as well? There are other websites out there, but none are as detailed as your blog is.

  16. Thanks for a great tutorial! I had to make one small change in Step 3. I needed the -H option with the chown.

  17. I tried with all steps and able to install hadoop and work with this.Thanks a lot for this awesome tips. Great work.

  18. Hey, Thanks for a great tutorial.
    When I ran a jps command after starting all components, it lists everything except JobTracker and TaskTracker. Can you suggest any fix?

  19. Sir Your tutorial is great. Thanks for the tutorial.
    I am a newbie and I have a query : when after starting all components, I ran jps, it didn’t show the JobTracker and TaskTracker status.
    Can you please suggest me a fix?

  20. Thanks for sharing your thoughts about des.
    Regards

  21. You made it very simple. Thanks a lot. Provide steps or links after installation to perform an example then the tutorial looks complete. Thanks again.

  22. Thanks, it works! I think.

  23. I couldn’t find datanode in my jps
    please help me

  24. Thank you! It helped.

  25. OK, now the fun begins. Seems to be the best tutorial available. Works for Java 1.7, Hadoop 2.6.0, and Ubuntu 14.04. Thanks!!

  26. I got this error while executing “/usr/local/hadoop/sbin/start-dfs.sh”
    15/02/12 01:39:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    Starting namenodes on [localhost]
    hduser@localhost’s password:
    localhost: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hduser-namenode-prome.out
    hduser@localhost’s password:
    localhost: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-hduser-datanode-prome.out
    Starting secondary namenodes [0.0.0.0]
    hduser@0.0.0.0’s password:
    0.0.0.0: secondarynamenode running as process 6040. Stop it first.
    15/02/12 01:40:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

    • Probably far too late to be of any use but if I had to guess I’d say your problem there is due to passwordless SSH problems or DNS resolution. You see how it’s trying to prompt you for a password? And connect to 0.0.0.0 instead of localhost? Did you ever fix this?

  27. Hi my namenode fromat has worked but when i run “/usr/local/hadoop/sbin/start-dfs.sh” it tells me JAVA_HOME is not set and could not be found.
    please help . i am a new bee in hadoop world

    • hi sorry that thing worked out , did a silly mistake !
      when i run “/usr/local/hadoop/sbin/start-dfs.sh”
      i get this
      WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

      can you explain what it means ty 🙂

      • Hey, the main thing is that you don’t need to worry about it. Everything should still run just fine.

        It’s talking about classes that Hadoop uses that are also available in your Java distribution. I don’t know which classes it’s specifically talking about but basically the same classes are probably available in both the version of Java you downloaded and in some Hadoop distributions. Your distribution doesn’t seem to have the classes it’s looking for (nor has any version I’ve ever used) and so it’s falling back to the Java classes.

        Or at least that’s the case provided that warning message is accurate. Otherwise, all hope is lost.

    • Same happens with me, configure JAVA_HOME=/usr (only) on your .bashrc and on hadoop-env.sh

  28. Just followed this with Hadoop 2.6.0 on elementary OS and worked without a hitch. Great clear tutorial. Thank you!

  29. i am getting error like: unable to resolve host address ‘mirror.ox.ac.uk

    • Don’t know if this is still a problem but my first bet would be that that particular mirror was down when you tried – is that still the case?

      If so I’ll update the tutorial to point at a new mirror 🙂

  30. while installing ssh file i am getting ‘openssh-server’ has no installation candidate

    • Have you fixed this yet? It’s just a case of getting an SSH client installed. If `sudo apt-get install openssh-server` or similar doesn’t work then can you tell me what OS you’re using?

  31. I am looking for tutorials and books for 2 weeks and nothing works, this excellent tutoria works for me, Congrats

  32. Dear Matthew

    When running “hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5” it starts running normal, so i got an error: Container launch failed for container _1427328614988_0001_01_000002:org.apache.hadoop.yarn.exceptions.InvalidAuxServicesException:The auxService:mapreduce_shuffle does not exist

    Can you help with this error?
    Tks

  33. This is for me the 4° day that i’m trying to install Hadoop on 3 different SO. Finally i decide to remove K-untu and install U-untu… So after a lot of hours of copy&paste… At the finally command…
    localhost: Error: JAVA_HOME is not set and could not be found.
    localhost: Error: JAVA_HOME is not set and could not be found.
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: Error: JAVA_HOME is not set and could not be found.
    And it’s look like that i did all the istruction…

    • I’m sure I don’t need to tell you this but it looks like your JAVA_HOME isn’t set! See step 5 and then please paste the output of the command I tell you to run in there if it still isn’t working.

      (In summary, make sure you’ve got your JAVA_HOME set in your .bashrc and have run `source ~/.bashrc`

  34. Hi, thanks for a great tutorial. I follow your work and would like to ask two questions:

    1) – you have some tutorial about Hbase and its configuration with this work under the Hadoop in a fully distributed system?

    2) – I want to do some experiments in Cassandra and HBase. To do that I need an adequate dataset. The dataset I’m looking for has to be large enough (more than 1GB) and the data in it has to be sufficiently unstructured to be representative of the kind of problems that relational technology can’t cope. Maybe data derived from social networks, and so on. Has anyone that kind of dataset? Or anyone knows where can I find such a dataset?

    Thanks for your help.

  35. Hi thank you for the tutorial….

    i have no idea about linux just follow your tutorial… so far ok but at step 4 to edit /etc/sysctl.conf.

    after install gksu and try to run a got this problem…

    (gksu:3399): Gtk-warning **: cannot open display:

    somebody pls help me.

    Thanks.

  36. Thanks Matthew for the wonderful tutorial …
    It helped me alot.
    I did face some trouble in updating config files as I copied tar with default user and had to update them with hduser.
    I managed it by changing owner & group permissions.

    Once again thank you.

    Regards
    Parvathi


© 2017 DogDogFish
