Data Science, amongst other things.

Tag: Ubuntu

Installing Eclipse 4.2.3 (Kepler) on Ubuntu 14.04

Hi all,

If you’ve followed any of my other posts you’ll know I recently wiped my OS (something I do alarmingly regularly) – as such, I’m reinstalling a bunch of stuff. Given that I do this so often, it makes sense for me to have a repository of tutorials for doing so!

Today I’m on Eclipse – I’m by no means an Eclipse regular. I do most of my coding in Python and find Vim works well enough for most of what I need to do. RStudio for R, Vim/IPython for Python, but when I’m doing anything in Java (inc. Android stuff) I’ll go with Eclipse. Now installing Eclipse on Ubuntu is really easy – there’s a version in the software centre that’ll work just fine. However, if you want a more up to date version then there’s a degree of hacking about required. Let’s go:

Check you’ve got Java installed

Give this command a cheeky little run (from terminal – press Ctrl + shift + t to get terminal on the screen):

which java
java -version

If they both worked you should have seen the path to where your Java is installed (mine was /usr/bin/java) and the version of Java you have installed (1.7 OpenJDK for me). If both commands worked (gave you a path and a version) then excellent, carry on wayward son. It doesn’t matter if your results are different to mine, just keep trooping on. If they didn’t work, you’ll want to install Java before continuing. I’ll not deal with that here but I mention how to do so in my post on installing Hadoop on Ubuntu.

Download Eclipse

As with installing Hadoop, there are a couple of ways to do this. I’m going to advise heading on over to the Eclipse download link:

Eclipse

Pick the top version (Eclipse standard) and decide which bit version you want (32 or 64). If you don’t know which you should be using, I’d advise running the following command:

 uname -a | awk '{print $12}' 

If the output is

x86_64

you’ll likely want 64 bit,

i386

tells me you’re after 32 bit. If it says something else entirely then I’m flummoxed – have a Google around to find out what bit your Ubuntu installation is.

If you’ve not got a GUI then I’m going to decide you shouldn’t be installing Eclipse. Wget the .tar.gz file if that’s you but really, what are you doing? Actually, maybe you’re setting up a bunch of computers over SSH which will have monitors in the future but don’t now. OK – wget this link if you’re on 64 bit:

http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-standard-kepler-SR2-linux-gtk-x86_64.tar.gz&r=1

and this link if you’re not:

http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-standard-kepler-SR2-linux-gtk.tar.gz&r=1

Installing Eclipse

At this point I’m assuming you’ve got a tarred Eclipse in your downloads folder and you know where Java is installed.

Open up a terminal, head over to your downloads directory and untar the Eclipse file:

cd ~/Downloads
tar -xzf eclipse-standard-kepler-SR2-linux-gtk-x86_64.tar.gz

Next we’re going to put it into a more sensible directory and make it easily launchable from terminal:

sudo mv eclipse /usr/local/eclipse
sudo ln -s /usr/local/eclipse/eclipse /usr/bin/eclipse

At this point, if you could kindly run the following command from terminal:

 eclipse 

I’d expect you to see Eclipse pop up and for you to be able to start developing.

Installing Hadoop 2.4 on Ubuntu 14.04

Hey all,

Another of my ‘getting my new operating system set up with all the bits of kit I use’ – this time we’ll be on Hadoop (and HDFS). There’s a very strong chance that this post will end up a lot like Sean’s post – Hadoop from spare-change. If there are any differences it’ll be for these reasons three:
1.) He was using Ubuntu Server 13.04 not Ubuntu Desktop 14.04
2.) He was using Hadoop 2.2 not Hadoop 2.4
3.) He was setting up a whole bunch of nodes – I’m stuck with this oft-abused laptop

Anywho – on with the show.

Step 1:

Download Hadoop from Apache: I’ll be using this mirror but I trust that if you’re not in England, you can likely find a more suitable one:
http://mirror.ox.ac.uk/sites/rsync.apache.org/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz

If you’re trying to stick to the terminal/don’t have a GUI then go with this:

wget http://mirror.ox.ac.uk/sites/rsync.apache.org/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz

Find your way to wherever you downloaded the tar.gz file and untar it using the following command:

tar -xzf hadoop-2.4.0.tar.gz

Sorry if I’m teaching you to suck eggs – everybody has to start somewhere right?

Has it worked up till here?

Run the following command in the same directory you ran the above tar command:

ls | grep hadoop | grep -v *.gz

If there’s at least one line returned (ideally hadoop-2.4.0) then you’re good up till here.

Step 2:

Let’s move everything into a more appropriate directory:

sudo mv hadoop-2.4.0/ /usr/local
cd /usr/local
sudo ln -s hadoop-2.4.0/ hadoop

We create that link to allow us to write scripts/programs that interact with Hadoop that won’t need changing if we upgrade our Hadoop version. All we’ll do is install the new version and point the Hadoop folder to the new version instead. Ace.

Has it worked up to here?

Run this command anywhere:

whereis hadoop

If the output is:
hadoop: /usr/local/hadoop
you may proceed.

Step 3:

Righty, now we’ll be setting up a new user and permissions and all that guff. I’ll steal directly from Michael Noll’s tutorial here and go with:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
sudo chown -R hduser:hadoop /usr/local/hadoop/

Has it worked up to here?

Type:

ls -l /home/ | grep hadoop

If you see a line then you’re in the money.

Step 4:

SSH is a biggy – possibly not so much for the single node tutorial but when we were setting up our first cluster, SSH problems probably accounted for about 90% of all head-scratching with the remaining 10% being nits.


su - hduser
sudo apt-get install ssh
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

So we switch to our newly created user, generate an SSH key and get it added to our authorized keys. Unfortunately, Hadoop and ipv6 don’t play nice so we’ll have to disable it – to do this you’ll need to open up /etc/sysctl.conf and add the following lines to the end:


net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Fair warning – you’ll need sudo privileges to modify the file so might want to open up your file editor like this:

sudo apt-get install gksu
gksu gedit /etc/sysctl.conf

If you’re set on using terminal then this’ll do it:

echo "net.ipv6.conf.all.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.lo.disable_ipv6 = 1" | sudo tee -a /etc/sysctl.conf

Rumour has it that at this point you can run
sudo service networking restart
and kapeesh – ipv6 is gone. However, Atheros and Ubuntu seem to have a strange sort of ‘not working’ thing going on and so that command doesn’t work with my wireless driver. If the restart fails, just restart the computer and you should be good.

(if you’re terminal only : sudo shutdown -r now )

Has it worked up to here?

If you’re stout of heart, attempt the following:

su - hduser
ssh localhost

If that’s worked you be greeted with a message along the lines of ‘Are you sure you want to continue connecting?’ The answer you’re looking for at this point is ‘yes’.

If it hasn’t worked at this point run the following command:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6

If the value returned is 0 then you’ve still not got ipv6 disabled – have a re-read of that section and see if you’ve missed anything.

Step 5:
I’m going to assume a clean install of Ubuntu on your machine (because that’s what I’ve got) – if this isn’t the case, it’s entirely likely you’ll already have Java installed. If so, find your JAVA_HOME (lots of tutorials on this online) and use that for the upcoming instructions. I’m going to be installing Java from scratch:

sudo apt-get update
sudo apt-get install default-jdk

Given a bit of luck, you’ll now have Java on your computer (I do on mine) and you’ll be able to set your environment variables. Open up your bashrc file:

su - hduser
gksu gedit .bashrc

and add the following lines:

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr

and follow up with this command:
source ~/.bashrc

If you’ve deviated from any of the instructions above, those lines are likely to be different. You can find what your java home should be by running the following command:
which java | sed -e 's/(.*)/bin/java/1/g'

Your Hadoop home will be wherever you put it in step 2.

Has it worked up to here?

So many different ways to test – let’s run our first Hadoop command:

/usr/local/hadoop/bin/hadoop version

If that worked with no error (and gave you your Hadoop version) then you’re laughing.

Step 6:

Configuration of Hadoop (and associated bits and bobs) – we’re going to be editing a bunch of files so pick your favourite file editor and get to work. First things first though, you’re going to want some place for HDFS to save your files. If you’ve going to be storing anything big/bought external storage for this purpose now is the time to deviate from this tutorial. Otherwise, this should do it:


su - hduser
mkdir /usr/local/hadoop/data

Now for the file editing:

(only necessary when running a multi-node cluster, but let’s do it in case we ever get more nodes to add)
1.) /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME} to match the JAVA_HOME you set in your bashrc (for us JAVA_HOME=/usr).
Also, change this line:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true
to be

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"

And finally, add the following line:
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native

2.) /usr/local/hadoop/etc/hadoop/yarn-env.sh
Add the following lines:

export HADOOP_CONF_LIB_NATIVE_DIR=${HADOOP_PREFIX:-"/lib/native"}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

3.) /usr/local/hadoop/etc/hadoop/core-site.xml
Change the whole file so it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/data</value>
</property>
</configuration>

4.) /usr/local/hadoop/etc/hadoop/mapred-site.xml
Change the whole file so it looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5.) /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Change the whole file so it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

6.) /usr/local/hadoop/etc/hadoop/yarn-site.xml
Change the whole file so it looks like this:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8050</value>
    </property>
</configuration>

Annnd we’re done 🙂 Sorry about that – if I could guarantee that you’d be using the same file paths and OS as me then I’d let you wget those files from a Github somewhere but alas, I think that’s likely to cause more headaches than it solves. Don’t worry, we’re nearly there now 🙂

Has it worked up to here?

Run the following command:

/usr/local/hadoop/bin/hadoop namenode -format

If that works, you’re 20% of the way there.

Then, run:

/usr/local/hadoop/sbin/start-dfs.sh

If that seems to work without throwing up a bunch of errors:

/usr/local/hadoop/sbin/start-yarn.sh

If that’s worked, you can safely say you’ve got Hadoop running on your computer 🙂 Get it on the LinkedIn as a strength as soon as possible 😉

Conclusion
Now you’ve got Hadoop up and running on your computer, what can you do? Well, unfortunately with that single node and single hard disk, not much you couldn’t have done without it. However, if you’re just getting started with Linux and Hadoop you’ll have hopefully learnt a bit on the way to setting up your cluster.

Installing R on Ubuntu 14.04

A quick break from our analysis of UK Government house sale data – I’ve decided that I’d like to do the analysis in R (reasons for this will be explained in a later post). A new version of Ubuntu is out (Trusty Tahr) and, after updating, I realised I didn’t have R on my computer. So, a quick and simple guide to installing R:

Step 1:

Assuming you’ve got the internet and everything set up as it should be run:
sudo apt-get install r-base-core

Go ahead and accept the new 147MB (or thereabouts) and you should now be able to type R at the terminal and see R (verion 3.0.2 at the time of writing) fire up. If this hasn’t worked, drop me a comment and I’ll see what we can do about it.

Step 2:

Now – assuming you’re going to want a bit more than the standard packages (I’d strongly advise ggplo2 – it’s well good), you’ll want r-base-dev. I’d give the following command a go:

sudo apt-get install r-base-dev

Fortunately, it looks to me like this is bundled with r-base-core now so that command didn’t do anything so might not be necessary. But no harm done, eh?

Now, open up R with sudo privileges (if you don’t do this and have the standard permissions and install locations, R won’t have permissions to write to /usr/lib – you can use your personal library if you like, but I won’t):

sudo R

and (for here in on, we’re in the R terminal) run:

update.packages()

There are a bunch of libraries you may be interested in, but for me (and indeed, for the next bit of data analysis I’m going to do) ggplot2 will suffice.

install.packages('ggplot2', dependencies=TRUE)

If that worked you should now be able to type library(ggplot2) without errors.

Step 3:

While I’m sure you’re a big fan of just bashing out your R code in the terminal, sometimes it’s nice to have an IDE and RStudio is at the front of the pack when it comes to R IDEs. Getting this on Ubuntu is a doddle.

Head on over to the R Studio download site and download the version with Ubuntu in the name (RStudio 0.98.501 – Debian 6+/Ubuntu 10.04+ (64-bit) for my system at the time of writing). You can then open this file using Ubuntu Software Centre (it should open in this by default) – click install and you’re on your way!

Now, everything being OK you should be able to open up R Studio and develop away to your heart’s content 🙂 If there are any problems with this, I’d encourage you to leave comments and we’ll see if we can get to the bottom of this. As a point of interest, if you need any more packages installing, you’ll need to pop into terminal, open up R in sudo mode and install them from there. There are fixes for this (check out the official R documentation for this) but I don’t think it’s that much of a problem that it’s worth bothering with.

© 2024 DogDogFish

Theme by Anders NorenUp ↑