Friday 20 April 2012

Installing Apache Mahout on Ubuntu

Hi, nice to see you here. The idea of writing this tutorial came to me after I had quite a tough time installing Mahout. It's not difficult, but you do get stuck on small mistakes made during the installation, or on not knowing the exact dependencies required. That leads to errors, and you end up in something like a treasure hunt. So let's start.

STEP 1: INSTALLATION OF SUN JAVA JDK

This looks like a simple, even petty, thing to do, but if it is not done correctly it leads to some major issues.

1. Installing from the Synaptic Package Manager

Most of us would do it this way, which is fine:

System > Administration > Synaptic Package Manager, then install the Sun Java JDK

But check the Java update number. Older JDK updates have a few bugs that cause trouble when building Mahout, which is what happened to me. So I manually installed jdk1.6.0_26, the latest version at the time, from the following website.

2. Installing Java manually

Java update 26 Download

1. Accept the license.
2. Download the self-extracting installer file for your architecture (32-bit x86 or 64-bit).
3. After the download you will have a .bin file.

This is how I installed it:

ganesh@ganesh-VPCEG2AEN:~$ mkdir programs
ganesh@ganesh-VPCEG2AEN:~$ cd programs
ganesh@ganesh-VPCEG2AEN:~/programs$ bash ../Downloads/jdk-6u26-linux-i586.bin


The command above unpacks the JDK. At the end it asks you to press Enter to continue, so press Enter at that point.


ganesh@ganesh-VPCEG2AEN:~/programs$ ln -s jdk1.6.0_26 jdk
ganesh@ganesh-VPCEG2AEN:~/programs$ ls -l
total 4
lrwxrwxrwx  1 ganesh ganesh   11 2012-04-12 13:55 jdk -> jdk1.6.0_26
drwxr-xr-x 10 ganesh ganesh 4096 2012-04-12 13:52 jdk1.6.0_26
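My reading of why the jdk link is handy: JAVA_HOME can point at the link, and a later JDK can be swapped in by re-pointing the link instead of editing .bashrc again. A minimal sketch of that idea, assuming GNU ln as shipped with Ubuntu; switch_jdk is a made-up helper name and jdk1.6.0_27 is only a hypothetical newer directory.

```shell
# switch_jdk TARGET LINK: point the symlink LINK at the directory TARGET.
# (switch_jdk is a hypothetical helper written for this post.)
switch_jdk() {
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat LINK itself as the thing to replace, even if it
    #     currently points at a directory
    ln -sfn "$1" "$2"
}

# Usage, from inside ~/programs:
#   switch_jdk jdk1.6.0_26 jdk      # what the tutorial sets up
#   switch_jdk jdk1.6.0_27 jdk      # hypothetical later upgrade
```

Without -n, the second call would create a new link inside the old JDK directory instead of replacing the jdk link.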

ganesh@ganesh-VPCEG2AEN:~/programs$ cd ..




ganesh@ganesh-VPCEG2AEN:~$ gedit .bashrc


When the .bashrc file opens, set JAVA_HOME by adding the following lines:

#Set Java Home
export JAVA_HOME=$HOME/programs/jdk
export PATH=.:$JAVA_HOME/bin:$PATH
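If you would rather not open an editor, the same lines can be appended from the terminal. This is only a sketch: append_once is a helper name I made up, and it adds a line only when the file does not already contain it, so running it twice does not duplicate entries.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already there.
# (append_once is a hypothetical helper, not a standard command.)
append_once() {
    file="$1"
    line="$2"
    touch "$file" || return 1
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $HOME and $PATH unexpanded until .bashrc is read):
#   append_once "$HOME/.bashrc" 'export JAVA_HOME=$HOME/programs/jdk'
#   append_once "$HOME/.bashrc" 'export PATH=.:$JAVA_HOME/bin:$PATH'
```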

Once this is done, you have finished downloading, installing and configuring Java. To check that it is installed, open a new terminal (so that .bashrc is read again) and run the command below:

ganesh@ganesh-VPCEG2AEN:~$ java -version




You should get the following output:

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Server VM (build 20.1-b02, mixed mode)
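Since the update number is what caused me trouble, here is a small sketch of how that check could be scripted. The function names are my own, and the default threshold of 26 is simply the update used in this tutorial, not an official minimum.

```shell
# java_update_number VERSION: print the update number of a string like 1.6.0_26.
# Prints 0 when the string has no _NN suffix. (Hypothetical helper.)
java_update_number() {
    case "$1" in
        *_*) update="${1##*_}" ;;   # keep what follows the last underscore
        *)   update=0 ;;
    esac
    update="${update%%[!0-9]*}"     # drop any trailing non-digit suffix
    printf '%s\n' "${update:-0}"
}

# is_recent_jdk VERSION [MIN_UPDATE]: succeed if the update number is at least
# MIN_UPDATE (default 26, an assumption taken from this tutorial).
is_recent_jdk() {
    [ "$(java_update_number "$1")" -ge "${2:-26}" ]
}

# Usage (java -version writes to stderr, hence 2>&1):
#   version=$(java -version 2>&1 | sed -n 's/.*version "\(.*\)".*/\1/p')
#   is_recent_jdk "$version" && echo "JDK update level looks fine"
```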





Once Java is done, we can go further.


STEP 2: INSTALLATION OF MAVEN

You can install Maven from the Synaptic Package Manager or by running the command

sudo apt-get install maven2

But here too I installed Apache Maven 3.0.4 manually, using the following procedure:

  1. Extract the distribution archive, i.e. apache-maven-3.0.4-bin.tar.gz, to the directory where you wish to install Maven 3.0.4. These instructions assume you chose /usr/local. The subdirectory apache-maven-3.0.4 will be created from the archive.
  2. Open the .bashrc file again and add the following lines:
############## Apache-Maven #########
export M2_HOME=/usr/local/apache-maven-3.0.4
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
export JAVA_HOME=$HOME/programs/jdk

  3. Make sure that JAVA_HOME is set to the location of your JDK, e.g. export JAVA_HOME=/usr/java/jdk1.5.0_02, and that $JAVA_HOME/bin is in your PATH environment variable (which we already did while installing Java).

  4. Run mvn --version to verify that it is correctly installed.

You should see output like this:

Apache Maven 3.0.4 (r1232337; 2012-01-17 14:14:56+0530)
Maven home: /usr/local/apache-maven-3.0.4
Java version: 1.6.0_26, vendor: Sun Microsystems Inc.
Java home: /home/ganesh/programs/jdk1.6.0_26/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux", version: "2.6.35-22-generic", arch: "i386", family: "unix"
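Step 1 above gives no command for the extraction, so here is a rough sketch of it. extract_into is a helper name of my own, and it assumes the archive unpacks into a single top-level directory, as the Maven archive does.

```shell
# extract_into ARCHIVE PREFIX: unpack a .tar.gz under PREFIX and print the
# path of the top-level directory it created. (Hypothetical helper; assumes
# the archive contains exactly one top-level directory.)
extract_into() {
    archive="$1"
    prefix="$2"
    mkdir -p "$prefix" || return 1
    # -x: extract, -z: gunzip, -f: archive file, -C: change to PREFIX first
    tar -xzf "$archive" -C "$prefix" || return 1
    # The first entry in the listing names the top-level directory
    top=$(tar -tzf "$archive" | head -n 1)
    printf '%s/%s\n' "$prefix" "${top%%/*}"
}

# Usage (writing to /usr/local needs root, so run the tar step with sudo):
#   sudo tar -xzf apache-maven-3.0.4-bin.tar.gz -C /usr/local
# or, into a directory you own:
#   extract_into apache-maven-3.0.4-bin.tar.gz "$HOME/programs"
```

If you extract somewhere other than /usr/local, M2_HOME in .bashrc has to point at that location instead.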



 ****note
If you connect to the internet through a proxy, you need to change the proxy settings in Maven's settings file. Until this is done, Maven cannot connect to download anything, so make sure you do it.

The settings live in settings.xml in the conf directory of the Apache Maven installation. In my case:

ganesh@ganesh-VPCEG2AEN:~$ cd /usr/local/apache-maven-3.0.4
ganesh@ganesh-VPCEG2AEN:/usr/local/apache-maven-3.0.4$ cd conf
ganesh@ganesh-VPCEG2AEN:/usr/local/apache-maven-3.0.4/conf$ ls
settings.xml
ganesh@ganesh-VPCEG2AEN:/usr/local/apache-maven-3.0.4/conf$ sudo gedit settings.xml
 


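Once settings.xml opens, look for the proxies section. By default the proxy entry is commented out:

  <proxies>
    <!-- proxy
     | Specification for one proxy, to be used in connecting to the network.
     |
    <proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <username>put in your username</username>
      <password>put in your password</password>
      <host>put in the host address</host>
      <port>port</port>
      <nonProxyHosts>localhost</nonProxyHosts>
    </proxy>
    -->
  </proxies>

Fill in your own values and make sure to uncomment the <proxy> element: remove the comment lines above it (from <!-- proxy down to the last | line) and the --> below it, so that <proxy> sits directly inside <proxies>.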
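Maven also reads a per-user settings file at ~/.m2/settings.xml, so the proxy can be configured there without sudo. Below is a minimal sketch under that assumption. write_proxy_settings is a name I invented, the host and port in the usage line are placeholders, and the username and password elements are left out (add them if your proxy needs a login). Note that it overwrites the target file.

```shell
# write_proxy_settings FILE HOST PORT: write a minimal Maven settings file
# containing one active HTTP proxy. Overwrites FILE. (Hypothetical helper.)
write_proxy_settings() {
    file="$1"
    host="$2"
    port="$3"
    mkdir -p "$(dirname "$file")" || return 1
    cat > "$file" <<EOF
<settings>
  <proxies>
    <proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>${host}</host>
      <port>${port}</port>
      <nonProxyHosts>localhost</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
EOF
}

# Usage (proxy.example.com and 8080 are placeholders, not real values):
#   write_proxy_settings "$HOME/.m2/settings.xml" proxy.example.com 8080
```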
 


STEP 3: INSTALLATION OF HADOOP

The following website provides a beautiful tutorial on installing a single-node Hadoop cluster:

Single node hadoop cluster set up

*** Make sure that you have installed ssh, or else you will get an error like this:

ganesh@ganesh-VPCEG2AEN:~$ ssh localhost
ssh: connect to host localhost port 22: Connection refused

To avoid the above error, install the SSH server:

ganesh@ganesh-VPCEG2AEN:~$ sudo apt-get install openssh-server





 
STEP 4: INSTALLATION OF MAHOUT

1. Go to the following website to download the Mahout source code:

     http://www.apache.org/dyn/closer.cgi/lucene/mahout/

Select one of the Mahout versions available there (0.4, 0.5, 0.6); I selected 0.6. Out of the many archives listed, make sure you download the source one (with src in its name). This is again a place where many of us make a mistake.


2. Check that the unpacked folder contains a pom.xml. It will almost certainly be there, but check once.


3. In my case I downloaded the file, unzipped it, renamed the folder to mahout and moved it to /usr/local.
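Points 2 and 3 can be folded into a small guard that refuses to start the build when pom.xml is missing, which is the usual sign that the binary archive was downloaded instead of the source one. A sketch only: has_pom and build_mahout are names I made up, and /usr/local/mahout is just the location used in this tutorial.

```shell
# has_pom DIR: succeed only if DIR (default: current directory) has a pom.xml.
has_pom() {
    [ -f "${1:-.}/pom.xml" ]
}

# build_mahout DIR: run "mvn install" in DIR, but only if a pom.xml is there.
# (Both functions are hypothetical helpers written for this post.)
build_mahout() {
    dir="${1:-/usr/local/mahout}"
    if ! has_pom "$dir"; then
        echo "No pom.xml in $dir: was the src archive downloaded?" >&2
        return 1
    fi
    # Subshell so the caller's working directory is left unchanged
    (cd "$dir" && mvn install)
}

# Usage:
#   build_mahout /usr/local/mahout
```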

These are the commands I ran:

ganesh@ganesh-VPCEG2AEN:~$ cd /usr/local/mahout
ganesh@ganesh-VPCEG2AEN:/usr/local/mahout$ mvn install

You should see something like this:

[Screenshot: mvn install output as the build starts]
Maven then runs the tests. It is recommended to let the complete test suite run to the end on a first-time installation. Later, when you run mvn install again, you can skip the tests with:
mvn install -Dmaven.test.skip=true
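Maven has two standard ways of skipping tests, and they differ: -Dmaven.test.skip=true skips both compiling and running the tests, while -DskipTests still compiles them and only skips running them. The second can matter when one module depends on another module's test classes. The little helper below is just my own way of keeping the three variants straight; the mode names are invented.

```shell
# mvn_install_args MODE: print the mvn arguments for a build mode.
# Modes (names are hypothetical): full, no-run, no-compile.
mvn_install_args() {
    case "${1:-full}" in
        full)       echo "install" ;;                          # compile and run tests
        no-run)     echo "install -DskipTests" ;;              # compile tests, do not run them
        no-compile) echo "install -Dmaven.test.skip=true" ;;   # neither compile nor run tests
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Usage (left unquoted on purpose so the arguments are split into words):
#   mvn $(mvn_install_args no-run)
```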
 
 
 

Once the tests are done and Mahout is built, you get a success message:

[Screenshot: build success message]

Congratulations, Mahout is installed!





16 comments:

  1. very nice tutorial ... thanks

  2. I got the following error and need help resolving it:

    [ERROR] BUILD ERROR
    [INFO] ------------------------------------------------------------------------
    [INFO] Internal error in the plugin manager executing goal 'org.apache.mahout:mahout-collection-codegen-plugin:1.0:generate': Unable to find the mojo 'generate' (or one of its required components) in the plugin 'org.apache.mahout:mahout-collection-codegen-plugin'
    Bad version number in .class file

  3. I've successfully installed it with the help of this tutorial, but I don't know how to work with it, and I couldn't find any good tutorials either. If anyone knows a good site, kindly post the link. Thanks in advance.

  4. You can continue with the synthetic_control data and 20 Newsgroups examples. Check out books (e.g. Mahout in Action) for reference.

  5. This is a nicely written tutorial.

  6. I have installed Mahout but can't run any dataset. Please help me. Do I need to set MAHOUT_HOME?

  7. How do I fix this problem?

    The goal you specified requires a project to execute but there is no POM in this directory (/usr/local/mahout).

    Replies
    1. I fixed the problem: just download the pom file from the original Mahout site:
      http://www-us.apache.org/dist/mahout/<>/

  8. Please tell me how to fix the following error:
    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

    I got this error while running LDA clustering. I am not able to get the output. Kindly help me

  9. How can we check whether it is working or not?

  10. I've installed Apache Mahout, but I still don't know how to start it. I found that we use mahout spark-shell to get in; please help me get started with Mahout.
