Hi, nice to see you guys here. The idea of writing this tutorial came to me when I had quite a tough time installing Mahout. It's not difficult, but you do get stuck on small, itty-bitty mistakes made during the installation process, or on not knowing the exact dependencies required. These lead to errors, and then you end up in something like a treasure hunt. So let's start.
STEP 1: INSTALLATION OF SUN JAVA JDK
This might look like a simple, even petty, step, but if it is not done correctly it leads to some major issues.
1. Install from the Synaptic Package Manager:
Most of us would do it this way, which is fine:
System > Administration > Synaptic Package Manager, then install the Sun Java JDK.
But check the Sun Java update number. In my experience, older JDK updates have a few bugs that cause trouble when installing Mahout; that's what happened to me. So I manually installed jdk1.6.0_26, the latest version at the time, from the following website:
2. Installation of Java manually
Java update 26 Download
1. Accept the license.
2. Download the self-extracting installer file, depending on whether your system is 32-bit (x86) or 64-bit.
3. After the download you will have a .bin file (in my case jdk-6u26-linux-i586.bin).
This is how I installed it:
ganesh@ganesh-VPCEG2AEN:~$ mkdir programs
ganesh@ganesh-VPCEG2AEN:~$ cd programs
ganesh@ganesh-VPCEG2AEN:~/programs$ bash ../Downloads/jdk-6u26-linux-i586.bin
The above command unpacks the JDK. At the end it asks you to press Enter to continue, so press Enter at that point.
ganesh@ganesh-VPCEG2AEN:~/programs$ ln -s jdk1.6.0_26 jdk
ganesh@ganesh-VPCEG2AEN:~/programs$ ls -l
total 4
lrwxrwxrwx 1 ganesh ganesh 11 2012-04-12 13:55 jdk -> jdk1.6.0_26
drwxr-xr-x 10 ganesh ganesh 4096 2012-04-12 13:52 jdk1.6.0_26
ganesh@ganesh-VPCEG2AEN:~/programs$ cd ..
ganesh@ganesh-VPCEG2AEN:~$ gedit .bashrc
When the .bashrc file opens, set JAVA_HOME by adding the following lines (no sudo is needed, since .bashrc is your own file):
#Set Java Home
export JAVA_HOME=$HOME/programs/jdk
export PATH=.:$JAVA_HOME/bin:$PATH
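The symlink and .bashrc steps above can be made repeatable with a small shell function. This is a minimal sketch, not the procedure used above: setup_jdk_env is a hypothetical name, it assumes the installer has already been unpacked into the programs directory, and it deliberately leaves `.` out of PATH, since putting the current directory on PATH is generally discouraged.

```shell
#!/usr/bin/env bash
# setup_jdk_env PROGRAMS_DIR JDK_DIRNAME RC_FILE  (hypothetical helper)
# Assumes PROGRAMS_DIR/JDK_DIRNAME already exists (the unpacked .bin installer).
setup_jdk_env() {
  local dir="$1" jdk="$2" rc="$3"

  # Stable name "jdk" -> versioned directory, like `ln -s jdk1.6.0_26 jdk`.
  # -f replaces an existing link and -n treats an existing link as a file,
  # so re-running repoints the link instead of nesting a new one inside it.
  ln -sfn "$jdk" "$dir/jdk"

  # Append the JAVA_HOME lines only if no JAVA_HOME export is present yet,
  # so running this twice does not duplicate them. If another JAVA_HOME
  # export already exists, the file is left alone: check it by hand.
  touch "$rc"
  if ! grep -q '^export JAVA_HOME=' "$rc"; then
    {
      echo '#Set Java Home'
      echo "export JAVA_HOME=$dir/jdk"
      # Single quotes keep $JAVA_HOME and $PATH literal in the file.
      echo 'export PATH=$JAVA_HOME/bin:$PATH'
    } >> "$rc"
  fi
}
```

Typical use: setup_jdk_env "$HOME/programs" jdk1.6.0_26 "$HOME/.bashrc", then source ~/.bashrc. Because the link is replaced in place, moving to a newer JDK later only means re-running it with the new directory name; the .bashrc lines stay the same.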
Once this is done, you have finished downloading, installing and configuring Java. The new variables apply to new terminals; to use them in the current one, run source ~/.bashrc. To check whether Java is installed, run the following command:
ganesh@ganesh-VPCEG2AEN:~$ java -version
You should get the following output:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Server VM (build 20.1-b02, mixed mode)
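To check the result from a script rather than by eye, the version string can be pulled out of the java -version output. A sketch: java_version and java_update_ok are hypothetical helper names, and the minimum update number is whatever you decide on (26 in the usage line, matching the JDK used here). Note that java -version writes to standard error, hence the 2>&1 when piping it.

```shell
#!/usr/bin/env bash
# java_version: read `java -version` output on stdin and print the quoted
# version string from the first line that has one (e.g. 1.6.0_26).
java_version() {
  sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# java_update_ok MIN: succeed if the version on stdin has the form
# <something>_<update> with a numeric update >= MIN.
java_update_ok() {
  local min="$1" v update
  v=$(java_version)
  case "$v" in
    *_*) update="${v##*_}" ;;
    *) return 1 ;;   # no "_NN" update suffix to compare
  esac
  # Refuse to compare suffixes that are not purely numeric.
  case "$update" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$update" -ge "$min" ]
}
```

Usage: java -version 2>&1 | java_version prints the version string, and java -version 2>&1 | java_update_ok 26 && echo "JDK is recent enough" gates on the update number.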
Once Java is done, we can move on.
STEP 2: INSTALLATION OF MAVEN
You can install Maven from the Synaptic Package Manager or by running the command
sudo apt-get install maven2
but here too I installed Apache Maven 3.0.4 manually, using the procedure below:
- Extract the distribution archive, apache-maven-3.0.4-bin.tar.gz, into /usr/local, for example with sudo tar -xzf apache-maven-3.0.4-bin.tar.gz -C /usr/local (run from the directory holding the archive). This creates the directory /usr/local/apache-maven-3.0.4, which is the path used below.
- Open the .bashrc file again and add the following lines:
export M2_HOME=/usr/local/apache-maven-3.0.4
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
export JAVA_HOME=$HOME/programs/jdk
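If mvn is not found after reloading .bashrc, a common cause is that M2_HOME does not match the directory the archive actually created. A small check, as a sketch (check_maven_home is a hypothetical helper; it only assumes the standard layout of a Maven binary distribution, with the launcher in bin/mvn):

```shell
#!/usr/bin/env bash
# check_maven_home DIR  (hypothetical helper)
# Succeeds if DIR looks like an unpacked Maven distribution, i.e. DIR/bin/mvn
# exists and is executable; otherwise prints the reason and fails.
check_maven_home() {
  local home="$1"
  if [ ! -d "$home" ]; then
    echo "M2_HOME directory not found: $home" >&2
    return 1
  fi
  if [ ! -x "$home/bin/mvn" ]; then
    echo "no executable bin/mvn under $home (wrong extraction directory?)" >&2
    return 1
  fi
}
```

Usage: after source ~/.bashrc, run check_maven_home "$M2_HOME" && echo "M2_HOME looks right".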
- Make sure that JAVA_HOME points to your JDK (here $HOME/programs/jdk) and that $JAVA_HOME/bin is in your PATH environment variable. We already did this while installing Java.
- Reload the file (source ~/.bashrc, or open a new terminal), then run mvn --version to verify that Maven is correctly installed.
You should get output like this:
Apache Maven 3.0.4 (r1232337; 2012-01-17 14:14:56+0530)
Maven home: /usr/local/apache-maven-3.0.4
Java version: 1.6.0_26, vendor: Sun Microsystems Inc.
Java home: /home/ganesh/programs/jdk1.6.0_26/jre
Default locale: en_IN, platform encoding: UTF-8
OS name: "linux", version: "2.6.35-22-generic", arch: "i386", family: "unix"
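The mvn --version output also shows which JDK Maven picked up; it should be the one from Step 1, not some other Java on the system. A sketch for reading individual fields (mvn_field is a hypothetical helper; it only assumes the "Name: value" line layout shown above):

```shell
#!/usr/bin/env bash
# mvn_field NAME  (hypothetical helper)
# Reads `mvn --version` output on stdin and prints the value of the first
# line that starts with "NAME: ".
mvn_field() {
  awk -v f="$1: " 'index($0, f) == 1 { print substr($0, length(f) + 1); exit }'
}
```

Usage: mvn --version | mvn_field 'Java home' should print a path under your own JDK directory (for the output above, /home/ganesh/programs/jdk1.6.0_26/jre), and mvn --version | mvn_field 'Maven home' should match $M2_HOME.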
**** NOTE
If you connect to the internet through a proxy, you need to add the proxy settings to Maven's settings file; until this is done, Maven cannot connect to download anything. Make sure this is done.
The settings live in settings.xml in the conf directory of the Apache Maven installation; in my case:
ganesh@ganesh-VPCEG2AEN:~$ cd /usr/local/apache-maven-3.0.4
ganesh@ganesh-VPCEG2AEN:/usr/local/apache-maven-3.0.4$ cd conf
ganesh@ganesh-VPCEG2AEN:/usr/local/apache-maven-3.0.4/conf$ ls
settings.xml
ganesh@ganesh-VPCEG2AEN:/usr/local/apache-maven-3.0.4/conf$ sudo gedit settings.xml
Once settings.xml opens, find the <proxies> section. The example <proxy> entry inside it is wrapped in an XML comment (<!-- ... -->), so Maven ignores it. Remove the comment markers and fill in your own values, so that the section looks like this:
<proxies>
  <proxy>
    <id>optional</id>
    <active>true</active>
    <protocol>http</protocol>
    <username>put in your username</username>
    <password>put in your password</password>
    <host>put in the host address</host>
    <port>put in the port number</port>
    <nonProxyHosts>localhost</nonProxyHosts>
  </proxy>
</proxies>
Leave out the username and password elements if your proxy does not require authentication.
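Besides the global conf/settings.xml, Maven also reads a per-user settings file, ~/.m2/settings.xml, which can hold the same proxies section without needing sudo. A minimal sketch that writes such a file, assuming a plain HTTP proxy; write_proxy_settings and the my-proxy id are hypothetical, and proxy.example.com:8080 in the usage line is a placeholder. The password ends up in plain text, and characters such as < or & in any value would need XML escaping.

```shell
#!/usr/bin/env bash
# write_proxy_settings FILE HOST PORT [USER PASS]  (hypothetical helper)
# Writes a minimal Maven settings file with one active HTTP proxy.
# Refuses to overwrite an existing file, which must be merged by hand.
write_proxy_settings() {
  local file="$1" host="$2" port="$3" user="${4:-}" pass="${5:-}"
  local auth=""

  if [ -e "$file" ]; then
    echo "$file already exists; add the <proxies> section to it by hand" >&2
    return 1
  fi
  # Credentials are optional: these lines are only emitted when a user is
  # given. Values are written as-is, without XML escaping.
  if [ -n "$user" ]; then
    auth="      <username>${user}</username>
      <password>${pass}</password>
"
  fi
  mkdir -p "$(dirname "$file")"
  cat > "$file" <<EOF
<settings>
  <proxies>
    <proxy>
      <id>my-proxy</id>
      <active>true</active>
      <protocol>http</protocol>
${auth}      <host>${host}</host>
      <port>${port}</port>
      <nonProxyHosts>localhost</nonProxyHosts>
    </proxy>
  </proxies>
</settings>
EOF
}
```

Usage: write_proxy_settings ~/.m2/settings.xml proxy.example.com 8080, adding a username and password as extra arguments if your proxy needs them.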
STEP 3: INSTALLATION OF HADOOP
The following website provides a beautiful tutorial on setting up a single-node Hadoop cluster:
Single node hadoop cluster set up
*** Make sure that the SSH server is installed, or else you will get an error like this:
ganesh@ganesh-VPCEG2AEN:~$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
To avoid the above error, install the OpenSSH server:
ganesh@ganesh-VPCEG2AEN:~$ sudo apt-get install openssh-server
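A single-node Hadoop setup also expects you to be able to ssh to localhost without a password. A sketch of the key-authorization part, assuming an RSA key pair; authorize_key is a hypothetical helper, while ssh-keygen and the ~/.ssh/authorized_keys file are the standard OpenSSH pieces.

```shell
#!/usr/bin/env bash
# authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE  (hypothetical helper)
# Adds the public key to authorized_keys unless it is already listed, and
# applies the restrictive permissions commonly recommended for these files.
authorize_key() {
  local pub="$1" auth="$2"
  local sshdir
  sshdir="$(dirname "$auth")"

  mkdir -p "$sshdir"
  chmod 700 "$sshdir"
  touch "$auth"
  # -x: whole-line match, -F: fixed string, -f: use the key line(s) as patterns.
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

Typical use: if you don't have a key yet, create one with ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa, then run authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys; after that, ssh localhost should log in without asking for a password.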
STEP 4: ...AND FINALLY, INSTALLATION OF MAHOUT
1. Go to the following website to download the Mahout source code:
http://www.apache.org/dyn/closer.cgi/lucene/mahout/
Select one of the Mahout versions (0.4, 0.5 or 0.6); I selected 0.6.
Out of the many archives listed there, make sure you download the source (src) archive. This again is a place where many of us make a mistake.
2. Check that the extracted folder contains a pom.xml file. It should be there, but check once anyway: without it, mvn install stops with "there is no POM in this directory".
3. In my case, I downloaded the file, unzipped it, renamed the folder to mahout and moved it to /usr/local.
The commands are as follows:
ganesh@ganesh-VPCEG2AEN:~$ cd /usr/local/mahout
ganesh@ganesh-VPCEG2AEN:/usr/local/mahout$ mvn install
You should see Maven download its dependencies and start building the modules.
Maven then runs the tests. For a first-time installation, it is recommended to let the complete test suite run and the build finish. Later, when you rerun mvn install, you can skip the tests with:
mvn install -Dmaven.test.skip=true
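Both build variants can be wrapped in one function that first checks for pom.xml, which catches a wrong archive or directory before Maven starts. A sketch: build_mahout and the MVN override variable are hypothetical (MVN=echo turns it into a dry run that just prints the arguments). For reference, Maven also accepts -DskipTests, which still compiles the test classes but does not run them, whereas -Dmaven.test.skip=true skips compiling them as well.

```shell
#!/usr/bin/env bash
# build_mahout SRC_DIR [skip]  (hypothetical helper)
# Runs "mvn install" in SRC_DIR after checking that pom.xml is present.
# With a second argument "skip", adds -Dmaven.test.skip=true.
# MVN (hypothetical override, default "mvn") allows a dry run with MVN=echo.
build_mahout() {
  local src="$1" mode="${2:-}"
  local args=(install)

  if [ ! -f "$src/pom.xml" ]; then
    echo "no pom.xml in $src: is this the source (src) archive?" >&2
    return 1
  fi
  if [ "$mode" = "skip" ]; then
    args+=(-Dmaven.test.skip=true)
  fi
  # Subshell, so the caller's working directory is left unchanged.
  ( cd "$src" && "${MVN:-mvn}" "${args[@]}" )
}
```

Usage: build_mahout /usr/local/mahout for the first, full build with tests, and build_mahout /usr/local/mahout skip for later rebuilds.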
Once the tests are done and Mahout is built, Maven prints a BUILD SUCCESS message, as shown in the figure below.
Congratulations, Mahout is installed!
Thanks
Nice tutorial
Thank you ratn :)
Very nice tutorial... thanks
Thank you, gaurav :)
Got the following error:
Need help to resolve:
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Internal error in the plugin manager executing goal 'org.apache.mahout:mahout-collection-codegen-plugin:1.0:generate': Unable to find the mojo 'generate' (or one of its required components) in the plugin 'org.apache.mahout:mahout-collection-codegen-plugin'
Bad version number in .class file
I've successfully installed Mahout with the help of this tutorial, but I don't know how to work with it, and I couldn't find any good tutorials either. If anyone knows a good site, kindly post the link. Thanks in advance.
You may continue with the synthetic_control data and the 20 Newsgroups examples. Check out books (e.g. Mahout in Action) for reference.
This is a nicely written tutorial.
Thanks, you are the best ;)
I have installed Mahout but can't run any dataset. Please help me. Do I need to set MAHOUT_HOME?
How do I fix this problem?
The goal you specified requires a project to execute but there is no POM in this directory (/usr/local/mahout).
I fixed the problem: just download the pom file from the original Mahout site:
http://www-us.apache.org/dist/mahout/<>/
Please tell me how to fix the following error:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
I got this error while running LDA clustering and I am not able to get the output. Kindly help me.
How can we check whether it is working or not?
Good tutorial! Thank you!
I've installed Mahout, but I still don't know how to start it. I found that we use mahout spark-shell to get started; please help me get started with Mahout.