
Running Nutch 2.3 in Eclipse

WBOY
Published: 2016-06-07 15:07:09

Reference: http://wiki.apache.org/nutch/RunNutchInEclipse


I. Environment setup

1. Download the Nutch 2.3 source code

wget http://mirror.bit.edu.cn/apache/nutch/2.3/apache-nutch-2.3-src.tar.gz

Or check out the latest development version:

svn co https://svn.apache.org/repos/asf/nutch/branches/2.x


2. Choose the storage backend; this walkthrough uses HBase.
Add the following property to conf/nutch-site.xml:

<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>


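3. Enable the HBase dependency in ivy/ivy.xml. The entry already exists but is commented out, so uncommenting it is enough:

<dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />

Note: rev=0.5 corresponds to HBase 0.94, and rev=0.3 corresponds to HBase 0.90.4.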


4. Add the following three properties to conf/nutch-site.xml:

<property>
  <name>http.agent.name</name>
  <value>My Nutch Spider</value>
</property>
<property>
  <name>http.robots.agents</name>
  <value>none</value>
</property>
<property>
  <name>plugin.folders</name>
  <value>/Users/liaoliuqing/0_Search/1_Nutch/1_Official/apache-nutch-2.3/build/plugins</value>
</property>

Here plugin.folders is set to $NUTCH_HOME/build/plugins; replace the path above with the location of your own checkout.
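Before going further, it can be useful to confirm that the file really contains the four properties from steps 2 and 4. The following is a minimal sketch that uses only the JDK's XML parser. The class name SiteConfigCheck, its method names, and the inline sample XML are illustrative assumptions and not part of Nutch, which loads this file through Hadoop's own configuration machinery.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteConfigCheck {
    // Property names taken from steps 2 and 4 of this tutorial.
    static final String[] REQUIRED = {
        "storage.data.store.class", "http.agent.name",
        "http.robots.agents", "plugin.folders"
    };

    // Parse <property><name/><value/></property> entries into a name -> value map.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = text(p, "name");
                if (name != null) {
                    props.put(name, text(p, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed configuration file", e);
        }
    }

    // Trimmed text of the first child element with the given tag, or null if absent.
    static String text(Element parent, String tag) {
        NodeList l = parent.getElementsByTagName(tag);
        return l.getLength() == 0 ? null : l.item(0).getTextContent().trim();
    }

    // Required property names that are absent or empty.
    static List<String> missing(Map<String, String> props) {
        List<String> out = new ArrayList<>();
        for (String key : REQUIRED) {
            String v = props.get(key);
            if (v == null || v.isEmpty()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample with two of the four properties deliberately left out.
        String sample = "<configuration>"
            + "<property><name>storage.data.store.class</name>"
            + "<value>org.apache.gora.hbase.store.HBaseStore</value></property>"
            + "<property><name>http.agent.name</name>"
            + "<value>My Nutch Spider</value></property>"
            + "</configuration>";
        System.out.println("missing: " + missing(parse(sample)));
        // prints: missing: [http.robots.agents, plugin.folders]
    }
}
```

To check a real file, read conf/nutch-site.xml into a string and pass it to parse instead of the sample.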


5. Run ant eclipse in the Nutch source root to generate the Eclipse project files.


II. Import the project

1. Import the project into Eclipse (File > Import > Existing Projects into Workspace).


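III. Run the program

1. Open Run As > Run Configurations, then select the project and the main class (the output in step 3 comes from InjectorJob).

2. Fill in the arguments

Program arguments:
/Users/liaoliuqing/Downloads/seed.txt

VM arguments:
-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log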
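The seed path is an ordinary program argument, while the two -D options are JVM system properties. The sketch below shows how the two kinds of input reach a Java program and where the log file would end up, assuming the logging configuration joins the two properties as ${hadoop.log.dir}/${hadoop.log.file}. The class LogPathSketch and its method are hypothetical and only illustrate the mechanism; they are not Nutch code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LogPathSketch {
    // Join working directory, log directory, and log file name.
    // An absolute logDir replaces the working directory, as Path.resolve specifies.
    static Path resolveLogFile(String workingDir, String logDir, String logFile) {
        return Paths.get(workingDir).resolve(logDir).resolve(logFile).normalize();
    }

    public static void main(String[] args) {
        // -Dhadoop.log.dir=... and -Dhadoop.log.file=... arrive as system properties.
        // The fallback values here are assumptions made for this sketch only.
        String dir = System.getProperty("hadoop.log.dir", "logs");
        String file = System.getProperty("hadoop.log.file", "hadoop.log");

        // The seed path arrives as the first program argument.
        String seed = args.length > 0 ? args[0] : "(no seed path given)";

        System.out.println("seed argument: " + seed);
        System.out.println("log file: "
            + resolveLogFile(System.getProperty("user.dir"), dir, file));
    }
}
```

Because logs is a relative path, the log is written under the launch's working directory, which in Eclipse defaults to the project directory.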

3. Click Run. The output looks like this:

InjectorJob: starting at 2015-01-28 16:27:43
InjectorJob: Injecting urlDir: /Users/liaoliuqing/Downloads/seed.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2015-01-28 16:27:47, elapsed: 00:00:04


Note: HBase must already be running on the local machine before you launch the program.


4. Inspect the data in HBase

hbase(main):003:0> scan 'webpage'
ROW                  COLUMN+CELL
 com.163.www:http/   column=f:fi, timestamp=1422433667377, value=\x00'\x8D\x00
 com.163.www:http/   column=f:ts, timestamp=1422433667377, value=\x00\x00\x01K/\xA7:\x14
 com.163.www:http/   column=mk:_injmrk_, timestamp=1422433667377, value=y
 com.163.www:http/   column=mk:dist, timestamp=1422433667377, value=0
 com.163.www:http/   column=mtdt:_csh_, timestamp=1422433667377, value=?\x80\x00\x00
 com.163.www:http/   column=s:s, timestamp=1422433667377, value=?\x80\x00\x00
1 row(s) in 0.2970 seconds
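The scan output is easier to read once the row key and the binary values are decoded. The row key is the injected URL with its host name reversed, and the escaped values are big-endian numbers. The sketch below rebuilds the key and decodes three of the cells. The class ScanOutputSketch is hypothetical and is not Nutch's own implementation. Reading f:fi as the fetch interval in seconds, f:ts as the fetch time in milliseconds, and s:s as the score is an assumption based on Nutch's default Gora HBase mapping, and where the port goes in the key is likewise an assumption.

```java
import java.net.URI;
import java.nio.ByteBuffer;
import java.time.Instant;

public class ScanOutputSketch {
    // Build a reversed-host key such as "com.163.www:http/" from a URL.
    static String rowKey(String url) {
        URI u = URI.create(url);
        String[] parts = u.getHost().split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        sb.append(':').append(u.getScheme());
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort()); // assumption: port follows the scheme
        }
        String path = u.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();
    }

    // ByteBuffer reads big-endian by default, which matches the escaped bytes shown.
    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
    static long toLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }
    static float toFloat(byte[] b) { return ByteBuffer.wrap(b).getFloat(); }

    public static void main(String[] args) {
        byte[] fi = {0x00, 0x27, (byte) 0x8D, 0x00};                          // \x00'\x8D\x00
        byte[] ts = {0x00, 0x00, 0x01, 0x4B, 0x2F, (byte) 0xA7, 0x3A, 0x14};  // \x00\x00\x01K/\xA7:\x14
        byte[] s = {0x3F, (byte) 0x80, 0x00, 0x00};                           // ?\x80\x00\x00

        System.out.println(rowKey("http://www.163.com/"));    // com.163.www:http/
        System.out.println(toInt(fi));                         // 2592000 (30 days in seconds)
        System.out.println(Instant.ofEpochMilli(toLong(ts)));  // 2015-01-28T08:27:43.508Z
        System.out.println(toFloat(s));                        // 1.0
    }
}
```

The decoded time, 08:27:43 UTC, is 16:27:43 at UTC+8, which matches the InjectorJob start time in the log output from step 3.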
