Collection module



Common module operations


Description:

The article collection function uses a program to fetch the content of a target web page remotely, parses and processes it according to locally defined rules, and then stores the result in the server's database.

The article collection system departs from the traditional collection model and process. Collection rules are separated from the collection interface, which makes the rules simpler to set up. Staff with only basic technical knowledge can configure the rules, and editors do not need to understand the technical details. They simply select the articles they want from the list and complete the collection as easily as publishing an article.
1. Collection process
In short, there are three steps:
1. Add a collection point and fill in the collection rules.
2. Collect the URLs and the content.
3. Publish the content to the designated column.
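To make the idea of a collection point more concrete, the rules it holds can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names, and the list markers are hypothetical and are not taken from the V9 system itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldRule:
    """Start/end markers that bound one field in the article page source."""
    start: str
    end: str
    strip: List[str] = field(default_factory=list)  # substrings removed afterwards


@dataclass
class CollectionPoint:
    """Hypothetical container for the rules of one collection point."""
    name: str
    list_url: str
    url_start: str  # unique marker before the article links
    url_end: str    # unique marker after the article links
    fields: Dict[str, FieldRule] = field(default_factory=dict)
    download_images: bool = False
    watermark: bool = False


sina = CollectionPoint(
    name="Sina international news",
    list_url="http://roll.news.sina.com.cn/news/gjxw/gjmtjj/index.shtml",
    url_start="<!-- list begin -->",  # assumed marker, not from the real page
    url_end="<!-- list end -->",      # assumed marker, not from the real page
    fields={
        "title": FieldRule("<title>", "</title>"),
        "content": FieldRule("<!-- text content begin -->",
                             "<!-- text content end -->"),
    },
)
```

Keeping the rules in one object like this mirrors the separation described above: the rules are set once, and the collection steps only read them.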
The detailed process is described below, using the collection of Sina News (http://roll.news.sina.com.cn/news/gjxw/gjmtjj/index.shtml) as an example.
Example description:
Goal: Collect Sina news into the International News column of the V9 system.
Target URL: http://roll.news.sina.com.cn/news/gjxw/gjmtjj/index.shtml
1. Add collection point
1.1 URL rule configuration
70.jpg
Add a collection point - URL rule configuration, diagram 1
View the source code of the target page and find the start point and end point that bound the URLs to be collected (both points must be unique in the entire source code). This narrows the scope in which the URLs to collect are searched for.
71.jpg
Add a collection point - URL rule configuration, diagram 2
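The start-point and end-point rule can be sketched in a few lines of Python. This is a minimal illustration, not the system's own implementation: the function names and the `<!-- list begin -->` / `<!-- list end -->` markers in the sample are assumptions made for the example.

```python
import re
from urllib.parse import urljoin


def is_unique(source: str, marker: str) -> bool:
    """A marker is usable only if it occurs exactly once in the source."""
    return source.count(marker) == 1


def cut_between(source: str, start: str, end: str) -> str:
    """Return the text between the first `start` and the next `end`, or ''."""
    begin = source.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = source.find(end, begin)
    return source[begin:stop] if stop != -1 else ""


def collect_urls(source: str, base_url: str, start: str, end: str) -> list:
    """Collect absolute, de-duplicated link targets inside the marked region."""
    region = cut_between(source, start, end)
    seen, urls = set(), []
    for href in re.findall(r'href=["\']([^"\']+)["\']', region):
        url = urljoin(base_url, href)  # resolve relative links
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up list page; only links between the two markers are collected.
sample = """
<a href="/about.html">About</a>
<!-- list begin -->
<li><a href="http://news.sina.com.cn/w/2010-12-01/135121565455.shtml">Story</a></li>
<li><a href="/w/other.shtml">Other</a></li>
<!-- list end -->
"""
urls = collect_urls(sample, "http://news.sina.com.cn/",
                    "<!-- list begin -->", "<!-- list end -->")
```

The link outside the markers (`/about.html`) is ignored, which is the point of narrowing the search scope.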
Test whether your URL collection rules are correct, as shown in the figure below.
72.jpg
1.2 Content rule configuration
The content rules here look complicated, but they are actually very simple. For ease of explanation, we collect only two fields: title and content. Content URL used in this example:
http://news.sina.com.cn/w/2010-12-01/135121565455.shtml
To work out the content collection rules, open this URL, right-click a blank area of the page, choose View Source, and find the start and end boundaries of the title and the content.
Title collection configuration:
Take the title from the page's <title></title> element and remove any unnecessary characters, as shown below.
73.jpg
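As a rough illustration of the title rule, the sketch below reads the text inside <title></title> and strips unwanted substrings. The function name and the site suffix in the sample are assumptions for the example, not values from the real page.

```python
import re
from typing import Sequence


def extract_title(source: str, remove: Sequence[str] = ()) -> str:
    """Return the <title> text with each substring in `remove` deleted."""
    match = re.search(r"<title[^>]*>(.*?)</title>", source,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""
    title = match.group(1)
    for junk in remove:
        title = title.replace(junk, "")
    return title.strip()


# Made-up page; the "_News Center_Sina" suffix is a hypothetical site suffix.
page = "<html><head><title>Example headline_News Center_Sina</title></head></html>"
title = extract_title(page, remove=["_News Center_Sina"])
```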
Content collection configuration:
On a Sina News article page, the news content sits between <!-- text content begin --> and <!-- text content end -->, and these two nodes are unique in the entire page source code. You can therefore use them as the rule for getting the content, and then filter the content, as shown below.
74.jpg
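The content rule can be sketched the same way: cut the text between the two comment nodes, then apply filters. The particular filters below (removing scripts, leftover comments, and link tags) are assumptions chosen for illustration; which filters to apply depends on the rules you configure.

```python
import re

BEGIN = "<!-- text content begin -->"
END = "<!-- text content end -->"


def extract_content(source: str, begin: str = BEGIN, end: str = END) -> str:
    """Return the filtered HTML between the two markers, or '' if not found."""
    start = source.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = source.find(end, start)
    if stop == -1:
        return ""
    body = source[start:stop]
    # Example filters: drop scripts and comments, unwrap links but keep their text.
    body = re.sub(r"<script\b.*?</script>", "", body,
                  flags=re.IGNORECASE | re.DOTALL)
    body = re.sub(r"<!--.*?-->", "", body, flags=re.DOTALL)
    body = re.sub(r"</?a\b[^>]*>", "", body, flags=re.IGNORECASE)
    return body.strip()


# Made-up article page used only to exercise the function.
page = (
    "<div>nav</div>"
    "<!-- text content begin -->"
    "<p>First paragraph with a <a href='/x'>link</a>.</p>"
    "<script>track();</script>"
    "<!-- text content end -->"
    "<div>footer</div>"
)
content = extract_content(page)
```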
1.3 Custom rules
1.4 Advanced configuration
Here you can set whether to download images to the server, whether to add a watermark to them, and other options.
75.jpg
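Downloading images to the server implies rewriting the image addresses in the collected content so that they point at local copies. The sketch below only plans that rewrite; it does not fetch anything and does not cover watermarking. The function name, the upload directory, and the hash-based file naming are all assumptions for the example.

```python
import hashlib
import posixpath
import re
from urllib.parse import urljoin, urlparse

IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=["\'])([^"\']+)(["\'])', re.IGNORECASE)


def localize_images(content: str, page_url: str,
                    upload_dir: str = "/uploadfile/collect"):
    """Rewrite <img> sources to local paths.

    Returns the rewritten content and a list of (remote_url, local_path)
    pairs that a separate download step could process.
    """
    downloads = []

    def swap(match):
        remote = urljoin(page_url, match.group(2))  # make the URL absolute
        ext = posixpath.splitext(urlparse(remote).path)[1] or ".jpg"
        digest = hashlib.sha1(remote.encode("utf-8")).hexdigest()
        local = f"{upload_dir}/{digest}{ext}"
        downloads.append((remote, local))
        return match.group(1) + local + match.group(3)

    return IMG_SRC.sub(swap, content), downloads


html = '<p><img src="/images/a.png" alt="x"></p>'
new_html, downloads = localize_images(html, "http://news.sina.com.cn/w/1.shtml")
```

Each pair in `downloads` gives the remote address to fetch and the local path to save it under.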
2. Collect URLs and collect content
After the collection rules are configured, collect the URLs first, and then collect the content.
76.jpg
3. Publish content to the designated column
77.jpg
78.jpg
Select the column to import into
79.jpg
Set the mapping between the collected content and the database fields, then submit the data for storage. Please wait patiently while the data is imported; the page redirects automatically when it finishes. At this point, a simple collection process is complete.
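The field mapping step can be illustrated with a small sketch that maps collected field names to database columns and inserts one row. It uses an in-memory SQLite database purely for demonstration; the table name, column names, and mapping are hypothetical and do not reflect the actual V9 schema.

```python
import sqlite3

# Hypothetical mapping: collected field name -> database column name.
FIELD_MAP = {"title": "title", "content": "body", "url": "source_url"}


def store_article(conn, table, collected, field_map=FIELD_MAP):
    """Insert one collected article, keeping only the mapped fields.

    `table` and the column names come from trusted configuration, so they
    are placed in the SQL text; the values are always bound as parameters.
    """
    names = [name for name in collected if name in field_map]
    columns = ", ".join(field_map[name] for name in names)
    placeholders = ", ".join("?" for _ in names)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    cursor = conn.execute(sql, [collected[name] for name in names])
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, body TEXT, source_url TEXT)"
)
row_id = store_article(conn, "news", {
    "title": "Example headline",
    "content": "<p>Text</p>",
    "url": "http://news.sina.com.cn/w/2010-12-01/135121565455.shtml",
    "author": "not mapped, so ignored",
})
```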
There are many other features waiting for you to discover.


Operation name                  Description
Collection process details      None
Other function description      None