javascript - Problem crawling web pages with Node.js
阿神 2017-05-16 13:43:09

I want to use Node.js to scrape all the news on the website below. The usual approach is to first collect the URL of each page of the news list, then the URL of each news item, and then use request to fetch the content of each URL.

But on this site the URL does not change when I switch pages or click into a news item, so it all seems to be implemented in JavaScript behind the scenes.
I also cannot see any requests in the Network tab of Chrome's F12 developer tools. Can anyone advise me how to scrape it?

http://www.xxxxxxxxx.com/glob...


All replies (2)
阿神

1. The previous/next article links show the function bound to their click event: boardView(1);

2. Search the page for boardView to find the corresponding function:

function boardView(idx){
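  var listNum = 10; // specify the number of items per list page
  
  var resultLenplistNum = Math.floor(idx/listNum); // idx divided by the number of list items
  var resultLenRestlistNum = Math.floor(idx%listNum); // remainder of idx divided by the number of list items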
  if (resultLenRestlistNum == 0){
    pageNum = resultLenplistNum;
  } else {
    pageNum = resultLenplistNum + 1;
  }
  
  cmsView.style.display = 'block';
  cmsList.style.display = 'none';
  resultViewStr = '<p class="news_view"><p class="news_hd">';
  resultViewStr = resultViewStr + '<strong>'+list.artCatTitles[resultSearch[idx]] +'</strong>';
  resultViewStr = resultViewStr + '<p>'+list.artTitles[resultSearch[idx]]+'</p>';
  resultViewStr = resultViewStr + '<span>'+list.artTimes[resultSearch[idx]]+'</span></p>';
  resultViewStr = resultViewStr + '<p class="news_bd">'+list.artTexts[resultSearch[idx]];
  resultViewStr = resultViewStr + list.artFiles[resultSearch[idx]]+'</p>';
  resultViewStr = resultViewStr + '<p class="news_link"><ul>';
  resultViewStr = resultViewStr + '<li><strong><span></span>';

…………

3. The data comes from the variable list, so look for where list is defined.

4. At line 1739 of the page source:

var artId = "";
var catId = "se14_24";
var tplId = "";
list = new jsList();
list.cmsInit(catId, artId, tplId, new data()); // create the list object

5. This calls a constructor, jsList(); its code is here: http://www.samsungsem.com/js/...

6. Look back at the code in step 2: fields such as list.artTitles are set by the cmsInit method of jsList. In cmsInit:

function cmsInit(catId, artId, tplId, data) {

    this.artIds = data.artIds;
    this.artCatTitles = data.artCatTitles;
    this.artTitles = data.artTitles;
    this.artUrls = data.artUrls;
    this.artTimes = data.artTimes;
    this.artImgs = data.artImgs;
    this.artTexts = data.artTexts;
    this.artTexts2 = data.artTexts2;
    this.artKeywords = data.artKeywords;
    this.artFiles = data.artFiles;
    ...

All of this data comes from the fourth parameter, data.

7. The data passed in step 4 is new data(), so find where the data function is defined.
Looking further up the page, you find: <script src="/global/news/data.js.jsp"></script>

8. Open it and take a look: http://www.samsungsem.com/glo...
Rendered directly in the browser, it looks strange.

Right-click and view the page source:
view-source: http://www.samsungsem.com/glo...
The data function is defined here, and the news data itself is in this file as well.
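
Putting the steps together, a rough Node.js sketch could download that script, evaluate it, and read the arrays that cmsInit copies. This is only a sketch under assumptions I have not verified: that data.js.jsp defines a plain function data() that assigns the arrays to this, and that the response is UTF-8. The names fetchText, loadData and toArticles are made up for illustration. Note that Node's vm module is not a security sandbox, so only run scripts you trust this way.

```javascript
const http = require('http');
const vm = require('vm');

// Field names copied from cmsInit in step 6 (a subset).
const FIELDS = ['artIds', 'artCatTitles', 'artTitles', 'artUrls', 'artTimes', 'artTexts'];

// Download a URL and resolve with the body as text (assumes UTF-8).
function fetchText(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body
        reject(new Error('Unexpected status ' + res.statusCode));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Evaluate the script in its own context, then build the same object
// the page builds with `new data()`.
// Assumption: the script declares `function data() { this.artTitles = [...]; ... }`.
function loadData(scriptSource) {
  const context = vm.createContext({});
  vm.runInContext(scriptSource, context, { timeout: 1000 });
  return vm.runInContext('new data()', context, { timeout: 1000 });
}

// Turn the parallel arrays (artTitles[i], artTimes[i], ...) into one
// record per article. Fields missing from the data become undefined.
function toArticles(d) {
  const count = d.artTitles ? d.artTitles.length : 0;
  const articles = [];
  for (let i = 0; i < count; i++) {
    const article = {};
    for (const field of FIELDS) {
      article[field] = d[field] ? d[field][i] : undefined;
    }
    articles.push(article);
  }
  return articles;
}

// Usage (dataScriptUrl is the address from step 8, not filled in here):
// fetchText(dataScriptUrl)
//   .then((src) => console.log(toArticles(loadData(src))))
//   .catch(console.error);
```

Since all the news is already inside that one file, there is no need to crawl page by page; the paging in boardView only happens in the browser.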

过去多啦不再A梦

Thanks for the answer, I’ll go take a look first...

I basically understand it now. There are still a few parts I don't fully understand, so I'll take my time going through them. Thank you very much.
