Journal of Chuxiong Normal University, 2020, Vol. 35, Issue (6): 124-131.

• Computer Science •

A Focused Crawler Method of Configurable Theme Based Heritrix

WANG Song, LIU Hongji, YE Xiaobo   

  1. School of Economics & Management, Chuxiong Normal University, Chuxiong, Yunnan Province 675000;
    Dept. of Management of State-owned Assets and Informationalization, Chuxiong Normal University, Chuxiong, Yunnan Province 675000
  • Received: 2020-05-06 Online: 2020-11-20 Published: 2021-03-29

Abstract: As the Internet has developed, a massive amount of information has been generated online and has become an important asset. At the same time, users’ requirements for information search have risen steadily, and finding key information quickly and effectively remains one of the most difficult problems to solve. General-purpose search engines satisfy basic search needs, but they cannot satisfy users who are interested only in specific themes or fields, because keywords alone rarely describe such needs or problems adequately. Drawing on data mining and machine learning, this study therefore proposes a configurable-theme crawling method for a focused crawler system built on Heritrix, an open-source web crawler framework. To a certain extent, this method can solve the above-mentioned problems and improve users’ search experience and search efficiency.

Key words: focused crawler, configurable theme, Heritrix
