Journal of Chuxiong Normal University ›› 2020, Vol. 35 ›› Issue (3): 115-119.

• Computer Science •

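Experimental Research on Text Classification Based on the Weka Platform

LI Mei*   

  1. School of Information Engineering, Huainan Union University, Huainan, Anhui 232001
  • Received: 2019-12-17 Online: 2020-05-20 Published: 2020-12-28
  • Corresponding author: *LI Mei (b. 1981), female, master's degree, lecturer at the School of Information Engineering, Huainan Union University; main research interest: software technology. E-mail: 365478641@qq.com
  • Funding:
    Provincial Natural Science Research Project of Anhui Higher Education Institutions (No. KJ2019A0456); Provincial Natural Science Research Project of Anhui Higher Education Institutions (No. KJ2019A0664); Provincial Natural Science Research Project of Anhui Higher Education Institutions (No. KJ2017A585)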

Experimental Research on Text Classification Based on Weka Platform

LI Mei   

  1. School of Information Engineering, Huainan Union University, Huainan, Anhui Province 232001
  • Received: 2019-12-17 Online: 2020-05-20 Published: 2020-12-28

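Abstract: The J48, Naive Bayes Multinomial, and SMO algorithms are commonly used for text classification. Using the Weka platform, classification experiments were run on a Reuters data set. The results of six experiments were analyzed using precision, recall, and the combined F-Measure index, together with other text classification evaluation metrics, and the SMO algorithm was found to outperform the other two. For the Naive Bayes Multinomial algorithm, the numToSelect value was adjusted to optimize its results. The experiment is offered as a reference for text classification research.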

Key words: text classification, J48 algorithm, Naive Bayes Multinomial algorithm, SMO algorithm, Weka

Abstract: After introducing the J48, Naive Bayes Multinomial, and SMO algorithms commonly used for text classification, we use the Weka platform to run classification experiments on Reuters data sets. Based on precision, recall, and the combined F-Measure index, together with other text classification evaluation metrics, we analyze the results of six experiments and conclude that the SMO algorithm outperforms the other two. For the Naive Bayes Multinomial algorithm, the numToSelect value is adjusted to optimize its results. This experiment provides a reference for research on text classification.

Key words: text classification, J48 algorithm, Naive Bayes Multinomial algorithm, SMO algorithm, Weka
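
The abstract compares classifiers by precision, recall, and F-Measure. As a minimal sketch of how these three metrics relate, the Python snippet below computes them from per-class counts of true positives, false positives, and false negatives. It assumes the balanced F-Measure (the harmonic mean of precision and recall). The function name `evaluate`, the `Scores` type, and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    """Per-class evaluation metrics (hypothetical helper type)."""
    precision: float
    recall: float
    f_measure: float


def evaluate(tp: int, fp: int, fn: int) -> Scores:
    """Compute precision, recall, and balanced F-Measure for one class.

    tp: documents correctly assigned to the class
    fp: documents wrongly assigned to the class
    fn: documents of the class that were missed
    Any metric whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall (assumed F-Measure definition).
    f_measure = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f_measure)


if __name__ == "__main__":
    # Illustrative counts only; not results from the paper.
    scores = evaluate(tp=80, fp=20, fn=10)
    print(f"precision={scores.precision:.3f}")
    print(f"recall={scores.recall:.3f}")
    print(f"F-Measure={scores.f_measure:.3f}")
```

Because the F-Measure is a harmonic mean, it stays low unless both precision and recall are high, which is why it serves as a single combined index when ranking classifiers.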

CLC number: