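This book is a machine learning textbook for students in computer-related majors at colleges and universities. It follows the development workflow of a machine learning application. Along that path it explains data preprocessing and the concepts and principles behind several algorithmic models in detail. Python and Spark serve as the hands-on tools, so readers learn to write, debug, and analyze project code through practice. The last two chapters are complete hands-on projects that show how machine learning is applied in engineering work. The book covers a broad range of material with a clear structure, readable language, and many worked cases. It also comes with extensive teaching resources, including source code, lesson plans, lecture slides, and answers to the exercises. Readers can download these from the Huaxin Education Resources website (华信教育资源网).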
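Sun Liwei (孙立炜) is director of the Big Data Technology Teaching and Research Office at Xiamen Nanyang Vocational College (厦门南洋职业学院). Sun completed a master's program in Signal and Information Processing at the PLA Electronic Engineering Institute (解放军电子工程学院) and is a senior big data analyst. Sun's main research areas are data mining and Hadoop big data technology. Sun has published 20 papers in CN-registered journals and served as chief editor of one textbook. Sun has also led applications that obtained 4 software copyrights, led 3 research projects at the municipal level or above, and led 1 quality-course (精品课程) project.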
Chapter 1 Introduction to Machine Learning Technology
1.1 Introduction to Machine Learning
1.1.1 The Concept of Machine Learning
1.1.2 Algorithmic Models in Machine Learning
1.1.3 Steps in Developing a Machine Learning Application
1.2 Tools for Implementing Machine Learning
1.3 Setting Up the Python Platform
1.3.1 The Anaconda Integrated Development Environment
1.3.2 The PyCharm Integrated Development Environment
1.3.3 Creating a Virtual Environment
1.3.4 Configuring the Virtual Environment
1.4 Setting Up the Spark Platform
1.4.1 Spark Deployment Modes
1.4.2 Installing the JDK
1.4.3 Installing Scala
1.4.4 Installing the IDEA Development Tool
1.4.5 Installing Spark
1.4.6 Installing Maven
1.5 Creating a Python-Based Project
1.6 Creating a Spark-Based Project
Exercises 1
Chapter 2 Data Preprocessing
2.1 The Concept of Data Preprocessing
2.1.1 Data Cleaning
2.1.2 Data Transformation
2.2 Python-Based Data Preprocessing
2.3 Spark-Based Data Preprocessing
Exercises 2
Chapter 3 Classification Models
3.1 The Concept of Classification Models
3.2 Algorithmic Principles of Classification Models
3.2.1 The Decision Tree Algorithm
3.2.2 The Nearest Neighbor Algorithm
3.2.3 The Naive Bayes Algorithm
3.2.4 The Logistic Regression Algorithm
3.2.5 The Support Vector Machine Algorithm
3.3 Python-Based Classification Modeling Example
3.4 Spark-Based Classification Modeling Example
Exercises 3
Chapter 4 Clustering Models
4.1 The Concept of Clustering Models
4.1.1 Overview of Clustering Models
4.1.2 Similarity Measures in Clustering Models
4.1.3 Evaluation of Clustering Algorithms