Archive for : July, 2020

Data Mining 数据挖掘

在数据分析科学里,有一个很重要的概念,叫做数据挖掘。可以这么想象,一个机构利用数据科学做分析和决策的过程就是发现含有钻石原矿,再将未加工钻石经过多道工艺层层打磨的过程。数据挖掘就是这么一个系统性的过程,它需要应用到信息科学中的自动检索和模式分析,也可能会要求分析师有相应的商业知识或者创造力以及常识。

An important concept in data science is data mining. You can compare the way an institution uses data science for analysis and decision making with diamond mining and processing: first you find ore that contains rough diamonds, then you polish the rough stones through multiple stages of work. Data mining is a systematic process of this kind. It applies information technology, such as the automated discovery and evaluation of patterns in data, and it may also require an analyst’s creativity, business knowledge, and common sense.

每一个商业决策问题都是独特的,都有它特定的目标,限制和特征。我们面对一个商业问题,要解决它也可以运用工程学思维,把一个商业问题解构成子任务集。这些子任务中,有些是特殊的商业问题,还有一些是普遍的数据挖掘任务。需要注意的是,在数据科学中一个很重要的技能就是将数据分析问题分解成子问题然后根据各个子问题来找相应的解决方案。所以在学习数据挖掘的具体流程之前,我们要谈一谈数据挖掘的几种常见的任务类型以便之后对数据挖掘的整个过程以及其概念的具化了解。

Every business decision-making problem is unique, with its own combination of goals, constraints, and characteristics. We can approach a business problem with an engineering mindset, that is, by decomposing it into subtasks. Some of these subtasks are specific to the particular business problem, and others are common data mining tasks. Note that a critical skill in data science is the ability to break down a data analytics problem into parts such that each part matches a known task for which solutions are available. So before learning about the data mining process, it is useful to discuss the common types of data mining tasks, which will let us be more concrete when we present the data mining process and its concepts.