Major Issues in Data Mining:
Mining different kinds of knowledge in databases. – Different users have different needs and may be interested in different kinds of knowledge. Data mining must therefore cover a broad range of knowledge discovery tasks.
Interactive mining of knowledge at multiple levels of abstraction. – The data mining process needs to be interactive so that users can focus the search for patterns, issuing and refining data mining requests based on the results returned.
Incorporation of background knowledge. – Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but also at multiple levels of abstraction.
Data mining query languages and ad hoc data mining. – A data mining query language, which allows the user to describe ad hoc mining tasks, should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
Presentation and visualization of data mining results. – Once patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable by users.
Handling noisy or incomplete data. – Data cleaning methods are required that can handle noise and incomplete objects while mining data regularities. Without such methods, the accuracy of the discovered patterns will be poor.
Pattern evaluation. – This refers to the interestingness of the discovered patterns. Many discovered patterns are uninteresting because they either represent common knowledge or lack novelty, so the patterns need to be evaluated for interestingness.
Efficiency and scalability of data mining algorithms. – To effectively extract information from huge amounts of data in databases, data mining algorithms must be efficient and scalable.
Parallel, distributed, and incremental mining algorithms. – Factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, which are then processed in parallel, and the results from the partitions are merged. Incremental algorithms incorporate database updates without having to mine the entire data again from scratch.
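The partition, process-in-parallel, and merge steps described for the last issue, together with an incremental update, can be illustrated with a minimal sketch. It assumes the mining task is simply counting co-occurring item pairs in transactions. The function names (count_pairs, partition, mine_parallel, update_incremental) are hypothetical and chosen only for illustration. A thread pool is used to keep the example short; for CPU-bound work in CPython, a process pool would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(transactions):
    """Count how often each unordered item pair occurs in one partition."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so ("a", "b") and ("b", "a") map to one key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def partition(data, n_parts):
    """Split a list into at most n_parts contiguous chunks of similar size."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def mine_parallel(transactions, n_parts=2):
    """Partition the data, count pairs per partition in parallel, then merge."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_counts = list(pool.map(count_pairs, parts))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # merging is just adding the partial counts
    return total


def update_incremental(total, new_transactions):
    """Fold newly arrived transactions into existing counts (no full re-scan)."""
    total.update(count_pairs(new_transactions))
    return total
```

Because pair counts are additive, merging partition results and folding in a new batch both reduce to summing counters. Calling update_incremental on the result of mine_parallel gives the same counts as re-mining the old and new transactions together. Mining tasks whose results are not additive would need a more careful merge step.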