Data Mining - Study Guide

    1. What is Data Mining?

    Wikipedia defines data mining as follows:
    "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data."

    In a modern world where seemingly endless amounts of data are stored electronically, it makes sense that we want to analyze this data to uncover meaningful patterns hidden within it. The definition above, with its stipulation that the information be "potentially useful", therefore seems most appropriate.
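
    To make "uncovering patterns hidden within the data" concrete, here is a minimal sketch in Python that counts which pairs of items occur together most often. The function name frequent_pairs, the min_support threshold, and the shopping baskets are all hypothetical and invented for illustration; real data mining tools use far more elaborate techniques.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical shopping baskets, made up for this example only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

print(frequent_pairs(baskets, min_support=3))  # {('bread', 'milk'): 3}
```

    Whether a pattern such as "bread and milk are bought together" is "potentially useful" is a judgment the code cannot make; it only surfaces candidates.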

    If you want to read up on data mining techniques, refer to Statistical Data Mining Tutorials by Andrew Moore.

    2. Why should we mine data? Why should we not?

    The Internet Marketing Engine lists better knowledge of buyer behaviour as one of the key benefits of data mining. According to Mitch Kramer's article Data Mining at Work: Predicting and Preventing Terrorism, data mining techniques may help us stop the next big terrorist attack before it happens.

    Another interesting application of data mining is lowering costs. Refer to Data Mining and Customer Relationships by Kurt Thearling for applications related to businesses and A Pill, A Scalpel, A Database by Marianne Kolbasuk for applications by consumers.

    There are essentially two arguments against data mining. The first is that the technology is not ready and that it finds associations that are not there, as detailed in this article. The second is that it invades our privacy: no one has the right to mine "our" data.
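
    One common way that associations "that are not there" can arise is by testing a great many variables against a small sample: some will line up by chance alone. The sketch below illustrates that general effect with purely random numbers; it is not necessarily the argument made in the article mentioned above, and the function names, sample sizes, and seed are assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_rows, n_features, seed=0):
    """Largest |correlation| between a random target and unrelated random features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is pure noise, so any correlation found is coincidental.
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


few = strongest_chance_correlation(n_rows=20, n_features=5)
many = strongest_chance_correlation(n_rows=20, n_features=500)
print(f"strongest correlation among 5 noise features:   {few:.2f}")
print(f"strongest correlation among 500 noise features: {many:.2f}")
```

    Because both calls use the same seed, the 500-feature search includes the first five features, so its strongest correlation can never be smaller. With only 20 rows, searching hundreds of noise columns will usually turn up a correlation that looks meaningful even though none exists.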

    3. What about privacy?

    With a better understanding of what data mining is, why it is used, and who is using it, we can discuss its ethical implications. Privacy is the issue that organizations such as the Electronic Privacy Information Center and the Electronic Frontier Foundation are most concerned with. Is it ethical for a company to share its data with another company in order to better understand its customers? Should the government be able to access citizens' data in an effort to curb terrorism? Should web sites use information obtained from their visitors to increase profits, either by selling this information to others or by targeting return visitors with information gathered on previous visits?

    Which of the following pieces of information listed as visitor characteristics do you want companies and government agencies to be distributing and mining?

    4. Who should mine data?

  • Government Data Mining:

  • In the counter-terrorism example given above, we must assume that the government is in charge of the data mining efforts. The now-famous Total Information Awareness program was one such effort by the United States government to create a data mining architecture to be employed in the pursuit of terrorists.

  • Business Data Mining:

  • According to SAS, any company with data to be mined should be mining it. Their data mining website lists benefits that companies can use to "...reduce fraud, anticipate resource demand, increase acquisition and curb customer attrition."

    5. So what is the central issue?

    It could be summed up as a debate on privacy. That is where most of the objections lie, while the benefits of data mining are clear.
