Data Mining: Practical Machine Learning Tools and Techniques
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results to the algorithmic methods at the heart of successful data mining approaches. Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research. Please visit the book companion website. It contains

654 pages, Paperback

First published October 11, 1999

123 people are currently reading
1226 people want to read

About the author

Ian H. Witten

17 books, 9 followers

Ratings & Reviews

Community Reviews

5 stars: 242 (30%)
4 stars: 281 (35%)
3 stars: 212 (27%)
2 stars: 38 (4%)
1 star: 11 (1%)
Displaying 1 - 30 of 40 reviews
Todd N.
357 reviews, 255 followers
September 25, 2013
This is an excellent, but somewhat uneven, introduction to the field of machine learning, divided into three parts.

Part 1 is a good overview of the types of use cases, standard data sets, and algorithms. It provides more intuitive rather than technical explanations, though there is some math to get through. Reading just this section will definitely get you through any dinner party conversations about machine learning. I read through this twice, taking careful notes in my Moleskine (natch) the second time through.

Part 2 dives into the algorithms in more technical detail. This is the weakest part of the book because sometimes the algorithm is expressed in a “pseudo code” and sometimes it’s just described with text. It’s still valuable to read through but it would benefit a lot from a consistent treatment. My notes from this part are proving valuable as I read mathematically more rigorous books covering the same topics.

Part 3 is essentially a user’s guide for an open source machine learning workbench called Weka. I wasn’t familiar with this tool before, despite working closely with Pentaho in a previous job. I downloaded some data from a Kaggle data science competition and loaded it into Weka, and within a few minutes I was already beating the posted benchmark from their leaderboard. Some of the options and algorithms available in Weka aren’t going to make any sense until you have read through this. Because Weka comes with the sample data sets mentioned in Parts 1 and 2, it’s definitely worth firing it up and playing around with the data as you read.

The way that I worked through this book is as follows: Part 1 (skimmed), download and play with Weka, Part 3 (read carefully), Part 1 (taking notes), Part 2 (taking notes), download your own data or Kaggle data into Weka and start applying the algorithms.

Definitely recommended. I was surprised by the bad reviews posted by my Goodreads peers.

After this book I’m working through The Elements of Statistical Learning, which I found at the UC Berkeley student book store for one of their ML classes.
35 reviews, 2 followers
January 6, 2008
From the perspective of a computer scientist, this book is basically totally useless, as it leaves the reader with no idea how any of the algorithms really work. It might be helpful if you want to be able to use some machine learning software while avoiding having anything more than a cursory understanding of how it works.
Vinicius.
3 reviews, 1 follower
July 11, 2019
A good introduction to data mining and machine learning tools.
The algorithms are not covered in depth.
The third part of the book is a WEKA user guide. Combined with some useful datasets such as the UCI ones, it is a good set to start learning data mining.
JDK1962.
1,421 reviews, 20 followers
August 14, 2013
I really, really wanted to like this book more than I did. After all, it was about a topic that I have great interest in, and describes a workbench application (Weka) that I can command-line access from my favorite programming environment (R, via RWeka).

The problem I was having with it is that its presentation, across the board, was incredibly wordy. They managed to make the interesting sound boring, and presented technical material with no grace whatsoever. The chapter on the Weka Explorer was a case in point: page after page where they would throw out names of an algorithm or a filter or whatever, and a sentence or a paragraph on what it did, then on to the next one. Now, come on guys: what do you expect a reader to get out of section after section of this sort of thing? If it's meant as a reference, write it as a reference. Break it into multi-column lists. Heck, just include the Javadoc page, I don't care. But as textual paragraphs? It was as interesting as reading a set of dictionary entries (Pi-Po, for example), all squashed into a paragraph, words mixed with etymology, mixed with definitions. It's not terribly useful as a reference, and as a reading experience, it--frankly--sucks.

So, I ask again of the authors: what do you want the reader to take away (remember) from the section? Write that. As clearly as possible. With all the examples and illustrations that are necessary to make it clear. Are you writing a reference section? That's fine, but make it clear that's what you're doing, move it to an appendix, whatever.

As a third edition, I expected better from this, but it showed little refinement over the previous edition. It was just...bigger. I confess that I skimmed in places. It was just too damn tedious.

Williams's book--Data Mining with Rattle and R--set itself much the same task of introducing data mining and showing off a workbench, but did a much better job in a third as many pages.
Kai Weber.
519 reviews, 46 followers
July 9, 2017
This book covers data mining techniques that were developed within the study field of machine learning. It starts with explaining how to represent input and output data and then progresses from simpler, basic algorithms (e.g. naïve Bayes, decision trees, rule inference, instance-based learning, clustering) to more advanced techniques (e.g. C4.5, hyperplane margins, neural networks, advanced probabilistic methods, deep learning). Along the way it also covers evaluation of what's been learned by a machine, data clean-up and combining several learning methods together.
The book is extensive. The largest chapters (nos. 9 & 10 on advanced probabilistic methods and deep learning) are mathematically very challenging, maybe a bit too much so in an introductory work. When directly comparing the description of the methods here with the way they're treated in Artificial Intelligence: A Modern Approach, I felt I grasped them better in the Russell/Norvig way.
Nevertheless, one advantage of "Data Mining" is that the team of authors has implemented all the algorithms of the book in a software suite called WEKA, which is introduced in the appendix. That's a good bait for the readers of this book to download WEKA and try things out immediately.
1,000 reviews, 20 followers
December 23, 2011
A useful compendium of data mining techniques and accompaniment to the Weka data mining tool. This book is more an overview than a detailed treatise: there are descriptions but few precise algorithms; the maths is kept to a minimum and, where there is maths, it is often left mostly unexplained; the applications seem dated - there's little on mining large-scale scientific, medical or web data, for example; and issues of handling large scale data are skirted. Nevertheless, its scope is wide and it's a useful introduction to the field.
John Orman.
685 reviews, 33 followers
June 20, 2012
This big book has many sections that I used for my current Machine Learning online class: Applications, Knowledge Representation, Algorithms, Linear/Logistic Regression, Prediction, Classification, Clustering, and Cost Calculation. It also introduced me to the WEKA machine learning workbench, a set of free software tools that can be downloaded to implement many of the algorithms used in machine learning.
Joe Cole.
169 reviews, 349 followers
May 15, 2017
This is the latest edition of this data mining book. I like it because it provides practical examples for machine learning.
193 reviews, 45 followers
December 8, 2015
I’ve been delaying picking up a proper data science book for a couple of years now and finally ran out of excuses not to do it. These days any moderately serious conversation/book about areas that I tend to follow - genetics, genomics, economic development, history, consciousness, prediction, uncertainty - requires a minimum grounding in statistics and/or machine learning. Thus, when a couple of weeks ago I had to look something up for a little work project I took the opportunity to read most of the book as a general edification exercise. It is certainly a nice mix of stats and machine learning – the latter is sexier of course and gets all the headlines but the former is where real intuitions about underlying techniques can be developed.

Yes, I liked the book, and yes, I’m glad I learned a few new things, and, for the most part, the book didn’t alter my expectations about statistics (vital science, prone to manipulation). However, once you fully witness how the sausage is made in machine learning land, you will not be setting foot in Google’s self-driving car anytime soon. And to Mr. Kurzweil I have one message – singularity is FAR.
Robert Muller.
Author of 15 books, 35 followers
June 2, 2013
While this book is an excellent overall summary of data mining technology, and it's an indispensable reference for using the Weka data mining software, it is not detailed enough, nor does it have enough examples, for an otherwise inexperienced novice data miner to be effective. If you come at it knowing a lot about statistics, probability, and modeling, you can get your knowledge rounded out with techniques and ideas you may not have experienced but make sense to you. If you don't bring such knowledge, you will almost certainly be mystified or misled by the level of description of techniques in this book.

It's also important to note that this book focuses on data mining rather than machine learning. I suspect it comes from the era of the 2000s with its emphasis on the leading edge of Big Data. People who are more interested in the more complex modeling and AI-based techniques of machine learning won't get as much out of this book as marketing people interested in analyzing sales data.
Kid.
23 reviews, 22 followers
June 21, 2014
Best introductory book on Data Mining in terms of concepts and practice. Not too academic, but goal-driven and data-driven, which makes it easier for readers to understand.

WEKA is a great tool, although its part in this book is a little bit too much.

For those who need more on the theory side, I recommend the book "Introduction to Data Mining" (Pang-Ning Tan, Michael Steinbach, Vipin Kumar). But if you want to apply it to business without much mathematical background, you can look for "Data Science for Business: What you need to know about data mining and data-analytic thinking" (Foster Provost, Tom Fawcett). Actually, I have a very narrow understanding of business domains, so this book helps me a lot.

My final thoughts? Get all if possible. This is a very good book. I enjoy it so much.
Avinash K.
182 reviews, 31 followers
December 15, 2016
Very good first book and a very good reference book. This is a very rare achievement, especially for something that should be accessible to people with minimal background in data mining.
Pedagogy is just right and writing style is very lucid.
Excellent book and, like Prof. Witten's lectures, very interactive. Highly recommend it.
Brett Dargan.
8 reviews, 5 followers
November 29, 2010
Loved this book. Although some parts were too slow, especially the first few chapters. Took a long time to explain concepts that could have been reduced a lot.
It is well worth sticking with it though; learnt some important concepts about data structures I hadn't come across before.
Ayman Sieny.
13 reviews, 3 followers
February 4, 2011
The book provides a good introduction to data mining algorithms including classification, clustering and association. It also provides practical hands-on exercises using an open source data mining tool developed by the authors called WEKA.
60 reviews, 3 followers
September 19, 2009
Pedantic to a fault. Otherwise, it's just a bunch of algorithms with analysis and discussion.
Bill Hayes.
44 reviews, 1 follower
Currently reading
May 13, 2012
I like his stated approach of giving readers a good feel for the different techniques of Machine Learning and what they can be used for.
Kurt.
72 reviews, 1 follower
Want to read
July 22, 2011
Very mathy and deep, but also seems very practical and actionable so far.
Chris.
55 reviews, 1 follower
November 12, 2011
I was looking for something not so theoretical, which is totally what it was to me. Practical to me means something with code...
1 review
Read
June 27, 2013
Cool book; it makes you aware of mathematical inventions, not too far-fetched, that you might want to incorporate, only to learn afterward that such an invention already existed. Haven't completed it.
49 reviews, 10 followers
October 31, 2013
Very hands on/practical intro to the subject. For readers who want to start using ML techniques quickly and worry about theoretical considerations later.