Getting Started with Python Machine Learning
Machine learning and Python
What you will learn
Introduction to NumPy, SciPy, and Matplotlib
Installing Python
Chewing data efficiently with NumPy and intelligently with SciPy
Learning NumPy
Learning Pandas
Indexing
Handling non-existing values
Comparing runtime behaviors
Learning SciPy
Our first (tiny) machine learning application
Reading in the data
Preprocessing and cleaning the data
Choosing the right model and learning algorithm
Before building our first model
Starting with a simple straight line
Towards some advanced stuff
Stepping back to go forward – another look at our data
Training and testing
Answering our initial question
Learning How to Classify with Real-world Examples
The Iris dataset
The first step is visualization
Building our first classification model
Evaluation – holding out data and cross-validation
Building more complex classifiers
A more complex dataset and a more complex classifier
Learning about the Seeds dataset
Features and feature engineering
Nearest neighbor classification
Binary and multiclass classification
Summary
Clustering – Finding Related Posts
Measuring the relatedness of posts
How not to do it
How to do it
Preprocessing – similarity measured as a similar number of common words
Converting raw text into a bag-of-words
Counting words
Normalizing the word count vectors
Removing less important words
Stemming
Installing and using NLTK
Extending the vectorizer with NLTK’s stemmer
Stop words on steroids
Our achievements and goals
Clustering
KMeans
Getting test data to evaluate our ideas on
Clustering posts
Solving our initial challenge
Another look at noise
Tweaking the parameters
Summary