Machine Learning Syllabus

Getting Started with Python Machine Learning

Machine learning and Python
What you will learn
Introduction to NumPy, SciPy, and Matplotlib
Installing Python
Chewing data efficiently with NumPy and intelligently with SciPy
Learning NumPy

Learning Pandas
Indexing
Handling non-existing values
Comparing runtime behaviors

Learning SciPy
Our first (tiny) machine learning application
Reading in the data
Preprocessing and cleaning the data
Choosing the right model and learning algorithm
Before building our first model
Starting with a simple straight line
Towards some advanced stuff
Stepping back to go forward – another look at our data
Training and testing
Answering our initial question

Learning How to Classify with Real-world Examples

The Iris dataset
The first step is visualization
Building our first classification model
Evaluation – holding out data and cross-validation
Building more complex classifiers
A more complex dataset and a more complex classifier
Learning about the Seeds dataset
Features and feature engineering
Nearest neighbor classification
Binary and multiclass classification
Summary

Clustering – Finding Related Posts

Measuring the relatedness of posts
How not to do it
How to do it
Preprocessing – similarity measured as similar number of common words
Converting raw text into a bag-of-words
Counting words
Normalizing the word count vectors
Removing less important words
Stemming
Installing and using NLTK
Extending the vectorizer with NLTK’s stemmer
Stop words on steroids
Our achievements and goals
Clustering
KMeans
Getting test data to evaluate our ideas on
Clustering posts
Solving our initial challenge
Another look at noise
Tweaking the parameters
Summary