Machine learning with Python

With the scikit-learn toolkit, simple machine learning in Python takes only a few lines of code. This example covers binary classification, with the training data stored in a single CSV file. The first entry in each line is the class, and the remaining entries are the features. I’m using Python’s csv module to read in the data.
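To make the file layout concrete, here is a rough sketch that parses a few rows in that format. The sample values and the `parse_rows` helper are made up for illustration; only the "class first, features after" layout comes from the description above.

```python
import csv
import io

# Hypothetical sample in the layout described above:
# the class comes first, the features follow.
SAMPLE = "1,5.1,3.5,1.4\n0,6.2,2.9,4.3\n1,4.9,3.0,1.4\n"


def parse_rows(handle):
    """Split each CSV row into a class label and a list of float features."""
    classes, features = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        classes.append(row[0])
        # csv yields strings, so the features are converted to floats here
        features.append([float(value) for value in row[1:]])
    return classes, features


classes, features = parse_rows(io.StringIO(SAMPLE))
print(classes)
print(features)
```

The script below reads the same layout from train.data instead of an in-memory string.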

import sys, csv
from sklearn import tree

def main(argv):
    # Initialise lists for classes and features
    classes = []
    features = []

    # Open the input data file; the csv module recommends newline=''
    with open('train.data', newline='') as f:
        # Instantiate a CSV reader
        reader = csv.reader(f)

        # Extract class and features into the relevant lists
        for row in reader:
            if not row:
                continue  # skip blank lines
            classes.append(row[0])
            # csv yields strings, so convert the features to numbers
            features.append([float(x) for x in row[1:]])

    # Initialise classifier as a decision tree. Just by
    # changing this line, you can use different classifiers
    clf = tree.DecisionTreeClassifier()

    # Train classifier
    clf.fit(features, classes)

    # Predict class (on training data) and show the result
    prediction = clf.predict(features)
    print(prediction)

if __name__ == "__main__":
    main(sys.argv[1:])
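
The comment in the script says that changing one line is enough to try a different classifier. Here is a rough sketch of that idea, assuming a tiny made-up data set in place of train.data. KNeighborsClassifier and LogisticRegression are real scikit-learn estimators, but picking them is my assumption; the script above only uses the decision tree.

```python
from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for the contents of train.data
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
            [1.0, 0.9], [0.8, 1.1], [0.9, 1.0]]
classes = ['0', '0', '0', '1', '1', '1']

# Each estimator exposes the same fit/predict/score interface,
# so only the constructor call differs between them.
candidates = {
    'decision tree': tree.DecisionTreeClassifier(random_state=0),
    'k-nearest neighbours': KNeighborsClassifier(n_neighbors=3),
    'logistic regression': LogisticRegression(),
}

training_accuracy = {}
for name, clf in candidates.items():
    clf.fit(features, classes)
    # score() reports mean accuracy, here measured on the training data itself
    training_accuracy[name] = clf.score(features, classes)
    print(name, training_accuracy[name])
```

As in the script, the accuracy is measured on the same data used for training, so it says little about how each classifier would do on unseen data.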