Python Starter Code for Asia Actuarial Analytics Challenge 2016 

AUC 0.63

Since most participants are new to machine learning, I am following up my earlier blog post on Getting Started Tips with the Python script below, which uses logistic regression to make predictions.

You should be able to get an AUC score of around 0.63, which would put you in approximately 8th position as of the time of writing.
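For readers new to the metric, here is a minimal sketch of what an AUC score measures: the fraction of (positive, negative) pairs in which the model gives the positive example the higher score, with ties counted as half. The function name `pairwise_auc` and the toy labels and scores are made up for illustration and are not part of the competition data. This brute-force version is only meant to show the idea; for real data a library routine such as scikit-learn's `roc_auc_score` is the practical choice.

```python
def pairwise_auc(labels, scores):
    """Brute-force AUC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one positive and one negative example')

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy example (illustrative values only)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 3 of the 4 pairs are ranked correctly -> 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so 0.63 is a modest but real improvement over guessing.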

Please post any comments here or in the competition forum.

__author__ = 'Teh Loo Hai'
__website__ = ''

import pandas as pd
import numpy as np
from sklearn import linear_model

if __name__ == "__main__":
    train = pd.read_csv('../input/SAStraining.csv')
    test = pd.read_csv('../input/SAStest.csv')

    # select numeric features
    features = ['time_in_hospital', 'num_lab_procedures',
                'num_procedures', 'num_medications',
                'number_outpatient', 'number_emergency',
                'number_inpatient', 'number_diagnoses']

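    # fill nan with 0 in both train and test; assign the result back, because
    # fillna(inplace=True) on a column selection only modifies a temporary copy
    train[features] = train[features].fillna(0)
    test[features] = test[features].fillna(0)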

    # set random number seed

    # build logistic regression model using numeric features only
    model = linear_model.LogisticRegression()
    model.fit(train[features], train['readmitted'])

    # make predictions on test data
    preds = model.predict_proba(test[features])

    # create submission file
    submission = pd.DataFrame({'patientID': test.patientID,
                               'readmitted': preds[:, 1]})
    submission.to_csv('submission-logistic.csv', index=False)
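
If you want to estimate your AUC locally before submitting, one option is to hold out part of the training data and score the model on it. The sketch below is my assumption of how that could look, not part of the original script: it uses scikit-learn's `train_test_split` and `roc_auc_score`, and because the competition files are not available here, it runs on a synthetic stand-in built by a hypothetical `make_synthetic_train` helper. The helper `holdout_auc`, the 25% split and the seed are likewise illustrative choices. With the real data you would pass the DataFrame read from SAStraining.csv instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

features = ['time_in_hospital', 'num_lab_procedures',
            'num_procedures', 'num_medications',
            'number_outpatient', 'number_emergency',
            'number_inpatient', 'number_diagnoses']


def make_synthetic_train(n_rows=2000, seed=0):
    """Hypothetical stand-in for the training file: random counts plus a
    label that depends weakly on number_inpatient."""
    rng = np.random.RandomState(seed)
    data = pd.DataFrame(rng.poisson(3, size=(n_rows, len(features))),
                        columns=features)
    logit = 0.5 * (data['number_inpatient'] - 3)
    prob = 1.0 / (1.0 + np.exp(-logit))
    data['readmitted'] = (rng.uniform(size=n_rows) < prob).astype(int)
    return data


def holdout_auc(train, features, target='readmitted', seed=0):
    """Fit on 75% of the rows and return the AUC on the remaining 25%."""
    X_fit, X_val, y_fit, y_val = train_test_split(
        train[features], train[target],
        test_size=0.25, random_state=seed, stratify=train[target])
    model = LogisticRegression(max_iter=1000)
    model.fit(X_fit, y_fit)
    val_preds = model.predict_proba(X_val)[:, 1]  # probability of class 1
    return roc_auc_score(y_val, val_preds)


train = make_synthetic_train()
auc = holdout_auc(train, features)
print('hold-out AUC: %.3f' % auc)
```

The number printed here reflects the synthetic data only and says nothing about the leaderboard. A single hold-out split can also be noisy, so repeating it with different seeds gives a steadier estimate.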

Posted by Loo Hai Tuesday, May 24, 2016 2:00:00 PM Categories: Machine Learning SAS