### Python Starter Code for Asia Actuarial Analytics Challenge 2016

Since most participants are new to machine learning, I have decided to follow up my earlier blog post on Getting Started Tips with the following Python script, which uses logistic regression to make predictions.

You should be able to get an AUC score of around 0.63. At the time of writing, that would put you in approximately 8th position on the leaderboard.

If you have any comments, please post them here or in the competition forum.

```python
__author__ = 'Teh Loo Hai'
__website__ = 'www.actuaries.com.my'

import numpy as np
import pandas as pd
from sklearn import linear_model

if __name__ == "__main__":
    train = pd.read_csv('../input/SAStraining.csv')
    test = pd.read_csv('../input/SAStest.csv')

    # select numeric features
    features = ['time_in_hospital', 'num_lab_procedures',
                'num_procedures', 'num_medications',
                'number_outpatient', 'number_emergency',
                'number_inpatient', 'number_diagnoses']

    # fill NaN with 0 in both data sets; the result must be assigned back,
    # because train[features].fillna(0, inplace=True) only changes a temporary copy
    train[features] = train[features].fillna(0)
    test[features] = test[features].fillna(0)

    # set random number seed
    np.random.seed(4321)

    # build logistic regression model using numeric features only
    model = linear_model.LogisticRegression()
    model.fit(train[features], train['readmitted'])

    # make predictions on test data; the second column holds the
    # predicted probability of the positive class (readmitted = 1)
    preds = model.predict_proba(test[features])

    # create submission file
    submission = pd.DataFrame({'patientID': test.patientID,
                               'readmitted': preds[:, 1]})
    submission.to_csv('submission-logistic.csv', index=False)
```
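Before using up a submission, you can estimate the AUC locally by holding out part of the training data. The sketch below is an illustration and not part of the original script: the helper `holdout_auc`, the 20% hold-out fraction and the `demo` data frame are all hypothetical. `demo` is synthetic data that only borrows two of the competition's column names with a made-up relationship, so the AUC it prints says nothing about the real data set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def holdout_auc(df, features, target="readmitted", test_size=0.2, seed=4321):
    """Hypothetical helper: fit logistic regression on part of df and
    return the AUC on the held-out part."""
    X = df[features].fillna(0)
    y = df[target]
    # stratify keeps the readmission rate similar in both parts
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr, y_tr)
    # AUC is computed from the predicted probability of the positive class
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])


# Synthetic stand-in for SAStraining.csv (made-up data, for illustration only)
rng = np.random.default_rng(0)
n = 2000
demo = pd.DataFrame({
    "time_in_hospital": rng.integers(1, 15, n),
    "number_inpatient": rng.poisson(1.0, n),
})
# assumed relationship: readmission becomes more likely with prior inpatient visits
logit = -1.0 + 0.8 * demo["number_inpatient"].to_numpy()
demo["readmitted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

auc = holdout_auc(demo, ["time_in_hospital", "number_inpatient"])
print(f"hold-out AUC: {auc:.3f}")
```

With the real data you would call `holdout_auc(train, features)` on the `train` frame and `features` list from the script instead of using `demo`.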

### Asia Actuarial Analytics Challenge 2016: Getting Started Tips

The Singapore Actuarial Society (SAS) has recently launched the Asia Actuarial Analytics Challenge 2016 to promote the development of data analytics talent in Asia. If you don't know how to get started, here are some tips:

- You need an invitation link before you can participate. You can find it in our April 2016 newsletter. If you are not sure whether you are eligible, check the competition forum and, if you are still unsure, ask the admin.
- Make an all-zeros submission by downloading the sample submission file and submitting it as is. There you have it: you should achieve a score of 0.50000, on par with the benchmark.
- Not happy with your score? Use a random number generator to generate your predictions and submit them. You should get a score slightly above or below 0.50000. If it is above 0.50000, congratulations, you have beaten the benchmark! If it is below 0.50000, just replace each prediction p with 1 - p and submit again. Amazing, now you have outperformed the benchmark.
- Try something more actuarial. Fit a least squares regression line (e.g. using Excel) with "readmitted" as your y variable and, say, "time_in_hospital" as your x variable. Use your regression line to make predictions and submit them.
- Improve your model by trying multiple regression (can still use Excel).
- Try more advanced methods such as a GLM.
- Sign up for a machine learning course, such as the one run by SAS.
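The arithmetic behind the all-zeros, random-number and least squares tips can be checked in a few lines of Python. This is a sketch on made-up data: the labels, the `time_in_hospital` values and their relationship are hypothetical, and scikit-learn's `roc_auc_score` is assumed here as a stand-in for the competition's AUC scoring. It shows that constant predictions score exactly 0.5, that flipping random predictions turns an AUC of a into 1 - a, and how a one-variable least squares line is scored.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4321)

# Hypothetical 0/1 readmission labels and made-up length-of-stay values
n = 1000
time_in_hospital = rng.integers(1, 15, n)
readmitted = (rng.random(n) < 0.1 + 0.03 * time_in_hospital).astype(int)

# All-zeros submission: every patient is tied, so the AUC is exactly 0.5
auc_zeros = roc_auc_score(readmitted, np.zeros(n))

# Random predictions give an AUC near 0.5; using 1 - p reverses the ranking,
# so the flipped AUC is exactly 1 minus the original one
random_preds = rng.random(n)
auc_random = roc_auc_score(readmitted, random_preds)
auc_flipped = roc_auc_score(readmitted, 1 - random_preds)

# Least squares line readmitted ~ intercept + slope * time_in_hospital
# (np.polyfit returns the highest-degree coefficient first)
slope, intercept = np.polyfit(time_in_hospital, readmitted, deg=1)
ls_preds = intercept + slope * time_in_hospital
auc_ls = roc_auc_score(readmitted, ls_preds)

print(f"all zeros:      {auc_zeros:.3f}")
print(f"random:         {auc_random:.3f}")
print(f"random flipped: {auc_flipped:.3f}")
print(f"least squares:  {auc_ls:.3f}")
```

Because AUC depends only on how predictions rank patients, a least squares line with a positive slope scores exactly the same as submitting `time_in_hospital` itself. The regression starts to matter once you combine several variables, as in the multiple regression and GLM steps.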

Good luck!