Big data (Education) Consultant @
- Developing scripts in Python (pandas) and Java to extract complex features: time-window features, percent/ratio features, canonical features, sparse encodings for categorical features, and polynomial features
- Using under-sampling or over-sampling to rebalance highly skewed training datasets
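A minimal sketch of the over-sampling step above, on toy data (the arrays and counts here are illustrative, not the original scripts): the minority class is resampled with replacement until both classes are the same size.

```python
import numpy as np

# Toy imbalanced dataset: 90 negatives, 10 positives (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Random over-sampling: draw extra minority examples with replacement
# until both classes have the same count.
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
keep = np.concatenate([majority, minority, extra])

X_bal, y_bal = X[keep], y[keep]
print(np.bincount(y_bal))  # both classes now have 90 examples
```

Under-sampling is the mirror image: randomly drop majority examples instead of duplicating minority ones, trading information loss for a smaller training set.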
- Using feature selection methods such as fast correlation-based filtering (FCBF) and forward selection to reduce feature dimensionality
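The redundancy-removal idea behind a correlation-based filter can be sketched as follows. This is a simplified illustration: FCBF proper ranks features by symmetric uncertainty with the class and removes redundant ones, whereas this toy version keeps a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
f0 = rng.normal(size=n)
f1 = f0 + 0.01 * rng.normal(size=n)   # nearly duplicates f0
f2 = rng.normal(size=n)               # independent feature
X = np.column_stack([f0, f1, f2])

# Greedy correlation filter: keep a feature only if it is not highly
# correlated with any feature already kept.
threshold = 0.9
kept = []
for j in range(X.shape[1]):
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < threshold for k in kept):
        kept.append(j)

print(kept)  # f1 is dropped as redundant with f0 -> [0, 2]
```

Forward selection works in the opposite direction: start from an empty set and greedily add the feature that most improves a model's validation score.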
- Applying and tuning algorithms: linear regression, logistic regression, decision trees, random forests, gradient boosting, SVMs with kernels, Naive Bayes, ensemble methods (bagging and boosting), JRip, and K*
- Building machine learning pipelines in Python (sklearn) to select the best model using grid search and randomized search
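The pipeline-plus-grid-search pattern named above looks roughly like this in sklearn (synthetic data and parameter values are illustrative, not the original models):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Chain preprocessing and the estimator so both are refit per CV fold,
# avoiding leakage from scaling on the full dataset.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the regularization strength; pipeline parameters are
# addressed as <step name>__<parameter>.
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

`RandomizedSearchCV` has the same interface but samples a fixed number of parameter settings from distributions, which scales better when the grid is large.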
- Validating the models using a 3-way split and appropriate metrics (F1-score, Kappa statistic, precision, recall, AUC, log loss), depending on the distribution of the target variable and the nature of the algorithm
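A 3-way split reserves one partition for final evaluation that is never touched during model selection. A minimal sketch with two chained `train_test_split` calls (the 60/20/20 proportions are an illustrative choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000) % 2

# 60% train, 20% validation, 20% test: first carve off 40%,
# then split that half-and-half into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

Hyperparameters are tuned against the validation set; the test set is scored exactly once, with the metric matched to the problem (e.g. AUC or F1 for skewed binary targets, log loss for probabilistic outputs).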
- Developing automated RapidMiner scripts in Java to apply sensor-free student affect detectors (boredom, confusion, frustration, and concentration) to large-scale datasets
- Applying disengagement behavior detectors (gaming the system and off-task behavior) to large-scale datasets
From November 2014 to Present (1 year 2 months)

Research Scientist @ metacog
metacog is an internet-scale educational data platform built to ingest data from hundreds of learning objects (simulations and game-based learning objects) used by millions of students. The platform is a fault-tolerant, distributed system built on AWS that supports both batch and real-time educational data processing with reporting and analytics capabilities.
- Developed code in Spark MLlib to generate decision-tree models using fast correlation-based feature selection, with cross-validation to assess model quality
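The original work used Spark MLlib in Scala; the same fit-then-cross-validate pattern can be sketched in sklearn on synthetic data (everything here is illustrative, not the production code):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Depth-limited tree judged by 5-fold cross-validation: each fold is
# held out once while the tree is trained on the other four.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)
print(scores.mean())
```

Averaging the fold scores gives a less optimistic estimate of generalization than a single train/test split, which matters for small per-item scoring datasets.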
- Developed Scala Spark code to extract meaningful, generalizable features to increase the accuracy of the real-time automated scoring models
- Implemented K-means clustering in Spark, using the Bayesian information criterion (BIC) to find the optimal K
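The idea of picking K by an information criterion can be sketched outside Spark. This illustration swaps in sklearn's `GaussianMixture`, whose built-in `.bic()` already balances log-likelihood against model complexity (the original work computed BIC by hand over Spark K-means output; the data and range of K here are made up):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three well-separated synthetic clusters in 2-D.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in (0.0, 5.0, 10.0)])

# Fit one mixture per candidate K and keep the K with the lowest BIC;
# BIC penalizes extra components, so it stops rewarding over-splitting.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)
```

The same loop works with K-means if BIC is computed manually under a spherical-Gaussian assumption per cluster, which is how it is typically bolted onto Spark MLlib's KMeans.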
- Worked on a data pipeline for real-time auto-scoring models, with Kinesis and Spark Streaming as the real-time layer, Spark MLlib as the machine learning library, and DynamoDB as intermediate storage
- Designed and developed the batch visualization layer using Kinesis, Spark batch jobs, and S3
- Practiced test-driven development and reviewed other team members' code
- Participated in design discussions on selecting AWS technologies based on cost vs. availability vs. scalability trade-offs
- Adopted an agile Scrum process using Rally software, markedly increasing the team's productivity
From September 2014 to Present (1 year 4 months)

Assoc. Research Technologist @
From May 2013 to August 2014 (1 year 4 months)

Research Programmer @
From January 2010 to April 2013 (3 years 4 months)

Research Assistant @
From January 2009 to December 2009 (1 year)

Software Engineer @
From August 2005 to August 2008 (3 years 1 month)
Bengaluru Area, India

Student - trainee @
From January 2005 to June 2005 (6 months)
Master's, Computer Science @ Brandeis University
From 2008 to 2009

Bachelor of Engineering, Mechanical Engineering @ BMS College of Engineering, Bangalore
From 2001 to 2005

Sujith Gowda is skilled in: Machine Learning, Java, Databases, Eclipse, Python, Agile Methodologies, Linux, Algorithms, Scala, Apache Spark, Data Mining, SQL, Software Design, C, Programming, XML