Data Science
Matlab/Octave NN classifier
Task:
Build a classifier that can find the localization site of a protein in yeast, based on 8 attributes (features).
Solution:
To perform the classification, we built a 3-layer artificial neural network (ANN), specifically a feed-forward multilayer perceptron. We trained the ANN using stochastic gradient descent with back-propagation.
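The training approach can be sketched in a few lines. The following is a minimal illustration in plain Python rather than the Matlab/Octave used in the project. It assumes one sigmoid hidden layer, a softmax output and a cross-entropy loss; the hidden layer size, learning rate, number of epochs and the toy samples are hypothetical and not taken from the project.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


class MLP:
    """Input layer -> one sigmoid hidden layer -> softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def sgd_step(self, x, label, lr):
        """One stochastic gradient descent update on a single sample."""
        h, p = self.forward(x)
        # With softmax + cross-entropy the output error is p - one_hot(label).
        d_out = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Back-propagate the error to the hidden layer (before w2 is updated).
        d_hid = [hj * (1.0 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def predict(self, x):
        _, p = self.forward(x)
        return p.index(max(p))


def train(model, samples, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit samples in random order each epoch
        for x, label in samples:
            model.sgd_step(x, label, lr)


# Hypothetical toy data: 8 features per sample, two made-up classes.
samples = [
    ([1.0, 0, 0, 0, 0, 0, 0, 0], 0),
    ([0.9, 0.1, 0, 0, 0, 0, 0, 0], 0),
    ([0, 0, 0, 0, 0, 0, 0, 1.0], 1),
    ([0, 0, 0, 0, 0, 0, 0.1, 0.9], 1),
]
model = MLP(n_in=8, n_hidden=4, n_out=2)
train(model, samples)
```

Updating the weights after every single sample, instead of after a full pass over the data, is what makes the descent stochastic.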
Cubism.js
Task:
Build a simple web solution that displays real-time data using Cubism.js, with a Google BigQuery database as the data source.
Solution:
We built simple authorization, an API for working with BigQuery, a front-end controller for managing multiple graphs, and simple static Cubism.js graphs.
Spark MLlib forest of regressions modelling on AWS
Spam and hyperlink analysis
Task:
Develop a prediction model for web spam and hyperlink analysis, designed and trained on the provided data to achieve certain prediction goals.
Solution:
We built a spam/non-spam prediction model for a link analysis company. Because the input data was over 70 GB in size, we used Big Data methods. There were many text features, which were preprocessed using TF-IDF, Word2Vec and feature selection methods. Columns in date format were converted to timestamps, and the page lifetime was extracted. As a result, the model gives a percentage prediction for each class: Non-spam, Page Spam, Domain Spam.
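Two of these preprocessing steps can be illustrated with a minimal sketch in plain Python. It assumes the basic TF-IDF variant (term frequency times log(N / document frequency)) and a "YYYY-MM-DD" date format; the function names, the date format and the sample values are hypothetical, and Word2Vec and feature selection are not shown.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def to_timestamp(date_text, fmt="%Y-%m-%d"):
    """Parse a date string as UTC and return seconds since the Unix epoch."""
    parsed = datetime.strptime(date_text, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def page_life_days(first_seen, last_seen):
    """Page lifetime in days between two date strings."""
    return (to_timestamp(last_seen) - to_timestamp(first_seen)) / 86400.0


# Hypothetical inputs for illustration only.
docs = [["cheap", "pills", "cheap"], ["news", "pills"]]
weights = tf_idf(docs)
life = page_life_days("2015-01-01", "2015-01-31")
```

A term that occurs in every document, such as "pills" here, gets a weight of zero, so only terms that distinguish pages from each other contribute to the features.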
Prediction model
Task:
Build a simulator for internal use that predicts labels for time series data.
Solution:
We created a console script, written in R and Bash for production use, that validates predictive models in a specific, iterative, customer-defined way. It comprises iterative data splitting, model training, outcome prediction and evaluation of model performance.
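The split, train, predict, evaluate loop can be sketched as follows. This is a minimal illustration in Python rather than the R and Bash of the actual script. The customer-defined splitting scheme is not described here, so an expanding-window split is assumed, and a majority-label "model" stands in for the real predictive models; all names and data are hypothetical.

```python
from collections import Counter


def expanding_window_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices); training data always precedes test data."""
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size


def fit_majority(labels):
    """Placeholder model: remember the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def validate(labels, initial_train, test_size):
    """Run split -> train -> predict -> evaluate per fold; return per-fold accuracy."""
    scores = []
    splits = expanding_window_splits(len(labels), initial_train, test_size)
    for train_idx, test_idx in splits:
        model = fit_majority([labels[i] for i in train_idx])
        predictions = [model for _ in test_idx]  # the placeholder always predicts one label
        actual = [labels[i] for i in test_idx]
        correct = sum(p == a for p, a in zip(predictions, actual))
        scores.append(correct / len(actual))
    return scores


# Hypothetical label sequence, ordered by time.
labels = ["up", "up", "up", "down", "up", "down", "down", "down"]
scores = validate(labels, initial_train=4, test_size=2)
```

Keeping every test window strictly after its training window avoids leaking future observations into the model, which matters for time series data.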
Classify captchas
Task:
Develop a classifier to identify characters in captchas. Image preprocessing, training and classification had to be done in Python using standard libraries such as OpenCV, scikit-image and scikit-learn.
Solution:
We decided to use machine learning, that is, to teach the program to detect the required letters and digits and return the correct result. We had a training dataset containing captcha images and their correct answers, which we used to train the model. We then applied the model to the test dataset (images only) and obtained strings with the answers.
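The train-then-apply flow can be sketched as follows. This is a toy illustration in plain Python, not the project code: images are tiny hand-made binary grids instead of real captchas, characters are separated at blank columns, and a nearest-neighbour comparison stands in for the OpenCV and scikit-learn pipeline, whose actual model is not described here. All function names and data are hypothetical.

```python
def split_characters(image):
    """Split a binary image (list of rows of 0/1) into glyphs at all-blank columns."""
    width = len(image[0])
    glyphs, start = [], None
    for col in range(width + 1):
        has_ink = col < width and any(row[col] for row in image)
        if has_ink and start is None:
            start = col  # a new glyph begins
        elif not has_ink and start is not None:
            glyphs.append([row[start:col] for row in image])
            start = None
    return glyphs


def flatten(glyph):
    return [pixel for row in glyph for pixel in row]


def distance(a, b):
    """Count differing pixels; a difference in length counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def train(images, answers):
    """Pair each segmented glyph with the matching character of the known answer."""
    examples = []
    for image, answer in zip(images, answers):
        for glyph, char in zip(split_characters(image), answer):
            examples.append((flatten(glyph), char))
    return examples


def read_captcha(examples, image):
    """Label each glyph with the character of its closest training example."""
    result = []
    for glyph in split_characters(image):
        vector = flatten(glyph)
        best = min(examples, key=lambda example: distance(example[0], vector))
        result.append(best[1])
    return "".join(result)


# Toy training set: one image showing "17", with its known answer.
train_images = [[[1, 0, 1, 1],
                 [1, 0, 0, 1],
                 [1, 0, 0, 1]]]
examples = train(train_images, ["17"])

# Toy test image (no answer provided) showing the same glyphs in another order.
test_image = [[1, 1, 0, 1],
              [0, 1, 0, 1],
              [0, 1, 0, 1]]
answer = read_captcha(examples, test_image)
```

Real captchas need proper preprocessing (thresholding, noise removal, handling of touching characters) before any such segmentation works.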