Abstract:
This thesis presents a study carried out on a relatively small dataset (1099 samples and 20 attributes) obtained from hospital records in Hungary. Various data analysis methods were applied, and their advantages and shortcomings are discussed. The study shows that, for this classification problem, a tuned Support Vector Machine (SVM) model yielded better predictions in terms of accuracy, at a comparable cost, than Artificial Neural Network (ANN), Random Forest (RF), or Decision Tree (DT) models. The thesis uses data analysis methods implemented in the R software environment and compares the predictive performance of technologies widely used in prediction problems, including ANN, RF, and SVM models. The results indicate that SVM is worth considering when making predictions for medical diagnosis and other types of applications. In general, depending on the dataset and the task at hand, one must try various methods before settling on the model that serves the task best. With the results of this thesis and further similar work, hospital staff can be better equipped to deal with admitted patients: they can be better informed about the patients' conditions and have some predictions about those conditions, which can play a vital role in a patient's survival and recovery. Because human lives are at stake, even a small mistake can carry severe risk, so modelling and prediction demand ever greater accuracy and cost-effectiveness. In this work, the SVM model produced highly accurate predictions and also performed well as measured by the ROC. For the given task, the SVM model therefore proved better than the ANN, RF, and DT models. Another important issue to settle before modelling is the choice of software. This work was done in the R language using the RStudio interface, although MATLAB or Python could also be used. The thesis finds R to be an adequate language and RStudio an effective interface.