BINARY LOGISTIC REGRESSION IN DETERMINING FACTORS AFFECTING STUDENT GRADUATION IN A SUBJECT

Received 03 June, 2021; Revised 09 June, 2021; Accepted 09 June, 2021
Good communication and coordination between lecturers are needed when the same material is delivered by different lecturers, to ensure a relatively uniform quality of education. Using success information from several classes to predict outcomes in other classes requires identifying the significant parameters used in the algorithm. This research uses a quantitative analysis method with binary logistic regression to determine the critical factors in training data from the “Introduction to Information Technology” subject at XYZ University. Several statistical tests are conducted using Excel with the Real Statistics add-in, and Orange Data Mining is used to test pass prediction on the given training data. The resulting model can also be used to classify graduation in other subjects.


Introduction
An affordable cost of education does not mean that a university is of low quality. Rather, the quality of a university is determined by educators who constantly and continuously improve their knowledge, for example by improving teaching discipline, conducting continuous research, providing good educational services, and providing organized scholarship programs. Increasing student discipline in attendance, in doing assignments, and in being active in class is also a supporting factor for university quality.
The research was conducted at XYZ University, Faculty of Technology and Computer Science, Department of Informatics Engineering, in the Introduction to Information Technology (Pengantar Teknologi Informasi) subject. This department has about 50 classes per semester and 11 lecturers for one subject. Because the material for a particular subject is delivered by several lecturers, good communication and coordination between them are needed to ensure uniformity in the quality of education.
This research conducted several statistical tests to answer the question of which factors affect student graduation in a subject, and used data mining software to obtain graduation predictions from testing data. The purpose of this research is to obtain a model that can predict or classify graduation in other subjects.

Research Method
This research used binary logistic regression as the problem-solving algorithm and a quantitative analysis method [1]. As a simple statistical model for complex and messy data [2], binary logistic regression is used to model a binary variable (0, 1) based on one or more variables called predictors [2]. Quantitative analysis, in turn, is research based on numbers or quantity calculations for all phenomena that can be expressed numerically [1]. This research was therefore conducted by analyzing numbers with specific formulas to determine the factors affecting student graduation.
The total population was 175 students of the Information Technology major taking the Pengantar Teknologi Informasi subject. The input parameters are assignments (HomeWork), midterm exams (MidTest) and final exams (FinalTest). The scores obtained are categorized as follows:
1. 0 to <56, represented by binary value = 0
2. >=56, represented by binary value = 1
Calculations are carried out using the "Real Statistics" add-in for Excel, which computes some test values automatically. The steps taken follow [3]. Logistic regression is one of the most popular estimation methods because it gives an estimate in the range between 0 and 1 [5]. The logistic regression formula and the other formulas used are described in the subsections that follow.
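Before turning to those formulas, the pass/fail coding of scores described above can be sketched in a few lines. This is only an illustration: the function name `to_binary` and the sample scores are hypothetical and are not taken from the study data; only the cutoff of 56 comes from the text.

```python
PASS_THRESHOLD = 56  # cutoff stated in the text: >= 56 is coded 1, below is coded 0


def to_binary(score: float) -> int:
    """Return 1 for a score of 56 or more and 0 for a score in [0, 56)."""
    if score < 0:
        raise ValueError("scores are assumed to be non-negative")
    return 1 if score >= PASS_THRESHOLD else 0


# Hypothetical scores for two students (illustrative only).
students = [
    {"HomeWork": 80, "MidTest": 55, "FinalTest": 56},
    {"HomeWork": 0, "MidTest": 70, "FinalTest": 40},
]

# Apply the same coding to every input parameter of every student.
coded = [{name: to_binary(value) for name, value in s.items()} for s in students]
print(coded)
```

Applying one shared cutoff to all three inputs keeps the coding consistent across HomeWork, MidTest and FinalTest.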

Logistic Regression
It is a statistical analysis method that describes the relationship between a dependent variable with two categories (pass = 1 and fail = 0) and the independent variables (HomeWork, MidTest and FinalTest) [3].
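For reference, the standard form of the binary logistic regression model for three predictors is sketched below. The symbols are notation assumed here: $Y$ is the pass/fail outcome, $X_1$ is HomeWork (the only index named in the text), $X_2$ and $X_3$ are taken to be MidTest and FinalTest, and $\beta_0, \dots, \beta_3$ are the coefficients to be estimated.

```latex
\[
P(Y = 1 \mid X_1, X_2, X_3)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3)}},
\qquad
\ln\frac{P(Y = 1)}{1 - P(Y = 1)}
  = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 .
\]
```

The second expression is the logit transformation: the log of the odds of passing is linear in the predictors, which is why the output of the first expression always lies between 0 and 1.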

Pearson Chi Square
It is a non-parametric comparative test applied to two variables measured on a nominal scale [4].
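As a minimal sketch of how such a test is computed, the code below evaluates the Pearson statistic, the sum over cells of (observed - expected)² / expected, for a table of counts. The counts are hypothetical, the helper names are assumptions made for this illustration, and no continuity correction is applied. The p-value helper is valid only for 1 degree of freedom, which is the case for a 2 x 2 table.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical counts: rows = homework coded 0/1, columns = fail/pass.
observed = [[20, 10],
            [10, 20]]
stat = pearson_chi_square(observed)
p_value = chi2_sf_df1(stat)  # (2 - 1) * (2 - 1) = 1 degree of freedom
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```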

Wald Test
It is used to test, for each independent variable separately (partially), whether that variable influences the dependent variable, by comparing the Wald statistic with the Chi-Square comparison value [3], [8]. The hypotheses used are as follows [6]:

Condition
H0 : βi = 0 (there is no significant effect of the independent variable on the dependent variable)
H1 : βi ≠ 0 (there is a significant effect of the independent variable on the dependent variable)
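The hypotheses above can be illustrated with a small sketch. It assumes the common form of the Wald statistic, W = (βi / SE(βi))², compared against a chi-square distribution with 1 degree of freedom. The function name, the coefficient values and the standard errors are all hypothetical, and alpha defaults to 0.05.

```python
import math


def wald_test(beta, std_error, alpha=0.05):
    """Return (Wald statistic, p-value, reject_H0) for one coefficient."""
    z = beta / std_error
    wald = z * z
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value, p_value < alpha


# Hypothetical coefficient and standard error (not the study's estimates).
wald, p_value, reject = wald_test(beta=2.7, std_error=0.9)
print(f"W = {wald:.2f}, p = {p_value:.4f}, reject H0: {reject}")

# A coefficient that is small relative to its standard error is not rejected.
wald_weak, p_weak, reject_weak = wald_test(beta=0.5, std_error=1.0)
```

Rejecting H0 for a coefficient means the corresponding variable has a significant partial effect at the chosen alpha.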

Significance
The significance level (α) is 5%, which corresponds to a confidence level of 95%, since 100% - 95% = 5% [9].

Simultaneous Test
Also known as the Deviance Test, it is carried out to test whether the model produced by multivariate (simultaneous) logistic regression is feasible, that is, whether it accords with the data [4]. The hypotheses are as follows:

Condition
H0 : The model accords with the data
H1 : The model does not accord with the data
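One common formulation of a deviance-based simultaneous test, assumed here for illustration, compares the log-likelihood of the fitted model with that of an intercept-only model through G = 2(LL_model - LL_null) and refers G to a chi-square distribution whose degrees of freedom equal the number of predictors (3 in this study). The outcomes and fitted probabilities below are hypothetical, and the helper names are not from the source.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


# Hypothetical pass/fail outcomes and fitted probabilities (illustrative only).
y = [1, 1, 1, 0, 1, 0, 1, 1]
fitted = [0.9, 0.8, 0.7, 0.2, 0.9, 0.3, 0.6, 0.8]

# The intercept-only model predicts the overall pass rate for everyone.
p_bar = sum(y) / len(y)
ll_null = log_likelihood(y, [p_bar] * len(y))
ll_model = log_likelihood(y, fitted)

g = 2.0 * (ll_model - ll_null)  # deviance difference
p_value = chi2_sf_df3(g)        # 3 predictors -> 3 degrees of freedom
print(f"G = {g:.3f}, p-value = {p_value:.3f}")
```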

Odds Ratio
It is the ratio obtained by dividing one odds by another odds [3].
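In standard notation, assumed here, the odds of an event with probability $p$ and the odds ratio between two groups with probabilities $p_1$ and $p_0$ can be written as follows. The last identity is the usual link to a logistic regression coefficient $\beta_i$.

```latex
\[
\text{odds}(p) = \frac{p}{1 - p},
\qquad
\text{OR} = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)},
\qquad
\text{OR}_i = e^{\beta_i}.
\]
```

An odds ratio above 1 means the odds of passing rise as the predictor moves from 0 to 1, and an odds ratio of exactly 1 means the predictor leaves the odds unchanged.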

Result and Discussion
Using the "Real Statistics" add-in in Excel, the pass/fail classification resulting from the logit transformation was obtained, as shown in Table 1 below.

Partial Test Result
The p-value is obtained from the chi-square calculation on the Wald test value. In the "Homework" row, the Wald test value is smaller than the p-value, while in the other rows the Wald test value is larger than the p-value. This means that the homework variable (X1) is an important, or decisive, factor for student graduation in the training data of this study.
Furthermore, the Simultaneous Test is calculated automatically by the Real Statistics add-in in Excel, as can be seen in Table 4 below. The p-value is the result of the chi-square calculation over all the data, obtained from the Chi-Sq row together with df, so the p-value serves as the significance coefficient. If the p-value < alpha, the model is said to match the data. To strengthen this conclusion, the results of the Hosmer test are shown in Table 5 below. In contrast to the Simultaneous Test, in the Hosmer Test a p-value > alpha (or Hosmer value < p-value) means that the significance is good, that is, the model matches the data. Both tests indicate that the model matches the given training data.
Then, from the results of the Wald Test, the EXP(β) value was used to interpret the odds ratio. These values can be seen in Table 6 below.
Table 6. Odds Ratio
Table 6 shows the odds ratio for all variables. The Homework variable has the largest value of the three variables, 3.91 × 10^10. This means that for students who always do their assignments or have good grades on them, the odds of passing are 3.91 × 10^10 times those of students who do not do assignments or have lower grades. Such a large value is possible because no student passes when the assignment score is bad or equal to 0 (zero). The Homework variable therefore ranks first for the chance of passing. The odds ratio for the Mid-test variable is 15.01, which means that the odds of passing for students who take the mid-test and get a good score are 15.01 times those of students who take the exam but get a bad score. The odds ratio for the Final Test variable is 21.03, which means that the odds of passing for students who take the final test and get a good score are 21.03 times those of students who take the final test but get a bad score. The Final Test odds ratio ranks second among the three variables.
From the existing training data, a pass prediction test was carried out using Orange Data Mining, a powerful free data mining tool. Various reviews rate this software more highly than other free software.
Figure 1
The test with Orange Data Mining yields the graduation cutoff: a score of 45 fails and a score of 46 passes. This relatively low passing score depends strongly on the training data used as the basis for processing. The training data in this research provides a fairly large tolerance for student graduation. The logistic regression methodology provides a fairly high level of precision, 0.892.
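Because the odds ratio is EXP(β), the reported odds ratios can be converted back to coefficients with a natural logarithm and ranked. The sketch below does this for the three values quoted in the text. It assumes the Homework value reads 3.91 × 10^10; the dictionary keys are labels chosen for this illustration.

```python
import math

# Odds ratios as quoted in the text (Homework read as 3.91e10, an assumption).
odds_ratios = {"HomeWork": 3.91e10, "MidTest": 15.01, "FinalTest": 21.03}

# Since OR = exp(beta), each coefficient is recovered as beta = ln(OR).
betas = {name: math.log(value) for name, value in odds_ratios.items()}

# Rank the predictors from the largest to the smallest odds ratio.
ranking = sorted(odds_ratios, key=odds_ratios.get, reverse=True)

for name in ranking:
    print(f"{name}: OR = {odds_ratios[name]:.4g}, beta = {betas[name]:.3f}")
```

On the log scale the Homework coefficient is roughly eight to nine times the other two, which is less dramatic than the raw ratio suggests but still consistent with the ranking given above.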
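For readers unfamiliar with the metric, precision for the "pass" class is the share of predicted passes that are actual passes, TP / (TP + FP). The sketch below uses hypothetical confusion-matrix counts, not the study's results, and does not reproduce how Orange Data Mining aggregates its scores.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    if tp + fp == 0:
        raise ValueError("precision is undefined without positive predictions")
    return tp / (tp + fp)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts for the "pass" class (illustrative only).
tp, fp, fn, tn = 40, 10, 5, 20
print(f"precision = {precision(tp, fp):.3f}")
print(f"accuracy  = {accuracy(tp, fp, fn, tn):.3f}")
```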

Conclusion
The results of this research indicate that a high accuracy value can be obtained by using the logistic regression methodology to predict student graduation, as shown by the evaluation of tests run with Orange Data Mining. The training data has a high tolerance for student graduation, which leads to a low passing cutoff in the results on the testing data set. The logistic regression formula is obtained through calculations in the application, which yield the coefficients of the independent variables: homework, mid-test and final test. The formulation was then tested with the Simultaneous and Hosmer Tests, which showed that the model matches the data. The Wald Test then showed that the homework variable is a very significant factor in student graduation: a good homework score provides a large opportunity for graduation. This is also shown in the calculation of the Odds Ratio, where the large opportunity is indicated by the large odds value for the homework variable. The next largest odds value belongs to the final test variable, followed by the midtest variable.
Many machine learning methodologies can be used to make predictions for such problems. Logistic regression is one that achieves high precision in these predictions. It would be interesting to compare these methodologies with logistic regression, to obtain results that help many parties in making decisions, for example in predicting student graduation as in this research. Research comparing these methodologies is recommended in order to find out which has the highest precision, whether the factors affecting the results can be identified, and the difficulties and conveniences experienced when making predictions with each methodology. The results of this research can be used as comparative information for further research.
e-ISSN: 2622-1659, Jurnal Teknologi dan Open Source, Vol. 4, No. 1, June 2021: 114-120