On the Regression Model for Generalized Normal Distributions

The traditional linear regression model that assumes normal residuals is applied extensively in engineering and science. However, the normality assumption on the model residuals often fails in practice. This drawback can be overcome by using a generalized normal regression model that assumes a non-normal response. In this paper, we propose regression models based on generalizations of the normal distribution. The proposed regression models can be used effectively in modeling data with a highly skewed response. Furthermore, we study in some detail the structural properties of the proposed generalizations of the normal distribution. The maximum likelihood method is used for estimating the parameters of the proposed models. The performance of the maximum likelihood estimators in estimating the distributional parameters is assessed through a small simulation study. Applications to two real datasets are given to illustrate the flexibility and the usefulness of the proposed distributions and their regression models.


Introduction
Existing distributions do not always provide an adequate fit. Hence, generalizing distributions and studying their flexibility have been of interest to researchers over recent decades. One of the earliest works on generating distributions was done by [1], who proposed the method of differential equations as a fundamental approach to generating statistical distributions. Ref. [2] also contributed to this category and developed another method based on differential equations. After that, other methods were developed, such as the method of transformation [3] and the method of quantile functions [4,5]. More recent techniques for generalizing statistical distributions emerged after the 1980s and can be summarized into five major categories [6]: the method of generating skew distributions, the method of adding parameters, the beta-generated method, the transformed-transformer method, and the composite method.
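The beta-generated (BG) family introduced by [7] has a cumulative distribution function (CDF) given by

$$G(x) = \int_{0}^{F(x)} b(t)\,dt, \tag{1}$$

where b(t) is the probability density function (PDF) of the beta random variable and F(x) is the CDF of any random variable. The PDF corresponding to (1) is given by

$$g(x) = \frac{1}{B(\alpha,\beta)}\, f(x)\, F^{\alpha-1}(x)\,[1-F(x)]^{\beta-1}, \quad x \in \mathrm{Supp}(F), \tag{2}$$

where f(x) is the PDF corresponding to F(x), Supp(F) is the support of F, and $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$.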
The beta-generated family of distributions is formed by using the beta distribution in (1), with support between 0 and 1, as a generator. Ref. [22], in turn, asked whether other distributions with different supports can be used as a generator. They extended the family of BG distributions and defined the so-called T-X family. In the T-X family, the generator b(t) is replaced by a generator $r_T(t)$, where T is any random variable with support (a, b). The CDF of the T-X family is given by

$$G(x) = \int_{a}^{W(F(x))} r_T(t)\,dt, \tag{3}$$

where $W : [0,1] \to \mathbb{R}$ is a link function that satisfies W(0) → a and W(1) → b. Ref. [23] studied a special case of the T-X family where the link function, W(.), is the quantile function of a random variable Y. The proposed CDF is defined as

$$F_X(x) = \int_{a}^{Q_Y(F_R(x))} f_T(t)\,dt = F_T\big(Q_Y(F_R(x))\big), \tag{4}$$

where T, R, and Y are random variables with CDFs $F_T$, $F_R$, and $F_Y$ and PDFs $f_T$, $f_R$, and $f_Y$, and $Q_Y = F_Y^{-1}$ is the quantile function of Y. The corresponding PDF of (4) is given by

$$f_X(x) = f_R(x)\,\frac{f_T\big(Q_Y(F_R(x))\big)}{f_Y\big(Q_Y(F_R(x))\big)}. \tag{5}$$

If R follows the normal distribution N(µ, σ²), then (5) reduces to the T-normal family of distributions [24] with PDF given by

$$f_X(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\frac{f_T\big(Q_Y(\Phi(\frac{x-\mu}{\sigma}))\big)}{f_Y\big(Q_Y(\Phi(\frac{x-\mu}{\sigma}))\big)}, \tag{6}$$

where φ(.) and Φ(.) are the PDF and CDF of the standard normal distribution, respectively.
The T-normal family is a general base for generating many different generalizations of the normal distribution. The distributions generated from the T-normal family can be symmetric, right-skewed, left-skewed, or bimodal. Some existing generalizations of the normal distribution can be obtained within this framework, in particular the beta-normal [7], Kumaraswamy normal [19], and gamma-normal [25] distributions.
Another generalization of the normal distribution is the skew-normal, first considered by [26], with PDF

$$f(x) = 2\,\phi(x)\,\Phi(\lambda x), \quad -\infty < x < \infty,$$

where λ is a real-valued shape parameter. A further generalization of the normal distribution is the power-normal distribution [27], with CDF given by

$$F(x) = [\Phi(x)]^{\alpha}, \quad \alpha > 0.$$

Several properties of the power-normal distribution are studied by [27]. Recently, Ref. [28] proposed a new extension of the normal distribution.
The rest of the paper is organized as follows. In Section 2, we introduce a class of skew-symmetric models by using the logistic kernel with the normal distribution as the baseline distribution. In Section 3, we discuss some structural properties of the logistic-normal (henceforth LN) distribution, including moments, tail behavior, and modes. In Section 4, the maximum likelihood estimation method is considered to estimate the model parameters, and a small simulation study is implemented to evaluate the performance of the method. In Section 5, a generalized normal regression model based on the skew-LN distribution is developed. In Section 6, applications to two real datasets are given to demonstrate the flexibility and the usefulness of the new distribution and its regression model. Section 7 provides some concluding remarks.

The Symmetric Logistic-G Family of Distributions
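If T follows the logistic distribution with PDF $f_T(x) = \lambda e^{-\lambda x}(1 + e^{-\lambda x})^{-2}$, λ > 0, and Y follows the standard logistic distribution (λ = 1), then Equation (4) reduces to the Logistic-G family of distributions with CDF given by

$$F_G(x) = \frac{G^{\lambda}(x)}{G^{\lambda}(x) + [1-G(x)]^{\lambda}}, \tag{7}$$

where G(.) is the CDF of any baseline distribution. A special case of (7) was studied in some detail in [29]. The corresponding PDF of (7) is given by

$$f_G(x) = \frac{\lambda\, g(x)\, G^{\lambda-1}(x)\,[1-G(x)]^{\lambda-1}}{\big\{G^{\lambda}(x) + [1-G(x)]^{\lambda}\big\}^{2}}, \tag{8}$$

where g(.) is the PDF corresponding to G(.).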
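The reduction to the Logistic-G form can be checked in a few lines. The sketch below assumes that Equation (4) is the composition $F_T(Q_Y(F_R(x)))$ with $F_R = G$; the displayed forms are derived from the stated logistic PDF and may be arranged differently from the original equations.

```latex
\begin{align*}
F_T(t) &= \frac{1}{1+e^{-\lambda t}}, \qquad
Q_Y(p) = \log\frac{p}{1-p}, \\[4pt]
F_T\big(Q_Y(G(x))\big)
  &= \frac{1}{1+\left(\dfrac{1-G(x)}{G(x)}\right)^{\lambda}}
   = \frac{G^{\lambda}(x)}{G^{\lambda}(x)+[1-G(x)]^{\lambda}}, \\[4pt]
\frac{d}{dx}\,\frac{G^{\lambda}(x)}{G^{\lambda}(x)+[1-G(x)]^{\lambda}}
  &= \frac{\lambda\,g(x)\,G^{\lambda-1}(x)\,[1-G(x)]^{\lambda-1}}
          {\big\{G^{\lambda}(x)+[1-G(x)]^{\lambda}\big\}^{2}} .
\end{align*}
```

Setting λ = 1 in the last line gives back g(x), so the baseline distribution is recovered as a special case.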

Remark 1.
The Logistic-G family possesses the following properties:
i. If g(x) in (8) is a PDF symmetric about µ, then the resulting $f_G(x)$ is also symmetric about µ; i.e., the Logistic-G family in (7) preserves the symmetry property.
ii. If a random variable T follows the logistic distribution with scale parameter λ, then the random variable $X = G^{-1}\!\left(\frac{e^{T}}{1+e^{T}}\right)$ follows the Logistic-G family in (7).
iii. The quantile function of the Logistic-G family can be written as

$$Q(p) = G^{-1}\!\left(\frac{p^{1/\lambda}}{p^{1/\lambda} + (1-p)^{1/\lambda}}\right), \quad 0 < p < 1. \tag{9}$$

Now setting G(x) to be the normal CDF with parameters µ and σ², say $G(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$, the Logistic-G family reduces to the logistic-normal distribution with CDF given by

$$F_N(x) = \frac{\Phi^{\lambda}(z)}{\Phi^{\lambda}(z) + [1-\Phi(z)]^{\lambda}}, \quad z = \frac{x-\mu}{\sigma}, \tag{10}$$

where λ > 0, σ > 0, and −∞ < µ < ∞. The associated PDF of (10) is

$$f_N(x) = \frac{\lambda}{\sigma}\,\frac{\phi(z)\,\{\Phi(z)[1-\Phi(z)]\}^{\lambda-1}}{\big\{\Phi^{\lambda}(z) + [1-\Phi(z)]^{\lambda}\big\}^{2}}. \tag{11}$$

When λ = 1, the logistic-normal distribution (henceforth LN(µ, σ, λ)) in (10) reduces to the normal distribution. Thus the LN distribution is a generalization of the normal distribution. Furthermore, the LN distribution is a member of the T-normal family proposed by [24]. In Figure 1, graphs of the standard LN distribution (µ = 0, σ = 1) for various values of λ are provided. Figure 1 shows that the logistic-normal PDF has several advantages: the parameter λ introduces flexibility in the kurtosis (see also Figure 2) and controls whether the distribution is unimodal or bimodal. Moreover, it appears that bimodality occurs when λ is less than approximately 0.5.
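The LN distribution functions are easy to evaluate numerically. The following is a minimal sketch, assuming the LN CDF has the form Φ^λ(z) / (Φ^λ(z) + (1 − Φ(z))^λ) with z = (x − µ)/σ, which is what the logistic construction yields; the function names `ln_cdf` and `ln_pdf` are illustrative and not taken from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-z / SQRT2)


def ln_cdf(x, mu=0.0, sigma=1.0, lam=1.0):
    """Assumed LN CDF: Phi^lam / (Phi^lam + (1 - Phi)^lam)."""
    z = (x - mu) / sigma
    p, q = _Phi(z), _Phi(-z)  # q equals 1 - Phi(z)
    return p ** lam / (p ** lam + q ** lam)


def ln_pdf(x, mu=0.0, sigma=1.0, lam=1.0):
    """Derivative of ln_cdf with respect to x."""
    z = (x - mu) / sigma
    p, q = _Phi(z), _Phi(-z)
    if p == 0.0 or q == 0.0:  # tail probability underflowed to zero
        return 0.0
    phi = math.exp(-0.5 * z * z) / SQRT2PI
    return (lam / sigma) * phi * (p * q) ** (lam - 1.0) / (p ** lam + q ** lam) ** 2
```

With `lam=1.0` the two functions return the ordinary normal CDF and PDF, which gives a quick sanity check on the implementation.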

Some Properties of LN Distribution
We begin our discussion by providing some useful remarks as listed below.

Remark 2. Using (10), (11) and Remark 1, the following useful properties can be obtained:
(i) The LN distribution is symmetric about the location parameter µ.
(ii) The mean and median of the LN distribution are both equal to µ, the location parameter of the normal distribution.
(iii) The quantile function of the LN distribution can be written as

$$Q_N(p) = \mu + \sigma\,\Phi^{-1}\!\left(\frac{p^{1/\lambda}}{p^{1/\lambda} + (1-p)^{1/\lambda}}\right), \quad 0 < p < 1.$$

(iv) To generate a random sample from the LN distribution, first simulate a random sample $t_i$, i = 1, 2, ..., n, from the logistic(λ) distribution and then compute $x_i = \mu + \sigma\,\Phi^{-1}\!\left(\frac{e^{t_i}}{1+e^{t_i}}\right)$.
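The sampling recipe in Remark 2 (iv) can be sketched as follows. This is an illustration using only the Python standard library; the names `ln_quantile` and `ln_sample` are hypothetical, and the quantile is written in the equivalent log-odds form, in which the argument of Φ^{-1} has logit equal to logit(p)/λ.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its inverse CDF


def _probit_of_expit(t):
    """Phi^{-1}(e^t / (1 + e^t)), evaluated on the lower tail for stability."""
    if t > 0.0:
        return -_probit_of_expit(-t)  # both CDFs are symmetric about 0
    e = math.exp(t)
    return _STD.inv_cdf(e / (1.0 + e))


def ln_quantile(p, mu=0.0, sigma=1.0, lam=1.0):
    """Quantile of LN(mu, sigma, lam) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    t = math.log(p / (1.0 - p)) / lam  # logistic(lam) quantile at p
    return mu + sigma * _probit_of_expit(t)


def ln_sample(n, mu=0.0, sigma=1.0, lam=1.0, seed=None):
    """Draw n LN variates: logistic(lam) draws mapped through Phi^{-1}."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u == 0.0:  # log(0) is undefined, so redraw
            continue
        t = math.log(u / (1.0 - u)) / lam  # t ~ logistic(lam) by inversion
        draws.append(mu + sigma * _probit_of_expit(t))
    return draws
```

For very small λ the intermediate value e^t can underflow to zero, in which case `inv_cdf` raises an error; the sketch does not guard against that case.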

Remark 3.
Using the fact that $\phi'(x) = -x\phi(x)$ and setting the derivative of $\log f_N(x)$ in (11) to 0, one can show that the mode(s) of the LN distribution is/are at the point(s) $x^{*} = \mu + \sigma z^{*}$, where $z^{*}$ satisfies the equation

$$z^{*} = (\lambda-1)\,\frac{\phi(z^{*})\,[1-2\Phi(z^{*})]}{\Phi(z^{*})\,[1-\Phi(z^{*})]} - 2\lambda\,\phi(z^{*})\,\frac{\Phi^{\lambda-1}(z^{*}) - [1-\Phi(z^{*})]^{\lambda-1}}{\Phi^{\lambda}(z^{*}) + [1-\Phi(z^{*})]^{\lambda}}. \tag{12}$$

From Remark 3, it is easy to see that 0 satisfies Equation (12). Therefore $f_N(x)$ has a critical point at x = µ. We observed numerically that for λ > 0.5 the distribution is always unimodal and hence x = µ is the unique mode in this case. In addition, because the LN distribution is symmetric about µ for all values of λ, in the bimodal case, if x = a < µ is a mode, then the second mode is at x = 2µ − a.
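The modal behavior can be explored numerically without solving the mode equation directly. The sketch below locates strict local maxima of the standard LN density on a grid; it assumes the density λφ(z)[Φ(z)(1 − Φ(z))]^(λ−1) / [Φ^λ(z) + (1 − Φ(z))^λ]², and the helper names and grid settings are illustrative choices rather than anything prescribed by the paper.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / SQRT2)


def ln_std_pdf(z, lam):
    """Assumed density of the standard LN distribution (mu = 0, sigma = 1)."""
    p, q = _Phi(z), _Phi(-z)
    if p == 0.0 or q == 0.0:
        return 0.0
    phi = math.exp(-0.5 * z * z) / SQRT2PI
    return lam * phi * (p * q) ** (lam - 1.0) / (p ** lam + q ** lam) ** 2


def ln_modes(lam, half_width=6.0, step=0.01):
    """Grid points that are strict local maxima of the standard LN density."""
    n = round(half_width / step)
    grid = [i * step for i in range(-n, n + 1)]  # symmetric grid containing 0
    dens = [ln_std_pdf(z, lam) for z in grid]
    return [grid[i] for i in range(1, len(grid) - 1)
            if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]]
```

Calling `ln_modes(1.0)` returns the single mode at 0, while `ln_modes(0.3)` returns two grid points placed symmetrically about 0, in line with the symmetry argument above. The resolution of the located modes is limited by `step`.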
The tail behaviour of the standard LN distribution (µ = 0 and σ = 1) as x → ±∞ is discussed in the following Lemma.

Lemma 1. For the standard LN distribution, $f_N(x) \sim \lambda\,|x|^{1-\lambda}\,\phi^{\lambda}(x)$ as x → ±∞ (see [17]).

Consequently, Lemma 1 implies that as x → ±∞, the tails of the standard LN distribution behave in a similar way to the right tail of the function $h(x) = x^{1-\lambda} e^{-\lambda x^{2}/2}$. Note that when 0 < λ < 1, the tails of $f_N(x)$ approach 0 slowly, while for λ > 1, the tails of $f_N(x)$ approach 0 faster, meaning that the tail weight increases for higher values of λ. A graphical representation of the association between the tail weight of LN and λ can be shown using the measure of kurtosis defined by [30]. Moore's kurtosis is defined as

$$M = \frac{[Q(7/8) - Q(5/8)] + [Q(3/8) - Q(1/8)]}{Q(6/8) - Q(2/8)},$$

where Q(.) is the quantile function. The values of Moore's kurtosis of LN(0, 1, λ) for various values of λ are depicted in Figure 2. It shows that Moore's kurtosis increases as λ increases. For 0 < λ < 1, there is a sharp change in the kurtosis, while for λ > 1 the change is gradual. Figure 1 indicates that for λ < 1, the tails of the LN distribution are lighter than those of the normal distribution, while for λ > 1 the tails of the LN distribution are heavier than those of the normal distribution.
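Because Moore's kurtosis depends only on octiles, it can be computed directly from a quantile function. The sketch below assumes the usual octile-based definition, [(E7 − E5) + (E3 − E1)] / (E6 − E2) with E_i = Q(i/8), and the log-odds form of the standard LN quantile; the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def ln_std_quantile(p, lam):
    """Quantile of the standard LN distribution for 0 < p < 1."""
    t = math.log(p / (1.0 - p)) / lam  # logistic(lam) quantile at p
    if t > 0.0:                        # use symmetry to stay on the lower tail
        return -_STD.inv_cdf(1.0 / (1.0 + math.exp(t)))
    e = math.exp(t)
    return _STD.inv_cdf(e / (1.0 + e))


def moore_kurtosis(quantile):
    """Octile-based kurtosis for any quantile function Q on (0, 1)."""
    e = [quantile(i / 8.0) for i in range(1, 8)]  # octiles E1, ..., E7
    return ((e[6] - e[4]) + (e[2] - e[0])) / (e[5] - e[1])


def ln_moore_kurtosis(lam):
    """Moore's kurtosis of LN(0, 1, lam); location and scale cancel out."""
    return moore_kurtosis(lambda p: ln_std_quantile(p, lam))
```

For λ = 1 this returns the normal-distribution value of about 1.233, and evaluating it on a grid of λ values traces out a curve of the kind the text describes for Figure 2.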

Moments of LN Distribution
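Using Remark 2 (iv), the rth moment of the LN distribution can be written as

$$E(X^{r}) = E\!\left[\left(\mu + \sigma\,\Phi^{-1}\!\left(\frac{e^{T}}{1+e^{T}}\right)\right)^{r}\right],$$

where the random variable T follows the logistic distribution with scale parameter λ. Therefore,

$$E(X^{r}) = \sum_{j=0}^{r}\binom{r}{j}\,\mu^{r-j}\sigma^{j}\,\xi_j, \qquad \xi_j = E\!\left[\left(\Phi^{-1}\!\left(\frac{e^{T}}{1+e^{T}}\right)\right)^{j}\right].$$

The quantities $\xi_j$ can be evaluated by numerical integration in any available software such as R or SAS.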
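As an illustration of the numerical integration step, the sketch below evaluates ξ_j as the jth moment of the standard LN density using a composite Simpson rule, and then assembles E(X^r) from the binomial expansion. The density form, the integration limits, and the function names are assumptions made for this example; R or SAS quadrature routines would serve the same purpose.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / SQRT2)


def ln_std_pdf(z, lam):
    """Assumed density of the standard LN distribution."""
    p, q = _Phi(z), _Phi(-z)
    if p == 0.0 or q == 0.0:
        return 0.0
    phi = math.exp(-0.5 * z * z) / SQRT2PI
    return lam * phi * (p * q) ** (lam - 1.0) / (p ** lam + q ** lam) ** 2


def xi(j, lam, half_width=12.0, n=4000):
    """xi_j = E[Z^j] for Z ~ LN(0, 1, lam), by composite Simpson (n even)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * z ** j * ln_std_pdf(z, lam)
    return total * h / 3.0


def ln_moment(r, mu, sigma, lam):
    """E[X^r] for X ~ LN(mu, sigma, lam) from the binomial expansion."""
    return sum(math.comb(r, j) * mu ** (r - j) * sigma ** j * xi(j, lam)
               for j in range(r + 1))
```

With λ = 1 the result should match the normal moments, for example E(X²) = µ² + σ². The fixed integration range suits moderate λ; very small λ gives slowly decaying tails and would need a wider range.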

Estimation and Simulation
In this section, maximum likelihood estimation (MLE) is used to estimate the parameters of the LN distribution. Moreover, a small simulation study is performed to assess the performance of the MLE method.

Parameter Estimation of LN Distribution
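Let $x_1, x_2, \ldots, x_n$ be a random sample of size n taken from the LN distribution. Then the log-likelihood function is given by

$$\ell(\lambda,\mu,\sigma) = n\log\lambda - n\log\sigma + \sum_{i=1}^{n}\log\phi(z_i) + (\lambda-1)\sum_{i=1}^{n}\big\{\log\Phi(z_i) + \log[1-\Phi(z_i)]\big\} - 2\sum_{i=1}^{n}\log\big\{\Phi^{\lambda}(z_i) + [1-\Phi(z_i)]^{\lambda}\big\}, \tag{14}$$

where $z_i = (x_i - \mu)/\sigma$. The MLEs $\hat{\lambda}$, $\hat{\mu}$, and $\hat{\sigma}$ of the parameters λ, µ, and σ can be obtained by numerically maximizing the log-likelihood function in (14). The initial value of µ is taken to be the moment estimator $\bar{x}$. The initial value of σ is taken to be the sample standard deviation, s.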
To obtain the initial value of the parameter λ, we use Remark 2 (iv) as follows: assume the random sample $t_i = \log\{\Phi(z_i)/[1-\Phi(z_i)]\}$, with $z_i = (x_i - \bar{x})/s$, i = 1, 2, ..., n, is taken from the logistic distribution with parameter λ. By equating the population variance $\pi^{2}/(3\lambda^{2})$ of the logistic distribution with the sample variance $s_T^{2}$ of the random sample $t_i$ and solving for λ, we obtain $\lambda_0 = \pi/(\sqrt{3}\,s_T)$. The trust-region optimization routine in SAS (PROC IML and CALL NLPTR) is used to maximize the likelihood function in (14). The trust-region optimization routine is a powerful technique that can optimize complicated functions. It outputs the iteration details, including parameter estimates, their standard errors, and the value of the gradient function at which the iteration stops.
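The starting values and the objective function can be sketched in a few lines. The code below is an illustration only: it assumes the LN log-likelihood derived from the density λσ^{-1}φ(z)[Φ(z)(1 − Φ(z))]^(λ−1) / [Φ^λ(z) + (1 − Φ(z))^λ]² and takes t_i to be the log-odds of Φ at the standardized observations. It does not reproduce the SAS CALL NLPTR routine; `ln_loglik` is simply the function that any numerical optimizer would maximize, and both function names are hypothetical.

```python
import math
from statistics import mean, stdev

SQRT2 = math.sqrt(2.0)
LOG_SQRT2PI = 0.5 * math.log(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / SQRT2)


def ln_loglik(lam, mu, sigma, data):
    """Assumed LN log-likelihood; returns -inf outside the parameter space."""
    if lam <= 0.0 or sigma <= 0.0:
        return -math.inf
    total = len(data) * (math.log(lam) - math.log(sigma))
    for x in data:
        z = (x - mu) / sigma
        log_p, log_q = math.log(_Phi(z)), math.log(_Phi(-z))
        total += -0.5 * z * z - LOG_SQRT2PI            # log phi(z)
        total += (lam - 1.0) * (log_p + log_q)
        total -= 2.0 * math.log(math.exp(lam * log_p) + math.exp(lam * log_q))
    return total


def initial_values(data):
    """Starting values (lam0, mu0, sigma0) following the moment-matching idea."""
    mu0, sigma0 = mean(data), stdev(data)
    t = []
    for x in data:
        z = (x - mu0) / sigma0
        t.append(math.log(_Phi(z)) - math.log(_Phi(-z)))  # log-odds of Phi(z)
    lam0 = math.pi / (math.sqrt(3.0) * stdev(t))          # matches logistic variance
    return lam0, mu0, sigma0
```

A typical use would be to call `initial_values(data)` and pass the result, together with the negative of `ln_loglik`, to a general-purpose optimizer.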

Simulation
To evaluate the performance of the ML method, a small simulation study is conducted with sample sizes n = 30, 50, 70 and with three different parameter combinations. The study involved computing and analyzing the relative bias [(Estimate − Actual)/Actual] and the standard deviation of the estimates. The results of the study are reported in Table 1. From Table 1, it is observed that the parameter µ is overestimated by the ML method. Moreover, when λ < 1, λ and σ are overestimated. On the other hand, when λ > 1, λ and σ are underestimated. For small sample sizes and λ < 1, the MLE method does not perform well; in fact, the standard deviations are higher than the corresponding estimated values. However, for larger sample sizes and λ > 1, the MLE method performs quite well in estimating the model parameters.
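The two summary measures used in the simulation study are straightforward to compute from a set of replicated estimates. The following sketch assumes the relative bias is taken on the mean of the replicated estimates; the function name and the numbers in the usage note are hypothetical and are not taken from Table 1.

```python
from statistics import mean, stdev


def simulation_summary(estimates, actual):
    """Relative bias of the mean estimate and SD across replications."""
    rel_bias = (mean(estimates) - actual) / actual
    return rel_bias, stdev(estimates)
```

For example, `simulation_summary([1.1, 0.9, 1.3], 1.0)` gives a relative bias of about 0.1 and a standard deviation of about 0.2; a positive relative bias corresponds to overestimation in the sense used above.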

Skew-LN and Its Generalized Normal Regression Model
In this section, we first propose a skewed version of the LN distribution that can be used to fit skewed datasets. In Section 5.2, we propose a location-scale regression model based on the skew-LN distribution.

Skew Logistic-Normal Distribution
For skewed data, one can generate a skew-LN distribution in various ways. One way is by exponentiating the CDF of the LN distribution as

$$F(x) = \left[\frac{\Phi^{\lambda}(z)}{\Phi^{\lambda}(z) + [1-\Phi(z)]^{\lambda}}\right]^{\alpha}, \quad z = \frac{x-\mu}{\sigma}, \quad \alpha > 0. \tag{15}$$

Note that when α = 1, the skew-LN distribution in (15) reduces to the LN distribution. Moreover, when λ = 1, the skew-LN reduces to the exponentiated-normal distribution proposed by [27]. Finally, when α = λ = 1, the skew-LN distribution reduces to the normal distribution.
To analyze the skewness and kurtosis regions of the skew-LN distribution, Galton's skewness [31] and Moore's kurtosis [30] measures were plotted against the parameters α and λ. Figure 3 shows that the distribution is right skewed for α, λ < 1 and left skewed for α > 1, λ < 1 and for α < 1, λ > 1. The plot of kurtosis in Figure 3 demonstrates the flexibility of the proposed distribution. For λ < 1, the tails of the skew-LN can be heavier or lighter than the tails of the normal distribution. The skew-LN distribution has several advantages: the parameter α introduces flexibility in the skewness and the parameter λ introduces flexibility in the kurtosis. Furthermore, the main advantage of the skew-LN over the Azzalini skew-normal is its ability to fit data with a wider range of skewness and kurtosis. Based on numerical calculations, for the Azzalini skew-normal, Galton's skewness ranges between −0.1443 and 0.1443 and Moore's kurtosis ranges between 1.1746 and 1.2460. However, for the skew-LN, Galton's skewness ranges between −0.3000 and 0.3000 and Moore's kurtosis ranges between 0.8000 and 1.6000. It is also worth mentioning that the skew-LN can be unimodal or bimodal and has a closed-form CDF, which is not the case for the Azzalini skew-normal distribution.
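Both shape measures depend only on quantiles, so they can be evaluated for any (λ, α) pair once the skew-LN quantile function is available. The sketch below assumes that the skew-LN CDF is the LN CDF raised to the power α, so that its quantile is the LN quantile evaluated at p^(1/α), and it uses the standard quartile and octile definitions of Galton's skewness and Moore's kurtosis. All function names are illustrative.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def skew_ln_quantile(p, lam=1.0, alpha=1.0):
    """Quantile of the standard skew-LN distribution for 0 < p < 1."""
    u = p ** (1.0 / alpha)             # invert the outer power alpha
    t = math.log(u / (1.0 - u)) / lam  # logistic(lam) quantile at u
    if t > 0.0:                        # use symmetry to stay on the lower tail
        return -_STD.inv_cdf(1.0 / (1.0 + math.exp(t)))
    e = math.exp(t)
    return _STD.inv_cdf(e / (1.0 + e))


def galton_skewness(lam, alpha):
    """(Q3 - 2*Q2 + Q1) / (Q3 - Q1) based on the quartiles."""
    q1, q2, q3 = (skew_ln_quantile(p, lam, alpha) for p in (0.25, 0.5, 0.75))
    return (q3 - 2.0 * q2 + q1) / (q3 - q1)


def moore_kurtosis(lam, alpha):
    """[(E7 - E5) + (E3 - E1)] / (E6 - E2) based on the octiles."""
    e = [skew_ln_quantile(i / 8.0, lam, alpha) for i in range(1, 8)]
    return ((e[6] - e[4]) + (e[2] - e[0])) / (e[5] - e[1])
```

Evaluating these two functions over a grid of (λ, α) values is one way to map out skewness and kurtosis regions of the kind shown in Figure 3. With α = 1 the skewness is zero, as expected for the symmetric LN case.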

Generalized Normal Regression Model Based on Skew-LN Distribution
The traditional linear regression model that assumes normal residuals is applied extensively in engineering and science. However, the normality assumption on the model residuals often fails in practice. This drawback can be overcome by using a generalized normal regression model that assumes a non-normal response Y. In this section, the response Y is assumed to follow the skew-LN distribution. The following location-scale regression model based on the skew-LN distribution is considered:

$$y_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + \sigma Z_i, \quad i = 1, 2, \ldots, n, \tag{16}$$

where $y_i$ is the response variable with the skew-LN distribution in (15), and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^{T}$ and σ > 0 are unknown parameters. Every $y_i$ has a covariate vector $\mathbf{x}_i^{T} = (1, x_{i1}, \ldots, x_{ip})$ that models the linear predictor $\mu_i = \mathbf{x}_i^{T}\boldsymbol{\beta}$. The random error $Z_i$ follows the skew-LN(0, 1, λ, α) distribution.

Remark 5.
The skew-LN regression model in (16) has several nested regression models. These special cases are enumerated as follows:
1. The regression model in (16) reduces to the traditional normal linear regression model when α = λ = 1.
2. The exponentiated-normal (Exp-N) regression model is obtained when λ = 1. This location-scale regression model is based on the power-normal distribution introduced by [27].
3. The LN regression model based on the distribution (10) is obtained when α = 1.
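A sample $(y_1, \mathbf{x}_1), \ldots, (y_n, \mathbf{x}_n)$ of n independent observations is considered, and the log-likelihood function for the parameters $\boldsymbol{\theta} = (\lambda, \alpha, \sigma, \boldsymbol{\beta}^{T})^{T}$ of model (16) is

$$\ell(\boldsymbol{\theta}) = n\log(\alpha\lambda) - n\log\sigma + \sum_{i=1}^{n}\log\phi(z_i) + (\alpha\lambda-1)\sum_{i=1}^{n}\log\Phi(z_i) + (\lambda-1)\sum_{i=1}^{n}\log[1-\Phi(z_i)] - (\alpha+1)\sum_{i=1}^{n}\log\big\{\Phi^{\lambda}(z_i) + [1-\Phi(z_i)]^{\lambda}\big\}, \tag{17}$$

where $z_i = (y_i - \mathbf{x}_i^{T}\boldsymbol{\beta})/\sigma$. The maximum likelihood estimate $\hat{\boldsymbol{\theta}}$ of the parameter vector θ can be obtained by maximizing the log-likelihood function in (17) numerically.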
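The regression log-likelihood is the objective that a numerical optimizer would maximize. The sketch below is an illustration under the assumption that the error density is α F_N^(α−1) f_N evaluated at z_i = (y_i − x_i'β)/σ; the function name, the argument layout (θ ordered as λ, α, σ, β_0, ..., β_p), and the convention that each covariate row starts with 1 for the intercept are choices made for this example.

```python
import math

SQRT2 = math.sqrt(2.0)
LOG_SQRT2PI = 0.5 * math.log(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / SQRT2)


def skew_ln_reg_loglik(theta, X, y):
    """Assumed skew-LN regression log-likelihood.

    theta = (lam, alpha, sigma, beta_0, ..., beta_p); each row of X starts
    with 1 for the intercept. Returns -inf outside the parameter space.
    """
    lam, alpha, sigma, *beta = theta
    if lam <= 0.0 or alpha <= 0.0 or sigma <= 0.0:
        return -math.inf
    if len(X) != len(y) or any(len(row) != len(beta) for row in X):
        raise ValueError("X, y and beta have inconsistent dimensions")
    total = len(y) * (math.log(alpha * lam) - math.log(sigma))
    for row, yi in zip(X, y):
        mu_i = sum(b * v for b, v in zip(beta, row))  # linear predictor
        z = (yi - mu_i) / sigma
        log_p, log_q = math.log(_Phi(z)), math.log(_Phi(-z))
        total += -0.5 * z * z - LOG_SQRT2PI           # log phi(z)
        total += (alpha * lam - 1.0) * log_p + (lam - 1.0) * log_q
        total -= (alpha + 1.0) * math.log(
            math.exp(lam * log_p) + math.exp(lam * log_q))
    return total
```

Fixing λ = α = 1 turns the function into the ordinary normal regression log-likelihood, and fixing only one of the two gives the Exp-N or LN sub-model, which is convenient when computing likelihood ratio statistics for the nested models.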

Applications
In this section, we apply the LN distribution and the generalized normal regression model to two real-life datasets. The first dataset has a bimodal shape, and the fit of the LN distribution is compared with that of the mixture normal distribution. For the second application, the skew-LN regression model is compared with some nested sub-models and some other generalizations of the normal regression model. The maximum likelihood method is used to estimate the model parameters.

Fitting LN Distribution to Buoys Data
In this subsection, the LN distribution is fitted to a bimodal dataset using the ML method. The dataset is obtained from the National Data Buoy Center (NDBC). It represents the number of buoys situated in the North East Pacific: Buoy 46005 (46 N, 131 W) for the time period 1 January 1983 to 31 December 2003. The data is available from [1]. The histogram in Figure 4 shows that the distribution of the data has a bimodal shape; for this reason, we fitted both the LN and the mixture normal distributions to the dataset. The maximum likelihood estimates, the log-likelihood value, the AIC (Akaike Information Criterion) and the Kolmogorov-Smirnov (K-S) test statistic for the fitted distributions are reported in Table 2. Figure 4 displays both the empirical and the fitted cumulative distribution functions as well as the probability density functions for the fitted distributions. The results in Table 2 indicate that the LN distribution outperforms the mixture normal distribution. In fact, the fitted CDF in Figure 4 shows that the mixture normal distribution does not provide an adequate fit. The fact that the LN distribution has only three parameters gives it an extra advantage over the mixture normal distribution.

Modeling Real Estate Valuation Using the Generalized Normal Regression Model
The dataset contains historical data on the real estate market from June 2012 to May 2013. The data is obtained from Sindian District in New Taipei City, Taiwan (for additional details, see [32]). The data consist of n = 414 transaction records of real estate property. The data can be used to establish the relationship between housing price (per unit area) and its predictive regressors. The following variables are used (for i = 1, 2, ..., 414). The response variable $y_i$ is the housing price per unit area (10,000 New Taiwan Dollars/Ping, where 1 Ping = 3.3 m²). The covariates are as follows: $x_{i1}$ is the transaction date (e.g., 2013.250 = 2013 March and 2013.500 = 2013 June), $x_{i2}$ is the house age (in years), $x_{i3}$ is the distance to the nearest MRT station (in meters), $x_{i4}$ is the number of convenience stores in the living circle on foot (integer), and $x_{i5}$ is the geographic coordinate, latitude (in degrees). The data are analyzed on the basis of the following skew-LN regression model:

$$y_i = \beta_0 + \sum_{j=1}^{5}\beta_j x^{*}_{ij} + \sigma Z_i, \quad i = 1, 2, \ldots, 414,$$

where the error terms $Z_i$ are independent random variables that are assumed to follow the skew-LN(0, 1, λ, α) distribution, and $x^{*}_{ij} = (x_{ij} - \bar{x}_j)/s_j$, j = 1, 2, ..., 5, are the standardized covariates, which are used because the covariates are measured on different scales. Additionally, the fit under the skew-LN regression model is compared with several regression models, including the regression model based on the beta-normal (BN) distribution [7], the regression model based on the skew-normal (SN) distribution [26], and the extended normal (EN) regression model [28]. Furthermore, the skew-LN regression model is compared with its nested models, including the LN, Exp-N, and normal regression models. In this application, the model parameters are estimated using the maximum likelihood method, implemented in the SAS programming language. The initial values of $\beta_0, \ldots, \beta_5$ and σ are obtained from fitting the normal regression model to the data.
The initial values of the other parameters are set to 1. Table 3 shows the MLE results of fitting the skew-LN, LN, Exp-N, SN, EN, and normal regression models to the data. The fitted skew-LN and LN regression models show that the estimates of $\beta_0, \ldots, \beta_5$ and σ are significant at the 5% level. Table 4 presents the goodness-of-fit statistics, including AIC, consistent AIC (AICC) and the Bayesian information criterion (BIC). The goodness-of-fit statistics show that the skew-LN regression model outperforms the other regression models. We also notice that the LN regression model has the second-lowest values of AIC, AICC, and BIC. Hence, the skew-LN and LN regression models can be used effectively to analyze the real estate valuation data. The likelihood ratio (LR) statistic is used to compare the skew-LN regression model with its sub-models: the normal, LN, and Exp-N regression models. The LR test statistic values and the corresponding p-values are given in Table 5. This table shows that the skew-LN regression model has a better fit than its sub-models. The LN regression model also has a better fit than the normal regression model.
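The model comparison statistics can be computed from the maximized log-likelihoods alone. The sketch below uses the standard definitions AIC = 2k − 2ℓ and BIC = k log n − 2ℓ, and the chi-square reference distribution for the LR statistic; closed forms are used for 1 and 2 degrees of freedom, which cover the comparisons of the skew-LN model with the LN or Exp-N model (one restriction) and with the normal model (two restrictions). The function names are illustrative, and any log-likelihood values passed in are the user's own, not figures from Tables 4 and 5.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


def lr_test(loglik_full, loglik_sub, df):
    """LR statistic and chi-square p-value for nested models (df = 1 or 2)."""
    stat = max(2.0 * (loglik_full - loglik_sub), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    elif df == 2:
        p_value = math.exp(-stat / 2.0)             # chi-square(2) upper tail
    else:
        raise ValueError("only df = 1 or df = 2 is handled in this sketch")
    return stat, p_value
```

Lower AIC and BIC values indicate a better trade-off between fit and complexity, and a small LR p-value favors the larger model over the nested sub-model.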

Concluding Remarks
In this paper, two generalizations of the normal distribution, namely the logistic-normal and skew logistic-normal distributions, were investigated. Several mathematical and structural properties, such as shape properties, have been studied. The proposed generalizations of the normal distribution exhibit great flexibility in modeling symmetric as well as skewed datasets. Moreover, new regression models based on both the logistic-normal and skew logistic-normal distributions were developed. Two real datasets were used to illustrate the applicability of the distributions and their regression models.
Future work could be devoted toward investigating other parameter estimation methods for the LN and the skew-LN distributions. The applicability of the skew-LN regression model to other fields could be further explored.