5. Table of contents
11. Comparison between Coefficient & Regression
12. Usage of Regression in Sociology & Social Sciences
13. Pros & Cons of Regression
14. Precautions in Regression
15. Regression Formula, Maths & Graphs
7. 01. Introduction & Definition
The British biometrician Sir Francis Galton first used the term "regression" in the 19th century.
Regression is a statistical method employed to investigate and model the
relationship between a dependent variable and one or more independent
variables. It plays a crucial role in both predictive modeling and understanding the
underlying factors influencing a particular phenomenon. (Fox, J., & Weisberg, S.
(2019). An R Companion to Applied Regression. Sage Publications.)
8. Regression shows us how to determine both the nature and strength of a relationship
between two variables. (Levin, R., & Rubin, D. (2017). Statistics for Management.
Pearson Publications.)
Regression is a statistical method used in finance, investing, and other disciplines that
attempts to determine the strength and character of the relationship between one
dependent variable (usually denoted by Y) and a series of other variables (known as
independent variables). (Beers,B. (2023))
9. Regression is a statistical technique that serves as a basis for studying the
dependence of one variable, called the dependent variable, on one or more other
variables, called explanatory variables. (Islam, N. (2021). An Introduction to
Statistics and Probability. Mullick & Brothers Publications.)
Regression shows the relationship between continuous variables. (Pearsons
and Spearman (2017) Sage Publications)
Regression analysis is a set of statistical techniques that allows one to assess
the relationship between one dependent variable and several independent
variables. (Tabachnick, B., & Fidell, L. (2016). Using Multivariate Statistics)
10. 02. Types of Regression
There are basically two types of regression. These are-
i. Linear regression
ii. Multiple regression
Linear regression:
It is used when there is only one independent variable to predict the outcome of y.
Linear regression was the first type of regression analysis to be studied rigorously
and to be used extensively in practical applications.
11. Multiple regression:
It is used when there are two or more independent variables to predict the outcome
of y. These independent variables serve as predictor variables,
while the single dependent variable serves as the criterion variable.
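As a rough illustration of multiple regression with two predictor variables, the sketch below fits the coefficients by solving the normal equations in plain Python. The normal-equations approach, the helper names and the data are all assumptions made for this example; the data are hypothetical and were generated from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_regression(predictor_rows, ys):
    """Return [intercept, b1, b2, ...] minimising the squared error."""
    # Design matrix: a leading 1 for the intercept, then the predictors.
    design = [[1.0] + list(row) for row in predictor_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (illustration only).
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
criterion = [9, 8, 19, 18, 29]

coefficients = fit_multiple_regression(predictors, criterion)
print([round(c, 6) for c in coefficients])
```

Here the two columns of `predictors` play the role of the predictor variables and `criterion` is the single dependent (criterion) variable.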
12. 03. Types of Relationship
In terms of relationship, there are two types. These are-
i. Deterministic relationship
ii. Statistical relationship
(i) Deterministic relationship: The equation exactly describes the
relationship between two variables, so the observed (x, y) data points
fall directly on a line. For example, the relationship between degrees
Fahrenheit and degrees Celsius is known to be:
F = (9/5)C + 32
We can use this equation to determine the temperature in degrees
Fahrenheit exactly, if we know the temperature in degrees Celsius.
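The Fahrenheit and Celsius example of a deterministic relationship can be sketched in a few lines of Python. The function name and the sample temperatures are chosen only for illustration; the point is that every input gives exactly one output, with no scatter around the line.

```python
def celsius_to_fahrenheit(celsius):
    """Deterministic relationship: F = (9/5) * C + 32."""
    return celsius * 9 / 5 + 32


# Hypothetical Celsius readings, used only to illustrate the conversion.
celsius_readings = [0, 10, 25, 40, 100]
fahrenheit_readings = [celsius_to_fahrenheit(c) for c in celsius_readings]

for c, f in zip(celsius_readings, fahrenheit_readings):
    print(f"{c} C -> {f} F")
```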
13. (ii) Statistical relationship: The relationship between the variables is not perfect.
Here is an example of a statistical relationship.
The response variable y is the mortality due to skin cancer (number of deaths per 10
million people) and the predictor variable x is the latitude (degrees North) at the
center of each of 49 states in the U.S.
The scatter plot suggests that living in higher latitudes in
the northern U.S. may reduce the risk of death from skin
cancer due to less exposure to harmful sun rays.
While a noticeable negative linear trend exists between
latitude and skin cancer mortality,
it's not a perfect correlation. The plot shows both a trend
and some variability, indicating that
it's a statistical relationship, not a deterministic one.
14. Some other examples of statistical relationships might include:
• Height and weight: as height increases, you'd expect weight to increase.
• Alcohol consumption and blood pressure.
• Driving speed and gas mileage: as driving speed increases, you'd expect gas
mileage to decrease.
In all these cases one variable tends to rise or fall with the other, but not
perfectly, unlike in a deterministic relationship.
15. 04. Usage of Regression Analysis
Regression analysis is used in the following ways-
• Predicting the value of a dependent variable.
• Explaining the impact of changes in an independent variable.
• Hypothesis testing.
• Controlling for confounding variables.
• Model comparison.
• Time series analysis.
• Forecasting.
• Risk assessment.
• Quality control.
• Market research.
• Healthcare and medicine.
• Environmental studies.
• Sports analytics.
• Social sciences.
16. 05. Dependent & Independent variable
There are mainly two types of variable. These are-
• Dependent variable: The variable we wish to predict or explain. It changes
because of the independent variable.
Example: The height or health of the plant.
17. • Independent variable: The variable used to predict or explain the dependent
variable. It is the one thing you change; in an experiment, limit it to only one.
Example: The liquid used to water each plant.
18. 06. Assumptions in Regression Analysis
The accuracy and validity of regression results depend on several
underlying assumptions. These assumptions are essential because they
provide a foundation for the statistical methods used in regression
analysis. Violations of these assumptions can lead to biased or
unreliable results, affecting the interpretation and generalizability of
findings.
Following are some assumptions-
• Data must be parametric.
• There are no outliers in the data.
• Variables are normally distributed (if not, try a log, square-root, square or
inverse transformation).
19. • The regression model is linear in nature.
• The errors are independent (no autocorrelation).
• The error terms are normally distributed.
• There is no multicollinearity.
• The error has a constant variance (assumption of homoscedasticity).
• Homogeneity of variance.
• Independence of observations
• The relationship between the independent and dependent variable is linear.
20. 07. Characteristics of Simple Linear Regression Model
Following are the characteristics of the simple linear regression model-
• Only one independent variable "X".
• The relationship between "X" and "Y" is described by a linear
function.
• Changes in "Y" are assumed to be related to changes in "X".
• It provides estimates of values of the dependent variable from
values of the independent variable.
• Its goal is to obtain a measure of the error involved in using the
regression line as a basis for estimation.
• Through it we can obtain a measure of the degree of association or
correlation that exists between two variables.
21. 08. Simple Linear Regression equation
The simple linear regression equation provides an "estimate" of the
population regression line.
The equation:
Ŷ= a+b*x
Here,
Ŷ= Estimated "y" value for observation i
a= Estimate of regression intercept
b= Estimate of regression slope
x= Value of "x" for observation i
22. 09. Slope & Intercept of Regression line
The regression intercept indicates the point where the regression line intersects
the y-axis, i.e. the estimated value of y when x = 0.
It is usually denoted by "a".
The regression slope, usually denoted by "b", determines the steepness of the
regression line.
23. 10. Examining Regression line and Determining Slope
By examining the regression line, we can easily determine the type of regression
slope:-
• Positive correlation: Here, the slope is positive. Y increases with the increase in X.
• Negative correlation: Here, the slope is negative. Y decreases when X increases.
• No correlation: Here, the slope is 0. Y neither increases nor decreases when X
increases.
24. 11. Comparison between Coefficient and Regression
1. Coefficient (of correlation): It is a statistical measure which determines the association of two variables.
   Regression: It explains how an independent variable is numerically related to the dependent variable.
2. Coefficient: It is used to represent the linear relationship between two variables.
   Regression: It is used to fit a best line and estimate one variable on the basis of another variable.
3. Coefficient: It indicates the extent to which two variables move together.
   Regression: It indicates the impact of a unit change in the known variable (x) on the estimated variable (y).
4. Coefficient: The main goal is to find a numerical value expressing the relationship between variables.
   Regression: The main goal is to estimate values of a random variable on the basis of the values of a fixed variable.
25. Coefficient and Regression (continued)
5. Coefficient: The most popular measures of the coefficient of correlation are Pearson's r, Spearman's rho and the least squares method.
   Regression: The prominent types of regression models are linear regression, polynomial regression, random forest regression etc.
6. Coefficient: If the linear correlation is positive/negative, then the two variables are positively/negatively correlated.
   Regression: If the regression coefficient b is positive, then for every unit increase in x the corresponding average increase in y is b, and vice versa.
7. Coefficient: Either of the variables can be taken as x and the other as y.
   Regression: One cannot arbitrarily choose which variable is independent (x) and which is dependent (y).
8. Coefficient: It is symmetric in x and y.
   Regression: It is not symmetric in x and y.
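The symmetry point in the comparison can be checked numerically. The following is a minimal sketch on hypothetical data (five made-up pairs, not taken from any question in this deck): swapping x and y leaves the correlation coefficient unchanged but gives a different regression slope. The function names are illustrative.

```python
from math import sqrt


def sums(xs, ys):
    """Return N, sum(x), sum(y), sum(xy), sum(x^2), sum(y^2) for paired data."""
    n = len(xs)
    return (
        n,
        sum(xs),
        sum(ys),
        sum(x * y for x, y in zip(xs, ys)),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
    )


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n, sx, sy, sxy, sxx, syy = sums(xs, ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def slope(predictor, response):
    """Slope b of the regression of `response` on `predictor`."""
    n, sx, sy, sxy, sxx, _ = sums(predictor, response)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = correlation(xs, ys)   # correlation of x with y
r_yx = correlation(ys, xs)   # same value with the roles swapped
b_y_on_x = slope(xs, ys)     # regression of y on x
b_x_on_y = slope(ys, xs)     # regression of x on y: a different slope

print(r_xy, r_yx)
print(b_y_on_x, b_x_on_y)
```

A useful side check is that the product of the two slopes equals r squared, which ties the two concepts together.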
26. 12. Usage of Regression in Sociology & Social
Sciences
Regression is one of the most popular and widely used statistical
methodologies for analyzing empirical problems in different fields of
social sciences, e.g. sociology, economics, political science
and so on. It is used in social sciences because:-
• Most real-world problems or situations can be modelled and explained.
• It helps us to determine the relationship between the dependent and the
independent variables.
• We can estimate the value of the dependent variable if the value of the
independent variable is given.
• We also use it to forecast, plan and design certain events or phenomena.
27. 13. Pros & Cons of Regression
Like any statistical method, regression analysis comes with its own set of
advantages and limitations. In this discussion, we will explore the pros and cons
of regression to provide a comprehensive understanding of when and how it
should be used in data analysis and decision-making processes.
The pros are-
• Regression models are based on statistical principles like least squares and
correlation, making them easy to understand.
• They are expressed as algebraic equations, making predictions
straightforward.
• The model's strength or goodness of fit is assessed using well-understood
statistical parameters, including correlation coefficients.
28. • Regression models can often match or outperform other predictive models.
• They allow the inclusion of multiple variables as needed.
• Regression tools are widely available in data mining packages and software like MS
Excel.
• Unlike classification models, regression models estimate numerical values rather than
categorizing observations.
• Regression is highly significant in problems involving continuous numerical outcomes.
• These models can assess the impact of multiple variables on a dependent variable's
strength.
• Regression analysis measures the degree of association or correlation between two
variables, aiding in estimation and understanding complex relationships.
29. Now let us have a look at the cons-
• Regression models cannot work properly if the input data has errors.
• Regression models are sensitive to collinearity. If the
independent variables are strongly correlated, they will eat into
each other's predictive power and the regression coefficients will
become unstable.
• As the number of variables increases, the reliability of the
regression model decreases. Regression models work
better with a small number of variables.
• Regression models do not automatically take care of nonlinearity.
The user needs to work out which additional terms might need to be
added to the regression model to improve its fit.
30. • Regression models provide estimates that depend on the experts'
experience and intuition, which are sometimes questionable.
• It may not provide a consistent estimate.
• It is unable to handle missing and non-quantitative data. The quality of the estimate
relies on the quality of the historical data.
• Data analysis can be complex.
• It is sensitive to outliers.
• Regression models work on data sets containing numeric values and not with
categorical variables (unless they are first coded numerically).
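The sensitivity to outliers listed among the cons can be seen in a small experiment. The sketch below uses hypothetical data: five points lying exactly on y = 2x, then the same points plus one made-up outlier. The function name is illustrative.

```python
def slope(xs, ys):
    """Least-squares slope b = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - sum(x)^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


# Hypothetical clean data lying exactly on y = 2x.
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]

# The same data with a single hypothetical outlier appended.
outlier_x = clean_x + [6]
outlier_y = clean_y + [40]

slope_clean = slope(clean_x, clean_y)
slope_with_outlier = slope(outlier_x, outlier_y)

print(slope_clean, slope_with_outlier)
```

One extreme observation is enough to change the fitted slope substantially, which is why checking the data for outliers before fitting is worthwhile.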
31. 14. Precautions in Regression
To ensure the accuracy and reliability of regression results, it is essential
to take certain precautions. These precautions are crucial in mitigating
potential pitfalls and ensuring that the regression analysis provides
meaningful insights.
The precautions are-
• Regression focuses on association and not on causation.
• The independent variable must precede the dependent variable in
time.
• The dependent and the independent variables must be plausibly linked
by a theory.
32. 15. Regression formula math & Graph
Computational formula:
• Regression of Y on X: Ŷ= a+b*X
Where,
b=(N∑XY-∑X∑Y) / (N∑X^2-(∑X)^2)
a= Ȳ-bX̄ =(∑Y-b∑X)/N
• Regression of X on Y: X̂= a+b*Y
Where,
b=(N∑XY-∑X∑Y) / (N∑Y^2-(∑Y)^2)
a= X̄-bȲ =(∑X-b∑Y)/N
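As a rough sketch of how the computational formula can be applied to raw data, the function below returns the intercept a and slope b for the regression of one variable on another. The function name and the data are assumptions made for illustration; the data are hypothetical and lie exactly on y = 1 + 2x, so the expected coefficients are easy to check by hand.

```python
def fit_line(predictor, response):
    """Return (a, b) for response-hat = a + b * predictor.

    b = (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - sum(x)^2)
    a = (sum(y) - b*sum(x)) / N
    """
    n = len(predictor)
    sum_x, sum_y = sum(predictor), sum(response)
    sum_xy = sum(x * y for x, y in zip(predictor, response))
    sum_x2 = sum(x * x for x in predictor)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a_yx, b_yx = fit_line(xs, ys)  # regression of Y on X
a_xy, b_xy = fit_line(ys, xs)  # regression of X on Y: swap the roles

print(a_yx, b_yx)
print(a_xy, b_xy)
```

Calling the same function with the arguments swapped gives the regression of X on Y, whose slope uses the sums of Y and Y^2 in the denominator.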
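33. Correlational formula:
• Regression of X on Y: X̂= a+b*Y
Where,
b= r*(Sx/Sy)
a= X̄-r*(Sx/Sy)*Ȳ
• Regression of Y on X: Ŷ= a+b*X
Where,
b= r*(Sy/Sx)
a= Ȳ-r*(Sy/Sx)*X̄
Here, r is the correlation coefficient and Sx, Sy are the standard deviations of X and Y.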
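The correlational formula and the computational formula should give the same line. The sketch below checks this on hypothetical data, using the population standard deviation from Python's `statistics` module (the ratio Sy/Sx is the same whether population or sample standard deviations are used). Variable names and data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Correlation coefficient r.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Correlational formula for the regression of Y on X.
b_corr = r * pstdev(ys) / pstdev(xs)
a_corr = mean(ys) - b_corr * mean(xs)

# Computational formula for the same line.
b_comp = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_comp = (sum_y - b_comp * sum_x) / n

print(a_corr, b_corr)
print(a_comp, b_comp)
```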
34. Question no. 01:
An ice-cream company listed the number of ice creams sold at an outlet on 8 days with
different temperatures.
Draw a regression line for the entire data with graphic representation.
Solution:
There are two variables, those are-
Temperature (x) and number of ice creams sold (y).
Here, the number of ice creams sold (y) depends on temperature (x).
So, the equation is,
Ŷ= a+b*x
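37. So, the equation with the value of a,b is-
Ŷ= 3.59+6.92*x
Now, we have to find different values of y for the different values of x
x_1 = 35; y_1 = 3.59+6.92*35 ≈ 245.7
x_2 = 22; y_2 = 3.59+6.92*22 ≈ 155.8
x_3 = 29; y_3 = 3.59+6.92*29 = 204.27
x_4 = 19; y_4 = 3.59+6.92*19 ≈ 135
x_5 = 30; y_5 = 3.59+6.92*30 ≈ 211.1
x_6 = 24; y_6 = 3.59+6.92*24 ≈ 169.6
x_7 = 33; y_7 = 3.59+6.92*33 ≈ 231.9
x_8 = 17; y_8 = 3.59+6.92*17 ≈ 121.2
(x,y)= (35,245.7), (22,155.8), (29,204.27), (19,135) , (30,211.1), (24,169.6) ,
(33,231.9) , (17,121.2)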
38. Question no. 02:
X and Y are two variables, where y depends on x. Draw a regression line
Solution:
Y depends on X. So, X and Y are two variables.
So, the equation is,
Ŷ= a+b*x
Here,
a=(∑y-b∑x)/N
and
b=(N∑xy-∑x∑y) / (N∑x^2-(∑x)^2)
39. Now to find out the values of the formula, we need to construct a table like below:
b=(N∑xy-∑x∑y) / (N∑x^2-(∑x)^2)
=((6*1504)-(34*216)) / ((6*216)-(34)^2)
= 12
a=(∑y-b∑x)/N =(216-12*34)/6
= -32
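The arithmetic for Question 2 can be re-checked with a short script. The sums (N = 6, ∑x = 34, ∑y = 216, ∑xy = 1504, ∑x^2 = 216) are taken from the calculation above; the function name is an illustrative assumption.

```python
def coefficients_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) of Y-hat = a + b*x from the column totals of the table."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Totals quoted in the worked solution to Question 2.
a, b = coefficients_from_sums(n=6, sum_x=34, sum_y=216, sum_xy=1504, sum_x2=216)
print(a, b)
```

The slope is computed first because the intercept formula depends on it.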
40. So, the equation with the value of a,b is-
Ŷ= -32+12*x
Now, we have to find different values of y for the different values of x
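x_1 = 5; y_1 = -32+12*5 = 28
x_2 = 3; y_2 = -32+12*3 = 4
x_3 = 4; y_3 = -32+12*4 = 16
x_4 = 7; y_4 = -32+12*7 = 52
x_5 = 9; y_5 = -32+12*9 = 76
x_6 = 6; y_6 = -32+12*6 = 40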
(x,y)= (5,28), (3,4), (4,16), (7,52) , (9,76), (6,40)
41. Question no. 03:
The earned profit (E.P) of a cold drinks shop and the temperature on six different days are given.
From the information given above-
a) Determine the regression equation
b) Draw the regression line
c) Determine the E.P when the temperature is 25°C and 36°C
42. Solution:
There are two variables, X and Y.
Here,
X= temperature
Y= Earned profit ; and this depends on X
So, the equation is,
Ŷ= a+b*x
Here,
a=(∑y-b∑x)/N
and
b=(N∑xy-∑x∑y) / (N∑x^2-(∑x)^2)
43. Now to find out the values of the formula, we need to construct a table like below:
b=(N∑xy-∑x∑y) / (N∑x^2-(∑x)^2)
=((6*6011.5)-(115*183.5)) / ((6*4091)-(115)^2)
= 1.32
a=(∑y-b∑x)/N =(183.5-1.32*115)/6
= 5.28
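A small script can reproduce the Question 3 coefficients and the two requested predictions. The sums are those quoted in the calculation above. To match the hand calculation, this sketch rounds b to two decimals before computing a, which is an assumption about how the slide values were obtained; without rounding the results differ slightly in the second decimal place. The function names are illustrative.

```python
def coefficients_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, decimals=2):
    """Return (a, b), rounding b before it is used to compute a."""
    b = round((n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2), decimals)
    a = round((sum_y - b * sum_x) / n, decimals)
    return a, b


# Totals quoted in the worked solution to Question 3.
a, b = coefficients_from_sums(n=6, sum_x=115, sum_y=183.5, sum_xy=6011.5, sum_x2=4091)


def predict_profit(temperature):
    """Estimated earned profit at the given temperature."""
    return a + b * temperature


print(a, b)
print(round(predict_profit(25), 2), round(predict_profit(36), 2))
```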
44. So, the equation with the value of a,b is-
Ŷ= 5.28+1.32*x
Now, we have to find different values of y for the different values of x
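x_1 = 32; y_1 = 5.28+1.32*32 = 47.52
x_2 = 47; y_2 = 5.28+1.32*47 = 67.32
x_3 = 22; y_3 = 5.28+1.32*22 = 34.32
x_4 = 19; y_4 = 5.28+1.32*19 = 30.36
x_5 = -2; y_5 = 5.28+1.32*(-2) = 2.64
x_6 = -3; y_6 = 5.28+1.32*(-3) = 1.32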
(x,y)= (32,47.52), (47,67.32), (22,34.32), (19,30.36) , (-2,2.64), (-3,1.32)
45. The E.P when the temperature is 25°C,
y_25 = 5.28+1.32*25 = 38.28
The E.P when the temperature is 36°C,
y_36 = 5.28+1.32*36 = 52.8
46. Question no. 04:
The following values show that political trust is significantly correlated with public
satisfaction ( in %).
Calculate the regression equation. What will be the percentage of public satisfaction
when the percentage of political trust is 80?
47. Answer:
There are two variables, X and Y.
Here,
X= Political trust
Y= Public satisfaction ; and this depends on X
So, the equation is,
Ŷ= a+b*x
Here,
a=(∑y-b∑x)/N
and
b=(N∑xy-∑x∑y) / (N∑x^2-(∑x)^2)
48. Now to find out the values of the formula, we need to construct a table like below:
b=(N∑xy-∑x∑y) / (N∑x^2-(∑x)^2)
=((7*23411)-(477*335)) / ((7*33127)-(477)^2)
= 0.94
a=(∑y-b∑x)/N =(335-0.94*477)/7
= -16.197
49. So, the equation with the value of a,b is-
Ŷ= -16.19+0.94*x
For the value of 80,
Ŷ = -16.19+0.94*80
= -16.19+75.2
= 59.01
Now, we have to find different values of y for the different values of x
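x_1 = 70; y_1 = -16.19+0.94*70 = 49.61
x_2 = 75; y_2 = -16.19+0.94*75 = 54.31
x_3 = 85; y_3 = -16.19+0.94*85 = 63.71
x_4 = 60; y_4 = -16.19+0.94*60 = 40.21
x_5 = 52; y_5 = -16.19+0.94*52 = 32.69
x_6 = 65; y_6 = -16.19+0.94*65 = 44.91
x_7 = 72; y_7 = -16.19+0.94*72 = 51.49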
(x,y)= (70,49.61), (75,54.31), (85,63.71), (60,40.21) ,
(52,32.69), (65,44.91), (72,51.49)
50. Question no. 05:
The following values relate to the investment in a business and profit from it. Calculate
the regression equation and draw it into the graph.
Answer:
There are two variables, X and Y.
Here,
X= Amount of investment
Y= Profit ; and this depends on X
So, the equation is,
Ŷ= a+b*x
53. Now, we have to find different values of y for the different values of x
x_1 = 4; y_1 = 0.566+(2.9*4) = 12.166
x_2 = 6; y_2 = 0.566+(2.9*6) = 17.966
x_3 = 8; y_3 = 0.566+(2.9*8) = 23.766
x_4 = 10; y_4 = 0.566+(2.9*10) = 29.566
x_5 = 12; y_5 = 0.566+(2.9*12) = 35.366
x_6 = 14; y_6 = 0.566+(2.9*14) = 41.166
(x,y)= (4,12.166), (6,17.966), (8,23.766), (10,29.566) , (12,35.366), (14,41.166)
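The points needed to draw the regression line for Question 5 can be generated programmatically. This sketch assumes the fitted equation Ŷ = 0.566 + 2.9*x and the x values 4 to 14 used in the substitution step above; the variable names are illustrative, and plotting is left out so that the example needs no extra libraries.

```python
# Coefficients and x values as used in the substitution step for Question 5.
intercept = 0.566
slope = 2.9
investments = [4, 6, 8, 10, 12, 14]

# One (x, y-hat) pair per investment value, rounded to three decimals.
fitted_points = [(x, round(intercept + slope * x, 3)) for x in investments]

for x, y_hat in fitted_points:
    print(x, y_hat)
```

Any two of these points are enough to draw the line; the remaining points serve as a check that they all fall on it.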
54. Sample problem 1:
The time spent watching TV and percentages of vision competency are given. Draw a
regression line for & hours of watching TV.
(x,y) = (1,89), (2,72), (4,60), (2,88), (5,57), (6,59)
Sample problem 2:
The working time of workers (in hours) and percentages of their physical toil are given.
Draw a regression line.
(x,y) = (1,22), (2,31), (3,38), (4,45), (5,49), (6,52)
55. Sample problem 3:
Here x and y are two random variables.
(x,y) = (10,4), (15,8), (20,12), (25,16), (30,20), (35,24)
Now, determine both of the regression equations and draw them on the graph.
Sample problem 4:
Education rates and crime rates of 7 different societies are given below. Draw a
regression line and determine the regression equation.
(x,y) = (43,28), (65,19), (30,76), (76,20), (22,83), (58,26), (97,4)
57. Reference
1. Fox, J., & Weisberg, S. (2019). An R Companion to Applied Regression. Sage Publications.
2. Levin, R., & Rubin, D. (2017). Statistics for Management. Pearson Publications.
3. Islam, N. (2021). An Introduction to Statistics and Probability. Mullick & Brothers Publications.
4. Pearsons and Spearman (2017). Sage Publications.
5. Tabachnick, B., & Fidell, L. (2016). Using Multivariate Statistics.
6. Khan Academy. (n.d.). Multiple regression. Retrieved 2021, from
https://www.khanacademy.org/math/statistics-probability/multiple-regression
7. Smith, J. A., & Johnson, R. B. (2018). Understanding the Impact of Multiple
Independent Variables in Regression Analysis. Journal of Applied Statistics