Question

Assignment: Data Management and Regression Analysis

Concept

An important way to test the relationship between two variables Y and X is to run the model:
Y = a + bX
using ordinary least squares (OLS) from the statsmodels.formula.api package. When we run regressions, we not only estimate the parameters a and b, which can then be used for predictions, but also learn how well the model fits (i.e., how much of the variance in Y is explained by a + bX).

There are many critical issues, such as the selection and measurement of the X and Y variables. For example, are the variables scaled properly? How should the X variables be selected?

Common sense and business knowledge can often guide you in the proper direction, but one must also make smart use of exploratory data analysis (EDA).
In this assignment, you will apply such analyses to understand the relationship between audit fees (the Y variable) and financial characteristics of a firm (the X variables).

You are provided with data from two separate sources:
· Assignment 4 OL AuditFees201019 contains audit fee information from the Audit Analytics database.
· Assignment 4 OL Compustat201019 contains financial characteristics of firms from the Compustat Annual Industrial file.

Definitions for some of the variables can be found in the Compustat database.

Requirements

· You are expected to conduct library research (search www.scholar.google.com using keywords such as audit fees) to gain an understanding of variables affecting audit fees.
· The main requirement is that you identify and demonstrate a model explaining audit fees (Y) using firm characteristics (X). Please use OLS.
· Use EDA as well as business judgment to identify the best set of X variables. In short, demonstrate skill in feature engineering.
· Demonstrate pandas skill and ability in data acquisition, data cleaning, data management, and analysis.
· Demonstrate advanced ability in reporting using a Jupyter notebook. Recall that an analytics report has many components (see spec. sheet for previous projects as well as the list below). You are expected to showcase increasing skill in reporting as you make progress in the course.

An analytics report has many components such as:
· An introduction that discusses the scope of the analysis
· A description of data used in the analysis along with data cleaning procedures
· Code that clearly shows how an algorithm is implemented
· Results
· Discussion of results and generation of insight when appropriate
· Summary when appropriate

Submission

Submit a pdf as before. The total length should not exceed 10 pages.

assignment-4-ol-auditfees201019-lwtzvfch.csv assignment-4-ol-compustat201019-oats2gpm.csv

Amar Kumar · Accepted Answer

Introduction:
In this assignment, we will be exploring the relationship between audit fees (the Y variable) and financial characteristics of a firm (the X variables). We will be using data from two separate sources: AuditFees201019 and Compustat201019. AuditFees201019 contains information on audit fees from the Audit Analytics database, while Compustat201019 contains financial characteristics of firms from the Compustat Annual Industrial file.
Our goal is to identify and demonstrate a model that explains audit fees (Y) using firm characteristics (X). To achieve this, we will use OLS (ordinary least squares) from the statsmodels.formula.api package. OLS is a commonly used method for testing the relationship between two variables. By running the model Y = a + bX, we can estimate the parameters a and b, which can be used for predictions. We can also understand how well the model fits, or how much of the variance in Y is explained by a+bX.
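As a rough illustration of the model Y = a + bX described above, the following minimal sketch fits OLS through statsmodels.formula.api. The data are synthetic and the column names x and y are placeholders chosen for this example; they are not taken from the assignment files.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the real variables (assumption for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 2.0 * x + rng.normal(0, 1, size=200)  # true a = 1.5, true b = 2.0
df = pd.DataFrame({"x": x, "y": y})

# Fit Y = a + bX; the formula interface adds the intercept a automatically.
model = smf.ols("y ~ x", data=df).fit()

a_hat = model.params["Intercept"]
b_hat = model.params["x"]
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, R-squared = {model.rsquared:.3f}")

# Predict Y for new X values using the estimated parameters.
new_x = pd.DataFrame({"x": [2.0, 5.0]})
print(model.predict(new_x))
```

Here model.rsquared is the share of the variance in y explained by the fitted line, and model.summary() would print the full regression table.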
However, selecting and measuring the X and Y variables properly is crucial. We need to consider issues such as whether the variables are scaled properly and how to select the X variables. Common sense and business knowledge can often guide us in the proper direction, but we also need to use exploratory data analysis (EDA) smartly.
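One way to check whether a variable is scaled sensibly is to compare its skewness before and after a transformation. The sketch below assumes a log transform is a reasonable candidate for a size variable such as total assets; that choice, and the synthetic values used for at, are assumptions for illustration rather than something the source prescribes.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed values standing in for total assets (illustrative only).
rng = np.random.default_rng(1)
at = pd.Series(rng.lognormal(mean=6.0, sigma=1.5, size=1000), name="at")

# Candidate rescaling: natural log (valid only for strictly positive values).
log_at = np.log(at)

raw_skew = at.skew()
log_skew = log_at.skew()
print(f"skewness of at:      {raw_skew:.2f}")
print(f"skewness of log(at): {log_skew:.2f}")

# Summary statistics are a quick first EDA step for spotting extreme values.
print(at.describe())
```

A skewness close to zero after the transform suggests the rescaled variable is closer to symmetric, which often makes a linear fit easier to interpret.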
Data Description:
We have two datasets for this project. The first dataset, AuditFees201019, contains information on audit fees paid by companies to their external auditors. The dataset has the following columns:
· FISCAL_YEAR: The fiscal year in which the audit was conducted.
· FISCAL_YEAR_ENDED: The date on which the company's fiscal year ended.
· AUDIT_FEES: The audit fees paid by the company to its external auditor.
· AUDITOR_NAME: The name of the external auditor.
· COMPANY_FKEY: A unique identifier for each company.
· BEST_EDGAR_TICKER: The ticker symbol for the company.
The second dataset, Compustat201019, contains financial characteristics of firms. The dataset has the following columns:
· popsrc: Population source.
· datafmt: Data format.
· tic: Ticker symbol.
· conm: Company name.
· curcd: Currency code.
· act: Current assets.
· at: Total assets.
· ceq: Common equity.
· ebit: Earnings before interest and taxes.
· ebitda: Earnings before interest, taxes, depreciation, and amortization.
· emp: Number of employees.
· invt: Inventory.
· lct: Current liabilities.
· pifo: Pretax income, foreign.
· exchg: Exchange code.
· costat: Active/Inactive status.
· fic:
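Before any regression, the two datasets have to be combined into one table. The sketch below shows one possible way to do that with pandas, joining on ticker and fiscal year. The tiny data frames are made up, and the Compustat year column fyear is an assumption: it does not appear in the column list above, so the actual key should be checked against the file.

```python
import pandas as pd

# Tiny made-up stand-ins for the two CSV files; values are illustrative only.
fees = pd.DataFrame({
    "BEST_EDGAR_TICKER": ["AAA", "AAA", "BBB", "CCC"],
    "FISCAL_YEAR": [2018, 2019, 2019, 2019],
    "AUDIT_FEES": [1_200_000, 1_250_000, None, 300_000],
})
comp = pd.DataFrame({
    "tic": ["AAA", "AAA", "BBB", "DDD"],
    "fyear": [2018, 2019, 2019, 2019],  # assumed year column, not in the list above
    "at": [5000.0, 5200.0, 800.0, 90.0],
})

# Inner join keeps only firm-years present in both sources;
# validate raises an error if either side has duplicate keys.
merged = fees.merge(
    comp,
    left_on=["BEST_EDGAR_TICKER", "FISCAL_YEAR"],
    right_on=["tic", "fyear"],
    how="inner",
    validate="one_to_one",
)

# Drop firm-years with missing values in the variables used for the regression.
clean = merged.dropna(subset=["AUDIT_FEES", "at"])
print(clean[["tic", "fyear", "AUDIT_FEES", "at"]])
```

With the real files, the data frames would come from pd.read_csv, and it is worth comparing row counts before and after the merge to see how many firm-years are lost.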
