.do files and log files.
Tasks:
Example Stata Do File for Data Cleaning:
* Set the working directory
cd "C:/Users/yourname/Documents/DataAnalysis"
* Load the dataset
use "dataset.dta", clear
* Inspect the dataset structure
describe
* Check for missing values
misstable summarize
* Remove leading/trailing spaces in string variables
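* (loop over string variables only; trim() fails with a type mismatch on numeric variables)
ds, has(type string)
local strvars `r(varlist)'
foreach var of local strvars {
    replace `var' = trim(`var')
}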
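* Convert a categorical string variable to a labeled numeric variable (if needed)
* (for strings that hold numbers, such as "12.5", use destring instead)
encode variable_name, gen(variable_name_num)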
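* Handle missing values in a numeric variable
* (replace with 0 only when zero is a meaningful substitute for that variable)
replace numerical_variable = 0 if missing(numerical_variable)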
* Save the cleaned dataset
save "cleaned_dataset.dta", replace
Tasks:
Example Stata Do File for EDA:
* Load the cleaned dataset
use "cleaned_dataset.dta", clear
* Summary statistics for numerical variables
summarize
* Frequency distribution for categorical variables
tabulate categorical_variable
* Histogram for numerical variable
histogram numerical_variable, bin(20) normal
* Scatter plot of variable1 (y-axis) against variable2 (x-axis)
scatter variable1 variable2