Introduction
The housing market’s impact on national well-being is well recognized: rising rents and house prices leave many people without adequate shelter. This analysis examines housing prices in Mexico, Argentina, and Brazil, focusing on how location and size influence affordability.
Data Collection and Cleaning
The dataset was obtained from [source]. It includes housing information collected over six months in 2017. After data cleaning, the dataset included:
- 42,822 entries for Argentina
- 45,415 entries for Brazil
- 67,949 entries for Mexico
Prices were converted to USD and stored in a new column so that listings are comparable across the three countries. Irrelevant columns were dropped.
Listings with a total area of zero were removed, and outliers in the total area column were filtered out, as demonstrated in Figure 1.
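The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the analysis: the exchange rates, the `price` and `currency` field names, and the 10%/90% quantile bounds for trimming outliers are all assumptions made for the example.

```python
# Hypothetical exchange rates (local currency units per 1 USD); not from the source.
RATES_PER_USD = {"ARS": 17.0, "BRL": 3.2, "MXN": 19.0, "USD": 1.0}


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def clean(rows, low_q=0.1, high_q=0.9):
    """Drop zero-area rows, add price_aprox_usd, and trim total-area outliers."""
    # 1. Remove listings with no usable area.
    rows = [r for r in rows if r["surface_total_in_m2"] > 0]
    # 2. Add a USD price column (copying each row instead of mutating the input).
    rows = [{**r, "price_aprox_usd": r["price"] / RATES_PER_USD[r["currency"]]}
            for r in rows]
    # 3. Keep only rows whose area lies between the two quantile bounds (assumed rule).
    areas = sorted(r["surface_total_in_m2"] for r in rows)
    lo, hi = quantile(areas, low_q), quantile(areas, high_q)
    return [r for r in rows if lo <= r["surface_total_in_m2"] <= hi]


# Toy listings, invented for illustration only.
listings = [
    {"price": 1_900_000, "currency": "MXN", "surface_total_in_m2": 80},
    {"price": 320_000, "currency": "BRL", "surface_total_in_m2": 0},
    {"price": 150_000, "currency": "USD", "surface_total_in_m2": 95},
    {"price": 1_700_000, "currency": "ARS", "surface_total_in_m2": 60},
    {"price": 500_000, "currency": "USD", "surface_total_in_m2": 40_000},
]
cleaned = clean(listings)
print([r["price_aprox_usd"] for r in cleaned])  # [100000.0, 150000.0]
```

In this toy run the zero-area listing is dropped first, and the quantile bounds then remove both the smallest and the implausibly large listing.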
Objectives
This analysis addresses key questions about housing affordability, focusing on state-level price variations and on identifying the most expensive states in which to buy a home. Descriptive statistics such as the mean and median were used to characterize typical house prices in each country.
State-Level Analysis
The state-level analysis produced some unexpected results. In Brazil, Mato Grosso was the most expensive state, contrary to the expectation that São Paulo would lead. In Mexico, Distrito Federal had the highest housing prices. In Argentina, the comparison is complicated by Buenos Aires being divided into multiple states in the dataset; Bs.As. G.B.A. Zona Norte emerged as the most expensive state by mean housing price.
Two metrics were used to compare housing prices: price per square meter (price_usd_per_m2) and approximate price in USD (price_aprox_usd). Comparing these metrics revealed distinct variations in the ranking of expensive states across the three countries (Figure 2).
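To illustrate why the two metrics can rank states differently, here is a small sketch that groups listings by `state_name` and ranks states by the mean of a chosen column. The state names and values are invented for the example; only the column names come from the dataset.

```python
from collections import defaultdict
from statistics import mean


def rank_states(rows, metric):
    """Return (state, mean of metric) pairs, most expensive state first."""
    by_state = defaultdict(list)
    for r in rows:
        by_state[r["state_name"]].append(r[metric])
    return sorted(((state, mean(vals)) for state, vals in by_state.items()),
                  key=lambda pair: pair[1], reverse=True)


# Toy data: "State A" has large homes, "State B" has small but pricey ones.
rows = [
    {"state_name": "State A", "price_aprox_usd": 300_000, "price_usd_per_m2": 1_000},
    {"state_name": "State A", "price_aprox_usd": 500_000, "price_usd_per_m2": 1_250},
    {"state_name": "State B", "price_aprox_usd": 200_000, "price_usd_per_m2": 4_000},
    {"state_name": "State B", "price_aprox_usd": 240_000, "price_usd_per_m2": 3_000},
]

by_total = rank_states(rows, "price_aprox_usd")
by_m2 = rank_states(rows, "price_usd_per_m2")
print(by_total[0][0], by_m2[0][0])  # State A State B
```

A state with large homes can top the ranking by total price while a state with small, dense housing tops the ranking by price per square meter.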
Property Type Analysis
The dataset included four main property types: store, house, apartment, and PH (propiedad horizontal, or condominium). Stores were the most expensive in both Argentina and Brazil, consistent with their commercial use. In Mexico, however, condominiums (PH) had the highest average price of any property type.
A breakdown of property counts showed:
- Argentina and Brazil: Apartments were dominant, followed by houses.
- Mexico: Houses were more common, followed by apartments.
- Argentina: Highest number of condominiums.
- Brazil: Most stores.
Interestingly, in Mexico, condominiums were the most expensive but least common compared to other property types.
Average prices by property type in each country:
Argentina:
- Store: Average price approx. USD 319,375.48
- House: Average price approx. USD 315,539.53
- Apartment: Average price approx. USD 196,222.30
- Condominium (PH): Average price approx. USD 147,595.34
Brazil:
- Store: Average price approx. USD 419,694.83
- House: Average price approx. USD 276,753.90
- Apartment: Average price approx. USD 227,955.70
- Condominium (PH): Average price approx. USD 199,683.67
Mexico:
- Condominium (PH): Average price approx. USD 561,870.66
- Apartment: Average price approx. USD 298,965.61
- Store: Average price approx. USD 226,523.02
- House: Average price approx. USD 212,121.66
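As a quick check on the figures above, the snippet below ranks the reported average prices within each country. The numbers are copied from the lists above; the dictionary layout and variable names are just one convenient way to hold them.

```python
# Average prices (USD) by property type, as reported above.
avg_price_usd = {
    "Argentina": {"store": 319_375.48, "house": 315_539.53,
                  "apartment": 196_222.30, "PH": 147_595.34},
    "Brazil": {"store": 419_694.83, "house": 276_753.90,
               "apartment": 227_955.70, "PH": 199_683.67},
    "Mexico": {"PH": 561_870.66, "apartment": 298_965.61,
               "store": 226_523.02, "house": 212_121.66},
}

summary = {}
for country, prices in avg_price_usd.items():
    ranked = sorted(prices, key=prices.get, reverse=True)
    top, bottom = ranked[0], ranked[-1]
    # Ratio of the most to the least expensive property type in this country.
    summary[country] = (top, bottom, prices[top] / prices[bottom])
    print(f"{country}: most expensive = {top}, least expensive = {bottom}, "
          f"ratio = {summary[country][2]:.2f}")
```

The ratio gives a rough sense of how wide the gap between property types is in each country.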
Spatial Distribution
Spatial distribution was visualized using 3D scatter plots, which highlighted where housing is concentrated. Listings in Mexico were the least spread out, while Argentina and Brazil showed more dispersed housing patterns (Figures 3 and 4).
Modeling and Predictions
To predict housing prices, we built a model based on features such as property type, location, and size. The dataset was split into training, validation, and testing sets, and the model’s performance was evaluated using mean absolute error (MAE) and mean absolute percentage error (MAPE). The results showed high accuracy in predicting house prices, with MAPE consistently below 5%.
Data Splitting and Feature Selection
The data was split into training, validation, and testing sets in an 80%-10%-10% ratio. The selected features were property_type, place_name, state_name, surface_total_in_m2, price_usd_per_m2, price_aprox_usd, lat, and lon.
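An 80%-10%-10% split can be sketched as below. Whether the original analysis shuffled the rows, and with what seed, is not stated; the shuffle and the `seed` parameter here are assumptions.

```python
import random


def split_80_10_10(rows, seed=42):
    """Shuffle a copy of rows and split it into train, validation, and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_train = int(0.8 * len(rows))
    n_val = int(0.1 * len(rows))
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder, so no row is lost to rounding
    return train, val, test


# Using row indices as stand-ins for the 42,822 cleaned Argentina entries.
train, val, test = split_80_10_10(range(42_822))
print(len(train), len(val), len(test))  # 34257 4282 4283
```

Assigning the remainder to the test set keeps every row in exactly one split even when the row count is not divisible by ten.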
Baseline Model Evaluation
As a baseline, we computed the mean absolute error for each country’s dataset. Relative to the mean price, the baseline error was high for Argentina and Mexico and lower for Brazil.
Fitting the Model
The baseline Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) put the house price predictions into perspective:
Argentina:
- Mean house price: $1,637.82
- Baseline MAE: $779.73
- Baseline MAPE: 47.61%
Brazil:
- Mean house price: $1,925.75
- Baseline MAE: $438.68
- Baseline MAPE: 22.78%
Mexico:
- Mean house price: $863.87
- Baseline MAE: $410.77
- Baseline MAPE: 47.55%
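In absolute terms, the baseline error was highest in Argentina and lower in Brazil and Mexico. Relative to the mean price, it was about 48% in both Argentina and Mexico and about 23% in Brazil.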
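The baseline figures above are consistent with a model that always predicts the mean price, with the percentage obtained by dividing the baseline MAE by the mean. The sketch below shows such a baseline and checks the reported numbers; the function name `baseline_mae` is hypothetical, and reading the reported percentage as MAE divided by mean is an inference from the figures rather than something the analysis states.

```python
from statistics import mean


def baseline_mae(y):
    """MAE of a naive model that always predicts the mean of y."""
    y_mean = mean(y)
    return mean(abs(value - y_mean) for value in y)


# (mean price, baseline MAE) as reported above.
reported = {
    "Argentina": (1637.82, 779.73),
    "Brazil": (1925.75, 438.68),
    "Mexico": (863.87, 410.77),
}

# Baseline MAE expressed as a percentage of the mean price.
relative_error = {country: round(100 * mae / avg, 2)
                  for country, (avg, mae) in reported.items()}
print(relative_error)  # {'Argentina': 47.61, 'Brazil': 22.78, 'Mexico': 47.55}
```

The ratios reproduce the reported baseline percentages for all three countries.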
After fitting the model and making predictions, the training results were analyzed:
Argentina:
- Training MAE: $402.04
- Training MAPE: 60.83%
Brazil:
- Training MAE: $571.21
- Training MAPE: 30.60%
Mexico:
- Training MAE: $293.89
- Training MAPE: 51.21%
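When reading these numbers, it helps to keep in mind that MAE and MAPE weight errors differently. The sketch below uses the standard definitions with toy values; whether the training MAPE above was computed exactly this way is an assumption.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error: each error is divided by its own true value."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values: the same absolute error on a cheap and an expensive listing.
y_true = [100.0, 1000.0]
y_pred = [200.0, 900.0]

error_abs = mae(y_true, y_pred)                          # 100.0
error_vs_mean = 100 * error_abs / (sum(y_true) / len(y_true))
error_pct = mape(y_true, y_pred)

print(f"MAE = {error_abs:.2f}, MAE / mean = {error_vs_mean:.2f}%, "
      f"MAPE = {error_pct:.2f}%")
```

Because MAPE divides each error by that listing's own price, errors on cheap listings count heavily. A model can therefore have a lower MAE and still show a higher MAPE, which is one possible reading of the Argentina and Mexico training figures.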
Advanced Modeling with Ridge Regression
For the next modeling step, we fitted a Ridge regression model for each country, using that country’s extracted features and target variable. The fitted intercepts and coefficients were then used to identify the features that matter most in predicting house prices.
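For readers unfamiliar with Ridge regression, here is a small self-contained sketch of the closed-form solution with an unpenalized intercept. It is not the code used in the analysis, which most likely relied on a library such as scikit-learn's `Ridge`; the function names and toy data are illustrative, and categorical features such as place_name would first need a numeric encoding (for example one-hot).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Return (intercept, coef) minimizing ||y - Xw - b||^2 + alpha * ||w||^2."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center the data so the intercept is not penalized.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the ridge term: (Xc^T Xc + alpha * I) w = Xc^T yc.
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    coef = solve(A, rhs)
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


# Toy data lying exactly on y = 2x + 1.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]
print(fit_ridge(X, y, alpha=0.0))  # (1.0, [2.0])  ordinary least squares
print(fit_ridge(X, y, alpha=5.0))  # (3.5, [1.0])  slope shrunk toward zero
```

A larger `alpha` shrinks the coefficients toward zero. This stabilizes the fit when there are many correlated features, such as a large set of one-hot encoded place names.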
Feature Importance Analysis
Feature importance analysis identified the top-ranking features in each country and whether each one pushed house prices up or down. We then evaluated the models on the validation data, where the mean absolute percentage errors stayed within 2%, indicating that the models generalized well.
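One common way to read feature importance from a linear model is to sort its coefficients by absolute value and look at their signs. The sketch below does that; the feature names and coefficient values are made up for illustration and are not results from the analysis.

```python
def rank_features(names, coefs, top=3):
    """Return the `top` (name, coefficient) pairs with the largest absolute value."""
    pairs = sorted(zip(names, coefs), key=lambda pair: abs(pair[1]), reverse=True)
    return pairs[:top]


# Hypothetical coefficients, for illustration only.
names = ["surface_total_in_m2", "property_type_store", "place_name_X"]
coefs = [0.5, -3.0, 2.0]

top_two = rank_features(names, coefs, top=2)
for name, coef in top_two:
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: {coef:+.1f} ({direction} the predicted price)")
```

Coefficients are only directly comparable when the features are on similar scales, as is the case for one-hot encoded categories. A numeric feature such as surface_total_in_m2 would need to be standardized first.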
User-Friendly Functions
We developed functions that let users predict house prices from their own inputs for surface_total_in_m2, lat, lon, place_name, and property_type. Bar charts of feature importance let users see how much each feature contributes to the predicted price.
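A prediction helper of this kind might look like the sketch below. The function name `make_prediction`, the layout of the `model` dictionary, the one-hot style feature names, and all numbers are assumptions for illustration; the actual functions in the project may differ.

```python
def make_prediction(model, surface_total_in_m2, lat, lon, place_name, property_type):
    """Predict a price from a linear model given as {'intercept': ..., 'coef': {...}}."""
    coef = model["coef"]
    price = model["intercept"]
    # Numeric features contribute coefficient * value.
    price += coef.get("surface_total_in_m2", 0.0) * surface_total_in_m2
    price += coef.get("lat", 0.0) * lat
    price += coef.get("lon", 0.0) * lon
    # One-hot features contribute their coefficient when the category matches;
    # categories the model has never seen contribute nothing.
    price += coef.get(f"place_name_{place_name}", 0.0)
    price += coef.get(f"property_type_{property_type}", 0.0)
    return round(price, 2)


# Hypothetical model parameters, not fitted values from the analysis.
toy_model = {
    "intercept": 1000.0,
    "coef": {
        "surface_total_in_m2": 2.0,
        "place_name_Palermo": 500.0,
        "property_type_apartment": 100.0,
    },
}

predicted = make_prediction(toy_model, 80, -34.58, -58.43, "Palermo", "apartment")
print(predicted)  # 1760.0
```

Passing the model in as an argument keeps the helper reusable across the three country-specific models.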
Conclusion
This analysis examined housing markets in Mexico, Argentina, and Brazil, focusing on how location and property size affect affordability. After cleaning the data, we found notable state-level variations in housing prices and property types in each country.
The models predicted housing prices with high accuracy, as evaluated through MAE and MAPE.
The findings can help policymakers, real estate professionals, and homebuyers make informed decisions. The analysis sheds light on housing preferences, regional disparities, and economic factors shaping these real estate markets, and it can serve as a foundation for further research and for shaping housing policies and strategies.
The project is available for review on my Kaggle account. The data used for this analysis was obtained from [source]. We acknowledge the creators and contributors of this dataset for making it available for analysis.
By Simon Peter Mulima