Introduction

The housing market’s impact on national well-being is well recognized: rising rents and housing prices leave many people without adequate shelter. This analysis examines housing prices in Mexico, Argentina, and Brazil, focusing on how location and size influence affordability.

Data Collection and Cleaning

The dataset was obtained from [source]. It includes housing information collected over six months in 2017. After data cleaning, the dataset included:

Outliers were removed, as demonstrated in Figure 1.

Figure 1: Distributions of houses and prices

Prices were converted to USD and stored in a new column so that values are consistent across the dataset. Irrelevant columns were dropped, and listings with a total area of zero or with outlying values in the total area column were removed.
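The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the `price` column name, the `usd_per_local` exchange rate argument, and the quantile cut-offs used to trim outliers are all assumptions made for the example. Only `price_aprox_usd` and `surface_total_in_m2` are names that appear in the analysis.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame, usd_per_local: float,
                   low_q: float = 0.1, high_q: float = 0.9) -> pd.DataFrame:
    """Add a USD price column, drop zero-area rows, and trim area outliers.

    `price`, `usd_per_local`, `low_q` and `high_q` are hypothetical names
    and values chosen for this sketch.
    """
    out = df.copy()
    # Convert the local-currency price to USD in a new column.
    out["price_aprox_usd"] = out["price"] * usd_per_local
    # Remove listings whose total area is zero (or negative).
    out = out[out["surface_total_in_m2"] > 0]
    # Keep only rows whose area lies between the chosen quantiles.
    low, high = out["surface_total_in_m2"].quantile([low_q, high_q])
    out = out[out["surface_total_in_m2"].between(low, high)]
    return out.reset_index(drop=True)
```

Trimming by quantiles is only one way to define an outlier; a fixed area threshold or an interquartile-range rule would fit the same description.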

Objectives

This analysis addresses key questions about housing affordability. It focuses on state-level price variation and aims to identify the most expensive states for home purchases. Descriptive statistics such as the mean and median were used to characterize typical house prices in each country.

State-Level Analysis

The state-level analysis produced some unexpected results. In Brazil, Mato Grosso was the most expensive state, contrary to the expectation that Sao Paulo would lead. In Mexico, Distrito Federal had the highest housing prices. In Argentina, the comparison is complicated by the fact that Buenos Aires is divided into multiple states in the data; Bs.As. G.B.A. Zona Norte emerged as the most expensive state by mean housing price.

Two metrics were used to compare housing prices: price per square meter (price_usd_per_m2) and approximate price in USD (price_aprox_usd). Comparing these metrics revealed distinct variations in the ranking of expensive states across the three countries (Figure 2).
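A ranking of this kind can be produced with a simple group-by. The sketch below assumes a cleaned DataFrame with a `state_name` column plus the two metric columns named above; the helper name `rank_states` is made up for illustration.

```python
import pandas as pd


def rank_states(df: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Return mean and median of `metric` per state, most expensive first."""
    summary = df.groupby("state_name")[metric].agg(["mean", "median"])
    return summary.sort_values("mean", ascending=False)


# Toy data (invented) showing how the two metrics can rank states differently:
# state B has larger, pricier homes overall, but state A costs more per m2.
toy = pd.DataFrame({
    "state_name": ["A", "A", "B", "B"],
    "price_aprox_usd": [100_000, 120_000, 200_000, 220_000],
    "price_usd_per_m2": [2_000, 2_200, 1_000, 1_100],
})
by_total = rank_states(toy, "price_aprox_usd")
by_m2 = rank_states(toy, "price_usd_per_m2")
```

Here `by_total` puts state B first while `by_m2` puts state A first, which is the kind of reordering the comparison of the two metrics is meant to expose.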

Figure 2: Mean house price

Property Type Analysis

The dataset included four main property types: store, house, apartment, and PH (propiedad horizontal or condominiums). Stores were the most expensive in both Argentina and Brazil, aligning with their commercial use. However, in Mexico, condominiums (PH) dominated despite a lower average price compared to other property types.

A breakdown of property counts by state showed:

Interestingly, in Mexico, condominiums were the most expensive but least common compared to other property types.

Here are some statistics from the analysis:

Argentina:

Brazil:

Mexico:

Spatial Distribution

Spatial distribution was visualized using 3D scatter plots, which highlighted the concentration of housing in specific areas. Listings in Mexico were the most concentrated, while Argentina and Brazil exhibited more dispersed housing patterns (Figures 3 and 4).

Figure 3: Regional distribution of houses
Figure 4: Spatial distribution of houses

Modeling and Predictions

To predict housing costs, we built a model based on features such as property type, location, and size. The dataset was split into training, validation, and testing sets. We evaluated the model’s performance using mean absolute error (MAE) and mean absolute percentage error (MAPE). The model predicted house prices accurately, with MAPE consistently below 5%.

Data Splitting and Feature Selection

The data was split into training, validation, and testing sets using an 80%-10%-10% ratio. Selected features included: property_type, place_name, state_name, surface_total_in_m2, price_usd_per_m2, price_aprox_usd, lat, and lon.
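An 80%-10%-10% split can be sketched as below. Whether the original split was random or ordered is not stated, so the shuffling and the `seed` value here are assumptions, and `split_80_10_10` is a hypothetical helper name.

```python
import pandas as pd


def split_80_10_10(df: pd.DataFrame, seed: int = 42):
    """Shuffle rows, then cut into 80% train, 10% validation, 10% test."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * 0.8)
    n_val = int(len(shuffled) * 0.1)
    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    # Whatever remains after train and validation goes to the test set.
    test = shuffled.iloc[n_train + n_val:]
    return train, val, test
```

Every row lands in exactly one of the three sets, so the validation and test sets stay unseen during fitting.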

Baseline Model Evaluation

The initial evaluation computed the mean absolute error for each dataset, which served as our baseline. The baseline MAE was relatively high for Argentina and Mexico and lower for Brazil.
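The exact baseline is not spelled out here. One common choice, assumed for this sketch, is a "model" that predicts the mean training price for every listing; its MAE then gives a reference point that any fitted model should beat.

```python
import numpy as np


def baseline_mae(y_train, y_eval) -> float:
    """MAE of always predicting the training-set mean (assumed baseline)."""
    y_train = np.asarray(y_train, dtype=float)
    y_eval = np.asarray(y_eval, dtype=float)
    baseline_prediction = y_train.mean()
    return float(np.mean(np.abs(y_eval - baseline_prediction)))
```

For example, with training prices of 100, 200 and 300 the baseline prediction is 200, so evaluation prices of 150 and 250 give a baseline MAE of 50.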

Fitting the Model

The Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) were considered to put the house price predictions into perspective:

Argentina:

Brazil:

Mexico:

Prediction error was highest in Argentina and relatively lower in Brazil and Mexico.
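For reference, the two metrics can be computed as in the following sketch, which uses the standard definitions and plain Python. MAE is expressed in the same units as the price (USD here), while MAPE divides each error by the true price, which makes errors comparable across countries with different price levels. The function names are illustrative.

```python
def mae(y_true, y_pred) -> float:
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%).

    Assumes no true value is zero, which holds for house prices.
    """
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

With true prices of 100 and 200 and predictions of 110 and 180, the MAE is 15 and the MAPE is 0.10, that is, 10%.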

After fitting the model and making predictions, the training results were analyzed:

Argentina:

Brazil:

Mexico:

Advanced Modeling with Ridge Regression

We then fitted a Ridge regression model for each country, using the features and target variable extracted from that country's data. The fitted intercepts and coefficients were used to identify the features that matter most in predicting house prices.
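A Ridge model over mixed categorical and numeric features could be set up along the following lines with scikit-learn. This is a sketch under assumptions: the one-hot encoding step, the choice of these five input columns, and `alpha=1.0` are illustrative and are not confirmed as the configuration used in the project.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = ["property_type", "place_name"]
NUMERIC = ["surface_total_in_m2", "lat", "lon"]
FEATURES = CATEGORICAL + NUMERIC


def build_ridge_model(alpha: float = 1.0):
    """One-hot encode the categorical columns, then fit a Ridge regressor."""
    encode = ColumnTransformer([
        # Categories unseen during fitting are encoded as all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        # Numeric columns are passed to the regressor unchanged.
        ("num", "passthrough", NUMERIC),
    ])
    return make_pipeline(encode, Ridge(alpha=alpha))
```

One such pipeline would be fitted per country, for example `build_ridge_model().fit(train[FEATURES], train["price_aprox_usd"])`, where `train` stands for that country's training split.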

Feature Importance Analysis

Feature importance analysis identified the top-ranking features and whether each one raised or lowered predicted house prices in each country. We also evaluated our models on the validation data, where the mean absolute percentage errors remained consistently within an acceptable range of 2%. This indicates that the models generalized well to unseen data.
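For a linear model such as Ridge, one simple way to rank features is by the absolute size of their coefficients, keeping the sign to show the direction of the effect. The helper below is a hypothetical sketch of that idea; it takes feature names and coefficients as plain sequences, so it does not depend on how the model was built, and it assumes the feature names are unique.

```python
import pandas as pd


def rank_coefficients(feature_names, coefficients, top_n: int = 10) -> pd.Series:
    """Return the `top_n` coefficients with the largest absolute value.

    Signs are preserved: positive values push the predicted price up,
    negative values push it down.
    """
    coefs = pd.Series(list(coefficients), index=list(feature_names), dtype=float)
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order).head(top_n)
```

Coefficient sizes are only directly comparable when the features are on similar scales, which is the case for one-hot encoded categories such as states or property types. The resulting Series can be drawn as a horizontal bar chart with `.plot(kind="barh")`.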

User-Friendly Functions

We developed functions that enable individuals to predict house prices. Users can provide inputs for surface_total_in_m2, lat, lon, place_name, and property_type. Graphical representations of feature importance through bar charts were provided. These allow users to visualize the significance of each feature in predicting housing costs.
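A prediction helper of this kind might look like the sketch below. The function name, its return format, and the assumption that `model` is any fitted object with a scikit-learn style `predict` method accepting a DataFrame are illustrative; only the five input names come from the description above.

```python
import pandas as pd


def make_prediction(model, surface_total_in_m2, lat, lon,
                    place_name, property_type) -> str:
    """Build a one-row DataFrame from user inputs and format the prediction."""
    row = pd.DataFrame(
        {
            "surface_total_in_m2": surface_total_in_m2,
            "lat": lat,
            "lon": lon,
            "place_name": place_name,
            "property_type": property_type,
        },
        index=[0],
    )
    prediction = model.predict(row)[0]
    return f"Predicted price: ${prediction:,.2f}"
```

Wrapping the model this way lets a user supply plain values without needing to know how the features are encoded internally.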

Figure 5: Importance of state in the model for Argentina
Figure 6: Sample prediction results for Brazil

Conclusion

This analysis examined housing markets in Mexico, Argentina, and Brazil, focusing on the impact of location and property size on housing affordability. After cleaning and analyzing the data, we found notable state-level variations in housing prices and property types in each country.

Advanced modeling techniques provided high accuracy in predicting future housing costs. The models’ performance, evaluated through MAE and MAPE, underscored their reliability.

The findings offer actionable insights for understanding housing markets’ dynamics in these nations. Stakeholders, including policymakers, real estate professionals, and homebuyers, can make informed decisions. This analysis sheds light on housing preferences, regional disparities, and economic factors shaping the real estate markets. It serves as a foundation for further research and a valuable resource for shaping housing policies and strategies.

The project is available for review on my Kaggle account. The data used for this analysis was obtained from [source]. We acknowledge the creators and contributors of this dataset for making it available for analysis.

By Simon Peter Mulima
