Purpose of This Methodology Module
This module documents the structural data biases inherent in a listing-based residential dataset for Johannesburg. Its purpose is to make explicit the systematic distortions that arise from how residential information becomes observable. The module does not attempt to correct or quantify bias; instead, it defines clear interpretation boundaries for institutional users.
Bias Introduced by Voluntary Participation
Residential listings enter the dataset through voluntary participation in formal publication channels. This creates an immediate selection effect, because only properties marketed through these channels become visible. Residential units managed outside formal brokerage systems or platform-based marketing remain structurally absent, regardless of their prevalence or occupancy.
Publication and Rotation Bias
Repeated publication of the same or similar residential units introduces rotation bias. Units that are relisted frequently can appear multiple times across observation periods, amplifying their visibility relative to less frequently published properties. This bias affects perceived activity levels without altering the underlying residential structure.
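The mechanism can be illustrated with a minimal sketch. It assumes, purely for illustration, that each observation carries a stable unit identifier; the names `observations`, `unit_id`, and `visibility_summary` are hypothetical and do not describe the dataset's actual schema. The sketch contrasts raw publication counts with distinct units to show how relisting inflates apparent activity. It is not a correction method and does not address matching of "similar" units.

```python
from collections import Counter

# Hypothetical listing observations: (observation_period, unit_id).
# unit-A is relisted in every period; unit-B and unit-C appear once each.
observations = [
    ("2024-01", "unit-A"),
    ("2024-02", "unit-A"),
    ("2024-03", "unit-A"),
    ("2024-01", "unit-B"),
    ("2024-03", "unit-C"),
]


def visibility_summary(obs):
    """Contrast raw publication volume with the number of distinct units."""
    counts = Counter(unit_id for _, unit_id in obs)
    total = sum(counts.values())
    return {
        "total_publications": total,             # what a raw count would show
        "distinct_units": len(counts),           # units actually behind the listings
        "max_share": max(counts.values()) / total,  # share held by the most relisted unit
    }


summary = visibility_summary(observations)
print(summary)
```

In this toy input, one unit accounts for most of the publications even though it is only one of three units, which is the amplification described above.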
Spatial and Boundary Bias
Listing visibility is unevenly distributed across the city due to differences in development form, management structure, and platform categorization. Recognizable districts and nodes are often overrepresented because listings are preferentially assigned to well-known locations. Conversely, areas with diffuse or informal residential patterns tend to be underrepresented.
Housing Form and Management Bias
Multi-unit developments and centrally managed properties are more consistently represented in listing datasets due to standardized marketing processes. Low-density, individually managed, or informally occupied housing forms generate inconsistent or minimal visibility. This creates a structural skew toward certain housing formats.
Interpretation Boundaries Resulting From Bias
These structural biases limit the interpretive reach of the dataset. Observed patterns should be read as artifacts of publication systems rather than as comprehensive representations of residential conditions. This module establishes bias awareness as a prerequisite for all subsequent city-, submarket-, and district-level readings.
