
Nice‐to‐have: Project idea list

Angus edited this page Jan 14, 2025 · 1 revision

These are projects I haven't been able to do yet because I'm still trying to add more data sources.

  • Land value to density map
  • Land value to equity map

Factor in the zoning when comparing sites

It probably ties into #9, but it would be good to find a way to compare land independently of zoning.

Might be worth reading

Factor marginal value of land into comparisons

So far I've identified a possible means to do this on page 8 of the Zoning Effect paper. Later I'll look into how to apply it.

About

What I really want is a way to compare the value of land on a site independently of the size of the site it's attached to. This should help make visualisations, within an LGA, of where people want to live, based on the assumption that land value has priced in all preferences.

Why have I sought this out

Here are the sites in the CBD ranked by $ per sqm; look at the area of these sites.

[Image: sites in the CBD ranked by $ per sqm, showing each site's area]

While there are some data issues in the Valuer General data, when you sort the sites within Sydney by land value per sqm, smaller sites seem to rank higher on a per-sqm basis.

I think this is because land has a diminishing marginal value: each additional square metre is worth less than the last. As a site gets larger, more types of projects become viable, but any square metre added beyond what a project needs does nothing for the projects that were already viable without those extra metres.
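One way to make the diminishing-returns idea concrete is the log-log relation from the Zoning Effect paper noted further down, log(sale price) = c + b log(land area) + aX + e. Holding everything except area fixed, value = K · area^b, so the value per square metre is K · area^(b − 1), which falls as area grows whenever b < 1. A minimal sketch; K and b here are made-up illustrative values, not estimates from any data.

```python
def value_per_sqm(area_sqm: float, k: float, b: float) -> float:
    """Average $/sqm when a site's total value is k * area**b.

    This is the log-log model with every term except land area held fixed.
    """
    return k * area_sqm ** b / area_sqm


# Illustrative parameters only (assumed, not fitted): b < 1 means each extra
# square metre adds less value than the one before it.
K, B = 50_000.0, 0.7

for area in (10, 100, 1_000):
    print(f"{area:>5} sqm -> ${value_per_sqm(area, K, B):,.0f} per sqm")
```

With b = 1 the per-sqm value would be the same for every site size, so the ranking effect only appears when b is below 1.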

Why does this bother me

If we had the shapefile for every lot this wouldn't actually be a problem, but because we are aggregating by meshblock, a single Telstra phone booth can inflate the aggregated value of the meshblock when the aggregation is something like max or mean.

If you want to compare the value of land in Sydney independently of lot size while grouping by something like meshblock, aggregations like max or mean will be heavily skewed by a stray 1x1 Telstra phone booth lot, which is the case in a few places in the CBD.

It would be nice to be able to weight each square metre independently of the size of the lot it actually belongs to, but maybe that is fanciful.

Hacky solutions

So far I've included something like this in my land value aggregations, but it's fairly arbitrary and dishonest:

CASE
  WHEN p.area < 10 THEN 10
  ELSE p.area
END

Note: this isn't used in the data ingestion process, only in the other notebooks where I've been trying to visualise the data.
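To make the skew concrete, here is a minimal sketch comparing per-meshblock aggregations on made-up numbers: a plain mean of $/sqm, the same mean with the 10 sqm floor from the CASE expression above, a median, and an area-weighted figure (total value divided by total area), which weights every square metre equally regardless of which lot it sits on. The site values and function names are hypothetical.

```python
from statistics import mean, median

# Hypothetical sites in one meshblock: (land_value_dollars, area_sqm).
# The 1 sqm lot stands in for a Telstra phone booth; all numbers are made up.
SITES = [
    (200_000, 1),
    (10_000_000, 500),
    (30_000_000, 1_500),
]


def naive_mean_per_sqm(sites):
    """Mean of each lot's $/sqm: every lot counts equally, whatever its size."""
    return mean(value / area for value, area in sites)


def floored_mean_per_sqm(sites, min_area=10):
    """Same mean, but with area floored like the SQL CASE expression."""
    return mean(value / max(area, min_area) for value, area in sites)


def median_per_sqm(sites):
    """Median of each lot's $/sqm: robust to one outlier, but still per lot."""
    return median(value / area for value, area in sites)


def area_weighted_per_sqm(sites):
    """Total value / total area: every square metre counts equally."""
    total_value = sum(value for value, _ in sites)
    total_area = sum(area for _, area in sites)
    return total_value / total_area


if __name__ == "__main__":
    print(f"naive mean:    ${naive_mean_per_sqm(SITES):,.0f}/sqm")
    print(f"floored mean:  ${floored_mean_per_sqm(SITES):,.0f}/sqm")
    print(f"median:        ${median_per_sqm(SITES):,.0f}/sqm")
    print(f"area-weighted: ${area_weighted_per_sqm(SITES):,.0f}/sqm")
```

With these numbers the plain mean is $80,000/sqm, inflated fourfold by the single 1 sqm lot, while the floored mean and the median both come out at $20,000/sqm and the area-weighted figure at roughly $20,090/sqm. In SQL the area-weighted version is just the sum of land value divided by the sum of area within each meshblock, and unlike the floor it needs no arbitrary threshold.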

Possible projects once accomplished

  • This would allow a visualisation of different LGAs showing where the most valuable land is, without it being skewed by small sites.
  • Maybe get a distribution of land values in an area by doing the following for each site:
    def marginal_value_of_land(valuation, nth_meter):
        # The paper says this
        #    log(sale price) = c + b log(land area) + aX + e
        #
        # it would be neat to do something like
        #    (b log(nth_meter)) - (b log(nth_meter - 1))
        #
        # I don't even know if it makes sense to do that... 
        # Possibly useless. maybe this is more reasonable 
        #    b log(1)
        pass
    
    def population_of_all_land_values(valuations):
        for v in valuations:
            for nth_meter in range(1, v.sqm_area + 1):  # metres numbered 1..sqm_area; log(0) is undefined
                yield marginal_value_of_land(v, nth_meter)
    With that population of land values you could see their distribution, though I'm honestly unsure what the most sensible way to do this is...
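A minimal sketch of one way the pseudocode above could be filled in, working in dollars rather than logs. It assumes, as a simplification of the paper's relation, that the value of the first n square metres of a site scales as k · n^b, with k calibrated per site so the whole site adds up to its valuation; the elasticity b (assumed 0 < b) is an input that would have to come from a hedonic regression. `Valuation` and `marginal_values` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Valuation:
    # Hypothetical stand-in for a Valuer General record.
    land_value: float  # dollars
    sqm_area: int      # whole square metres, at least 1


def marginal_values(v: Valuation, b: float) -> Iterator[float]:
    """Yield the dollar value of the 1st, 2nd, ..., last square metre of a site.

    Assumes value(n) = k * n**b with k chosen so value(sqm_area) == land_value.
    The nth metre is worth value(n) - value(n - 1); the terms telescope, so
    they sum back to land_value.
    """
    k = v.land_value / v.sqm_area ** b
    for n in range(1, v.sqm_area + 1):
        yield k * (n ** b - (n - 1) ** b)


def population_of_all_land_values(valuations: Iterable[Valuation], b: float) -> Iterator[float]:
    """One value per square metre across every site."""
    for v in valuations:
        yield from marginal_values(v, b)
```

Because each site contributes one value per square metre, a 1 sqm lot adds a single data point to the distribution rather than counting as much as a whole city block, as it would in a per-lot max or mean.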

Solutions

  • It's possible this methodology for comparing land by these aggregations is flawed and I should look at other methodologies.
  • It's possible there's some kind of coefficient you can estimate from hedonic pricing models?
  • This is only a problem because I'm aggregating multiple properties by meshblock; if I had the shapefiles for the actual properties it wouldn't be a problem.
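A rough sketch of the hedonic-coefficient idea: an ordinary least squares fit of log(price) on log(area) recovers b from the relation log(sale price) = c + b log(land area) + aX + e. It assumes paired prices (or valuations) and land areas are available, and it deliberately omits the other characteristics X, so the b it returns is only a crude elasticity that will be biased if area correlates with omitted factors. `fit_log_log` is a hypothetical helper, checked here on synthetic data with a known b.

```python
import math
import random


def fit_log_log(areas, prices):
    """OLS fit of log(price) = c + b * log(area); returns (c, b).

    Omits the paper's other characteristics (aX), so b is only a rough
    elasticity of price with respect to land area.
    """
    xs = [math.log(a) for a in areas]
    ys = [math.log(p) for p in prices]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = y_bar - b * x_bar
    return c, b


# Synthetic check: generate prices with a known b and see whether it is recovered.
random.seed(0)
true_c, true_b = 9.0, 0.6
areas = [random.uniform(50, 2_000) for _ in range(500)]
prices = [math.exp(true_c + true_b * math.log(a) + random.gauss(0, 0.1)) for a in areas]
c_hat, b_hat = fit_log_log(areas, prices)
print(f"estimated b = {b_hat:.3f} (true {true_b})")
```

An estimated b below 1 would be evidence for the diminishing marginal value described earlier; a real version would add the other characteristics as extra regressors, for example with a multiple-regression library.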

It's entirely possible I'm looking at this all wrong. I think it's best to first establish a better understanding of the nature of the problem before proposing a fix. Let's see what the research says about it.

Consider reading the RBA paper on the zoning effect.

Notes from reading the paper "The effects of Zoning on Housing prices"

  • 20240912
    • page 2, there's immediate mention of a "marginal value of land"
    • page 5, mentions it's worth noting since lands may be deflated as sometimes land owners despite the high valuations of their land by not lower.
    • page 8 (web link): here the marginal value of land is explicitly mentioned.
      • this relation was shown: log(sale price) = c + b log(land area) + aX + e
      • Is it possible to use this with substitution to get the marginal value of land?
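Taking the relation at face value and holding X and e fixed, differentiating with respect to area answers the substitution question directly. A short derivation, writing P for sale price and A for land area (shorthand symbols introduced here):

```latex
\log P = c + b \log A + aX + e
\quad\Longrightarrow\quad
P = e^{\,c + aX + e}\, A^{b}

\frac{\partial P}{\partial A} = b\, e^{\,c + aX + e}\, A^{b-1} = b\,\frac{P}{A}
```

So under this relation the marginal square metre is worth b times the site's average $/sqm; with b < 1 it is worth less than the average, which would be consistent with smaller sites ranking higher per sqm.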

Helper for rendering "zoning"

It would be great to have some helper functions to render the zoning field in the meshblocks. It's not as accurate as actual planning data from each state, but it could do in the meantime. And I'm sure the matplotlib code can be reused for the actual zoning data.

Potential issues

The mb_cat field in meshblocks isn't great, but it's kind of fun to work with. The long-term solution is figuring out how to map the zoning from the Valuer General data onto these shapefiles, or onto whatever shapefiles we end up using for properties.

Long-term options

  • #13
  • #8

Example

You could do something like this. First, fetch the data:

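    -- Meshblocks in the Sydney LGA, keeping only those with more than 10% of their area inside it.
    -- A plain (inner) JOIN suffices because the WHERE clause filters on lga columns.
    SELECT mb.mb_cat, mb.geometry AS geom
      FROM non_abs_main_structures.lga_2024 lga
      JOIN abs_main_structures.meshblock mb ON ST_Intersects(mb.geometry, lga.geometry)
      WHERE lga.lga_name ILIKE 'Sydney'
        AND (ST_Area(ST_Intersection(lga.geometry, mb.geometry)) / ST_Area(mb.geometry)) > 0.1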

Then use helpers like this to render the legend and plot.

from collections import defaultdict

mb_cat_facecolor = {
    'Commercial': '#0000ff',
    'Residential': '#ff9999',
    'Education': '#66ff66',
    'Hospital/Medical': '#9933ff',
    'Industrial': '#ffff00',
    'Parkland': 'white',
    'Water': 'white',
    'Transport': 'white',
    'Other': 'white',
}

mb_cat_edgecolor = {
    'Commercial': None,
    'Residential': None,
    'Education': 'white',
    'Hospital/Medical': 'white',
    'Industrial': 'black',
    'Parkland': '#00ff00',
    'Water': '#0000ff',
    'Transport': '#ff0000',
    'Other': 'black',
}

mb_cat_hatch = defaultdict(lambda: None, {
    'Education': '//',
    'Hospital/Medical': '//',
    'Parkland': '//',
    'Water': '//',
    'Transport': '//',
    'Other': '//',
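})

# mb_cat_alpha is used by the helpers below but was not defined; this assumes
# fully opaque fills for every category (adjust per category as needed).
mb_cat_alpha = defaultdict(lambda: 1.0)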

def render_zones(ax, df, col):
    cats = list(df[col].unique())
    for c in cats:
        df[df[col] == c].plot(
            ax=ax,
            facecolor=mb_cat_facecolor[c],
            edgecolor=mb_cat_edgecolor[c],
            hatch=mb_cat_hatch[c],
            alpha=mb_cat_alpha[c],
        )

def render_zoning_legend(ax, df, col):
    from matplotlib.patches import Patch
    cats = list(df[col].unique())
    ax.legend(
        handles=[
            Patch(label=c,
                  facecolor=mb_cat_facecolor[c],
                  edgecolor=mb_cat_edgecolor[c],
                  alpha=mb_cat_alpha[c],
                  hatch=mb_cat_hatch[c])
            for c in cats
        ],
        title='Zones',
        loc='lower left',
        fontsize='large',
        title_fontsize='x-large',
    )
[Image: example output of the zoning plot and legend]