Price Paid Linked Data

What does the Price Paid Dataset consist of?

HM Land Registry publish Price Paid Data for England and Wales on a monthly basis. New transactions are added to the existing price paid dataset which, in September 2015, contained over 20 million transactions. Price paid data is captured for single residential properties that were sold for value and lodged with HM Land Registry for registration since 1995. Each transaction generates a number of triples, resulting in a store that holds in excess of 400 million data items. The data does not indicate the current value of any individual property; it records the price paid at the time of sale. It is compiled from the price paid information supplied to HM Land Registry when a single residential property is sold for value.

From October 2015 we have added a new category of price paid data known as additional price paid data. This new data, captured since 14 October 2013, includes data about:

  • property transfers carried out under a power of sale
  • repossessions
  • buy-to-lets where they can be identified by a mortgage
  • transfers to non-private individuals
  • sales where the property type is classed as ‘Other’

Privacy

In accordance with our rules, price paid information has been entered as part of the register since 1 April 2000. The decision to do this was approved by the Lord Chancellor after public consultation and debate. He concluded that the inclusion, and consequent availability, of this information was in the public interest. Privacy impact assessments for the release of this data are available through the HM Land Registry website. From the privacy impact assessment:

“HPPD {Historical Price Paid Data} comprises information dating back to 1995, and we considered whether a significant impact was caused by the fact that the register of title (which includes the price paid) was not open to public inspection prior to 2000. We concluded that the impact was minimal as the public are able to request copies of transfers under the Land Registration statutory regime and invariably these contain the price paid for the property.”

Which transactions are not included?

The following types of transaction are excluded from the dataset:

  • sale of part or a share of a property
  • sale of right-to-buy properties
  • transfers following divorce or by way of gift or exchange
  • transfers under Compulsory Purchase Order or by Court Order
  • transfer of more than one property as part of a portfolio
  • first registration of leases for 7 years or less
  • additional price paid transactions valued below £100

How is the Price Paid Dataset published?

The price paid dataset is available in several forms from HM Land Registry.

It is also available in full as a 4* linked dataset, which is free to use under the terms of the Open Government Licence (OGL).

Linked Data Overview

Each residential property transaction lodged with HM Land Registry, subject to the exclusions above, will generate an entry in the Price Paid Dataset. Each calendar month HM Land Registry extract new and amended transactions from the Price Paid Dataset, convert them into linked data form and apply them to the linked data store. This process runs alongside the normal publication of Price Paid Data, which is released on the 20th working day of each month.

At the highest level, each transaction record contains data that describes the property and data that describes the transaction. Within the Price Paid Dataset any individual property can have multiple transactions. Each property is described with one set of triples, based on the address information passed to HM Land Registry, and each transaction has its own set of triples, including links to the appropriate address triples.
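
The relationship between one address and its many transactions can be sketched in Python. This is only an illustration of the shape of the data: the class names, field names and sample values below are hypothetical and are not part of the HM Land Registry vocabularies or dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """One set of address data, shared by every sale of the same property."""
    address_id: str
    paon: str
    postcode: str


@dataclass(frozen=True)
class Transaction:
    """One sale; it links to its address rather than copying it."""
    transaction_id: str
    address_id: str
    price_paid: int


def group_by_address(transactions):
    """Return a dict mapping each address_id to its list of transactions."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx.address_id].append(tx)
    return dict(grouped)


# Placeholder values, invented for illustration only.
address = Address("ADDR-1", "1", "AB1 2CD")
sales = [
    Transaction("TX-1", "ADDR-1", 100000),
    Transaction("TX-2", "ADDR-1", 125000),
]
by_address = group_by_address(sales)
print(len(by_address[address.address_id]))
```

Keeping the address separate from the transaction mirrors the description above: the address is stored once, and each sale only carries a link to it.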

Vocabularies

The publication of the Price Paid Dataset as linked data relies on reusable RDFS or OWL vocabularies, including two HM Land Registry vocabularies:

LR PPI http://landregistry.data.gov.uk/def/ppi
The HM Land Registry PPI vocabulary is used to describe the Price Paid Dataset produced each month. Historical note: initially this dataset was known as Price Paid Information (PPI), but confusion with other common uses of the acronym led to a rename after the initial release of the dataset and publication of the vocabulary. The vocabulary contains the terms that HM Land Registry use to describe items within the price paid dataset.
LR Common http://landregistry.data.gov.uk/def/common
The HM Land Registry common vocabulary was created alongside the ppi vocabulary and contains terms that we may use in other linked data releases in the future that may not relate specifically to price paid data. Currently the vocabulary contains terms used to describe an address. At the time this vocabulary was created we could find no existing vocabulary that accurately described the address information that we publish.

Transactions

Each transaction record will contain the following data items:

  • A transaction identifier
  • A transaction category (standard PPD transaction or additional PPD transaction)
  • The property address
  • The price paid
  • The date of the transaction
  • Information about the property

A standard PPD transaction concerns a single residential property sold for value to a private individual. An additional PPD transaction concerns repossessions, buy-to-lets and transfers of property sold to non-private individuals.

The conversion process takes these items and generates a number of ‘triples’ that are uploaded to a cloud-based triple store. Linked data formats are widely described in various web resources and are not covered here. HM Land Registry generate SPARQL update (.ru) files each month and these are described below.

To describe a transaction we have the following information:

A transaction record ID, which contains:

A transaction /def/ppi/hasTransaction
A property address /def/ppi/propertyAddress

The predicate hasTransaction has a range of Transaction:

The property type ppi:propertyType
The estate type ppi:estateType
Transaction date ppi:transactionDate
The price paid ppi:pricePaid
A new build indicator ppi:newBuild
The transaction category ppi:transactionCategory

The predicate propertyAddress has a range of Address.

Primary address (PAON) common:paon
Secondary address (SAON) common:saon
Street common:street
Locality common:locality
Town common:town
District common:district
County common:county
Postcode common:postcode

Because of the age of some data within the triple store, these items are all optional. Logically every address should have at least a PAON and a postcode, but there are instances within the store where this information was missing or captured incorrectly. In time we hope to address this issue.
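
Because any address item may be absent, code that consumes the data should not assume that a PAON or postcode is present. The following Python sketch shows one way to check for this. It assumes an address has already been read into a plain dictionary keyed by item name; the function name, the dictionary layout and the sample values are hypothetical.

```python
# Items that, logically, every address should carry.
EXPECTED_ITEMS = ("paon", "postcode")


def missing_items(address):
    """Return the expected address items that are absent or blank."""
    return [
        item
        for item in EXPECTED_ITEMS
        if not (address.get(item) or "").strip()
    ]


# Placeholder address with no postcode, invented for illustration only.
incomplete = {"paon": "1", "street": "HIGH STREET", "town": "ANYTOWN"}
print(missing_items(incomplete))
```

Treating a blank string the same as a missing key is a design choice of this sketch, so that empty values do not pass as valid.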

Data Model

Figure 1 – PPD data model

Example

In December 2013 HM Land Registry received a transaction for

Flat 22
Orchard Building 25 Pear Tree Street
London
Islington
Greater London
EC1V 3AP

The property details supplied stated that this was a flat subject to a lease, that the property was a new build and that it sold for £860,000 on 31 October 2013. After transformation and upload to the HM Land Registry triple store, this property transaction can be found here:

http://landregistry.data.gov.uk/data/ppi/transaction/5FC45A71-8E90-46A0-9CBB-0017294BBAFA/current

This information is held as follows:

The property type ppi:propertyType common:flat-maisonette
The estate type ppi:estateType common:leasehold
The transaction date ppi:transactionDate 2013-10-31 xs:date
The price paid ppi:pricePaid 860000
A new build indicator ppi:newBuild true xs:boolean
A transaction category ppi:transactionCategory ppi:standardPricePaidTransaction

We also add:

The transaction ID ppi:transactionId 5FC45A71-8E90-46A0-9CBB-0017294BBAFA
The publication date ppi:publishDate 2014-01-23 xs:date
A record status ppi:recordStatus ppi:add

Prior to October 2015 there was also a ppi:publishDate property. This has since been removed.

The record status was part of the first release of price paid data and has since been made obsolete but remains in the store so that queries that included it will still run. All transactions will have a record status of ‘add’.

The triples generated for this transaction are:

<http://landregistry.data.gov.uk/data/ppi/transaction/5FC45A71-8E90-46A0-9CBB-0017294BBAFA/current>
  a       ppi:TransactionRecord ;
  ppi:estateType common:leasehold ;
  ppi:hasTransaction

  ;
  ppi:newBuild "true"^^xs:boolean ;
  ppi:pricePaid 860000 ;
  ppi:propertyAddress

  ;
  ppi:propertyType common:flat-maisonette ;
  ppi:publishDate "2014-01-23"^^xs:date ;
  ppi:recordStatus ppi:add ;
  ppi:transactionCategory ppi:standardPricePaidTransaction ;
  ppi:transactionDate "2013-10-31"^^xs:date ;
  ppi:transactionId "5FC45A71-8E90-46A0-9CBB-0017294BBAFA"^^ppi:TransactionIdDatatype .

For the property address we have:

Primary address (PAON) common:paon ORCHARD BUILDING, 25 xs:string
Secondary address (SAON) common:saon FLAT 22 xs:string
Street common:street PEAR TREE STREET xs:string
Locality common:locality xs:string
Town common:town LONDON xs:string
District common:district ISLINGTON xs:string
County common:county GREATER LONDON xs:string
Postcode common:postcode EC1V 3AP xs:string

We also add:

The property address ID ppi:propertyAddress 8143663fb6492e899caae7ba3be160e7478b9792
The type of address common:BS7666Address

And the triples for this property address are:


  a       common:BS7666Address ;
  common:county "GREATER LONDON"^^xs:string ;
  common:district "ISLINGTON"^^xs:string ;
  common:paon "ORCHARD BUILDING, 25"^^xs:string ;
  common:postcode "EC1V 3AP"^^xs:string ;
  common:saon "FLAT 22"^^xs:string ;
  common:street "PEAR TREE STREET"^^xs:string ;
  common:town "LONDON"^^xs:string .

Note that no triple is generated for missing items, in this case the locality.
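
The rule that missing items produce no triple can be sketched in Python as follows. This is an illustration only, not the conversion process HM Land Registry use. The function name is hypothetical, and the subject IRI is a placeholder because the real address IRI is not shown above.

```python
def address_to_turtle(subject_iri, items):
    """Serialise address items as Turtle, skipping missing values."""
    statements = ["a       common:BS7666Address"]
    for key in sorted(items):
        value = items[key]
        if value:  # no triple is generated for a missing item
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            statements.append(f'common:{key} "{escaped}"^^xs:string')
    body = " ;\n".join("  " + statement for statement in statements)
    return f"<{subject_iri}>\n{body} ."


# Address values from the example above; the locality is empty.
example_address = {
    "paon": "ORCHARD BUILDING, 25",
    "saon": "FLAT 22",
    "street": "PEAR TREE STREET",
    "locality": "",
    "town": "LONDON",
    "district": "ISLINGTON",
    "county": "GREATER LONDON",
    "postcode": "EC1V 3AP",
}
# Placeholder subject IRI, used only for this sketch.
turtle = address_to_turtle("http://example.org/address/1", example_address)
print(turtle)
```

Sorting the keys gives the same predicate order as the triples listed above, and the empty locality is dropped rather than written as an empty string.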

How to query the dataset

There are several ways to query the Price Paid Dataset:

The Price Paid Data Report Builder

The report builder is the easiest way to query the Price Paid Dataset. It allows queries to be created using a form, and results can be downloaded in various formats. The report builder can also be used to generate SPARQL queries that can be amended and re-run as required.

SPARQL

Alongside the report builder we have created a SPARQL query page, which can be found here: http://landregistry.data.gov.uk/app/hpi/qonsole

We have also made a SPARQL endpoint available for direct queries: http://landregistry.data.gov.uk/landregistry/query

Included at the end of this document is an example of a SPARQL query using FILTER to restrict the results returned.
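
As a rough illustration of using the endpoint from a program, the Python sketch below builds a request in the style of the standard SPARQL protocol, where the query text is sent in a `query` parameter. It only constructs the request and does not send it. The function name is hypothetical, and it is an assumption that the endpoint accepts GET requests and returns JSON results when asked via the Accept header.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Endpoint URL as given in the text above.
ENDPOINT = "http://landregistry.data.gov.uk/landregistry/query"


def build_sparql_request(query):
    """Return a GET request carrying the query in the 'query' parameter."""
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    # Assumption: the endpoint can return SPARQL JSON results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# A deliberately tiny query, used only to show the request being built.
request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(request.get_method(), sent_query)
# To send it (needs network access): urllib.request.urlopen(request)
```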

Via the API

It is possible to query the Price Paid Dataset directly through the API. This largely removes the need to know SPARQL, as long as the structure of the data is known. Using the example transaction above, we know that there is a street called ‘Pear Tree Street’ and that the street is part of the address within the PPI data structure. With that information we can enter this URL into a web browser:

http://landregistry.data.gov.uk/data/ppi/address?&street=PEAR%20TREE%20STREET

Note that this returns transactions for every street of that name held in the Price Paid Dataset. Because the API page limits the number of items returned, any download in another format, for example CSV, is limited to the same number of results as displayed.

Similarly, other address items can be queried, in this example a postcode for a different Pear Tree Street:

http://landregistry.data.gov.uk/data/ppi/address?postcode=DE23%208PL

And items can be combined, in this example using street and county:

http://landregistry.data.gov.uk/data/ppi/address?&street=PEAR%20TREE%20STREET&county=DERBYSHIRE

It is possible to return the number of rows that satisfy the query by using the count function:

http://landregistry.data.gov.uk/data/ppi/address?&street=PEAR%20TREE%20STREET&county=DERBYSHIRE&_count=yes

This reveals that there are 9 entries that satisfy that query. A query for the county of Derbyshire reveals that there are 177,264 transactions (both counts correct as of June 2014). Note that for some queries the count function has been disabled due to the size of the dataset. This is particularly relevant for queries at the transaction-record level. For example, the following query filters transaction records in Derbyshire on the new build indicator (as written, newBuild=false selects properties that are not new builds); adding the count function to it will result in a warning message:

http://landregistry.data.gov.uk/data/ppi/transaction-record?&propertyAddress.county=DERBYSHIRE&newBuild=false
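
The address URLs above follow a regular pattern, so they can be assembled programmatically. The Python sketch below is one way to do that. The function name and its `count` argument are hypothetical; the base URL and the parameter names (`street`, `county`, `postcode`, `_count`) are taken from the examples above. The leading `&` that appears after the `?` in some of those examples is omitted here, on the assumption that it makes no difference to the result.

```python
from urllib.parse import quote, urlencode

# Base URL for address queries, as used in the examples above.
API_BASE = "http://landregistry.data.gov.uk/data/ppi/address"


def build_address_url(count=False, **filters):
    """Build an address query URL from keyword filters such as street=..."""
    params = dict(filters)
    if count:
        params["_count"] = "yes"
    # quote_via=quote encodes spaces as %20, matching the examples above.
    return f"{API_BASE}?{urlencode(params, quote_via=quote)}"


url = build_address_url(street="PEAR TREE STREET", county="DERBYSHIRE", count=True)
print(url)
```

Passing the filters as keyword arguments keeps the parameter order the same as the order in which they are written in the call.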

Example SPARQL query using ‘filter’

This is an example of a custom SPARQL query that we used. The customer asked for a SPARQL query to find property where part of the PAON was Alveston within postcode CV37 7AE. This can be done easily using the Price Paid Report Builder, but SPARQL can be more flexible when dealing with multiple PAONs.

The HM Land Registry Price Paid Report Builder (http://landregistry.data.gov.uk/app/ppd) can be used to generate a base SPARQL query for the postcode which looks like this:

prefix rdf: 
prefix rdfs: 
prefix owl: 
prefix xsd: 
prefix sr: 
prefix lrhpi: 
prefix lrppi: 
prefix skos: 
prefix lrcommon: 

PREFIX  ppd:  
PREFIX  lrcommon: 

SELECT  ?item ?ppd_hasTransaction ?ppd_pricePaid ?ppd_propertyAddress ?ppd_publishDate ?ppd_transactionDate ?ppd_transactionId ?ppd_estateType ?ppd_newBuild ?ppd_propertyAddressCounty ?ppd_propertyAddressDistrict ?ppd_propertyAddressLocality ?ppd_propertyAddressPaon ?ppd_propertyAddressPostcode ?ppd_propertyAddressSaon ?ppd_propertyAddressStreet ?ppd_propertyAddressTown ?ppd_propertyType ?ppd_transactionCategory
WHERE
  { ?ppd_propertyAddress  _:b0 .
    _:b0  lrcommon:postcode .
    _:b0  _:b1 .
    _:b1  "( cv37 AND 7ae )" .
    _:b1  _:b2 .
    _:b2  200000 .
    _:b2   .
    ?item ppd:hasTransaction ?ppd_hasTransaction .
    ?item ppd:pricePaid ?ppd_pricePaid .
    ?item ppd:propertyAddress ?ppd_propertyAddress .
    ?item ppd:transactionDate ?ppd_transactionDate .
    ?item ppd:transactionId ?ppd_transactionId .
    ?item ppd:transactionCategory ?ppd_transactionCategory .
    OPTIONAL
      { ?item ppd:estateType ?ppd_estateType }
    OPTIONAL
      { ?item ppd:newBuild ?ppd_newBuild }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:county ?ppd_propertyAddressCounty }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:district ?ppd_propertyAddressDistrict }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:locality ?ppd_propertyAddressLocality }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:paon ?ppd_propertyAddressPaon }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:postcode ?ppd_propertyAddressPostcode }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:saon ?ppd_propertyAddressSaon }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:street ?ppd_propertyAddressStreet }
    OPTIONAL
      { ?item ppd:propertyAddress/lrcommon:town ?ppd_propertyAddressTown }
    OPTIONAL
      { ?item ppd:propertyType ?ppd_propertyType }
  }
LIMIT   100

The part of this query that is of most interest is the WHERE clause of the SELECT query. This can be modified in several ways to achieve the required result. Here is one example: by amending the last line before the OPTIONAL keyword and adding two lines, a property name or partial name can be added to the search.

Make sure the following line, the last one before the first OPTIONAL keyword, ends with a period separated from the last character by a space:

?item ppd:transactionCategory ?ppd_transactionCategory .

Add these two lines to follow:

?item ppd:propertyAddress/lrcommon:paon ?ppd_propertyAddressPaon
 FILTER regex(?ppd_propertyAddressPaon,'alve','i')

These lines bind the PAON to a variable and filter the results to those where the string ‘alve’ appears anywhere within the PAON. The third argument, 'i', makes the match case-insensitive.

This gives a WHERE clause that looks like this and, when run, will return a single property transaction for Alveston Cottage:

WHERE
  { ?ppd_propertyAddress  _:b0 .
    _:b0  lrcommon:postcode .
    _:b0  _:b1 .
    _:b1  "( cv37 AND 7ae )" .
    _:b1  _:b2 .
    _:b2  200000 .
    _:b2   .
    ?item ppd:hasTransaction ?ppd_hasTransaction .
    ?item ppd:pricePaid ?ppd_pricePaid .
    ?item ppd:propertyAddress ?ppd_propertyAddress .
    ?item ppd:transactionDate ?ppd_transactionDate .
    ?item ppd:transactionId ?ppd_transactionId .
    ?item ppd:transactionCategory ?ppd_transactionCategory .
    ?item ppd:propertyAddress/lrcommon:paon ?ppd_propertyAddressPaon
    FILTER regex(?ppd_propertyAddressPaon,'alve','i')

More than one string can be searched for by using || to separate the regex tests. For example, to return PAONs that contain the string ‘alve’ or the number 5 (stored as a string, not a number):

?item ppd:propertyAddress/lrcommon:paon ?ppd_propertyAddressPaon
FILTER (regex(?ppd_propertyAddressPaon,'alve','i') ||
        regex(?ppd_propertyAddressPaon,'5','i'))

This will return two rows: Alveston Cottage and 5A Tiddington Road.
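
When several search strings are needed, the FILTER line can be generated rather than typed by hand. The Python sketch below shows one way to do this, together with a local check that mimics the case-insensitive match. Both function names are hypothetical, and the PAON values in the example calls are illustrative guesses at how the values might be stored, not values read from the dataset.

```python
import re


def build_paon_filter(terms, variable="?ppd_propertyAddressPaon"):
    """Build a SPARQL FILTER matching any of the given terms, ignoring case."""
    if not terms:
        raise ValueError("at least one search term is required")
    for term in terms:
        # Keep the sketch simple: reject anything that would need escaping.
        if not re.fullmatch(r"[A-Za-z0-9 ]+", term):
            raise ValueError(f"unsupported search term: {term!r}")
    tests = [f"regex({variable},'{term}','i')" for term in terms]
    return "FILTER (" + " || ".join(tests) + ")"


def matches_any(paon, terms):
    """Local Python equivalent of the filter, for checking expectations."""
    return any(re.search(term, paon, re.IGNORECASE) for term in terms)


paon_filter = build_paon_filter(["alve", "5"])
print(paon_filter)
print(matches_any("ALVESTON COTTAGE", ["alve", "5"]))
```

Because the test is a substring match, a search for ‘5’ also matches PAONs such as 15 or 25, so a narrower pattern may be needed when an exact house number is wanted.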
