How should international geographical addresses be stored in a relational database?

Database DesignStreet Address

Database Design Problem Overview


Given the task of storing international geographic addresses in a relational table, what is the most flexible schema? Should every part of the address be broken out into their own fields, or should it be more like free text?

Is there any sense in separating differently formatted address into different tables? For example, have a table for USAAddress, CanadianAddress, UKAddress...?

Database Design Solutions


Solution 1 - Database Design

I will summarize my thoughts from my blog post - A lesson in address storage (on archive.org).

On my current project [I work for a logistics company] we're storing international addresses. I've done research on addresses all over the world in the design of this portion of the database. There's a lot of different formats. In the Western world we tend to use a fairly uniform format - a few differences but they're mostly:

  • Street Number - Numeric
  • House or Building Name - [VarChar - in the UK some houses/buildings are identified by name, not by number]
  • Street Number Suffix [VarChar, although in most cases, Char(1) would suffice]
    • A, B etc
  • Street Name [VarChar]
  • Street Type [VarChar or Int if you have a StreetTypes table]
    • So far, I've found 262 unique types in the English speaking world, there are likely more, and don't forget other languages i.e. Strasse, Rue etc.
  • Street Direction [VarChar(2)]
    • N, E, S, W, NE, SE, NW, SW
  • Address Type [VarChar or Int if you have an AddressTypes table]
    • PO Box
    • Apartment
    • Building
    • Floor
    • Office
    • Suite
    • etc...
  • Address Type Identifier [VarChar]
    • i.e. Box Number, Apartment Number, Floor Number remember apartment numbers and offices sometimes have alphanumeric info - like 1A
  • Local Municipality [VarChar or Int if you have a Municipalities table]
    • For instance, if your hamlet/village appears in the address before the town.
  • City/Town [VarChar or Int if you have a Cities table]
  • Governing District [VarChar or Int if you have a Districts table]
    • State (U.S.)
    • Province (Canada)
    • Federal District (Mexico)
    • County (U.K.)
    • etc...
  • Postal Area [VarChar]
    • Zip (U.S.)
    • Postal Code (Canada, Mexico)
    • Postcode (U.K.)
  • Country [VarChar or Int if you have a Countries table]

This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats

For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.

In answer to your question about separation of the addresses into different countries, I wouldn't suggest it, it will just make life harder in other areas - for instance reporting. The format I've provided covers all the addresses in our logistics database which covers USA, Canada, Mexico and the UK without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries but I haven't yet had to store an address from a country that these fields won't support.

I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.

QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.

One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!

Solution 2 - Database Design

Be careful not to over-analyze address formats. When you do, you're quite likely to end up with a specification most users will need to work around, effectively forcing them to use the wrong fields, or only filling the primary fields and ignoring the extra fields.

Keep things simple.

A StreetType like mentioned by BenAlabaster will cause problems when you start working with languages different from isolating languages like English or Spanish.

To show you how bad things can get in the wild: the "Henriette Roland Holststraat" in Amsterdam, built up from "Henriette" + "Roland Holst" + "straat", which can be abbreviated as the "Roland Holststraat", or "Roland Holststr.", or misspelled as "H.R.Holststr." or "Henriette Roland-Holst straat", depending on the weather. Unless you've got an up-to-date street register for each country on earth, you'll be going nowhere.

And finally, be careful that in some multilingual countries, names can be different from one language to another! For instance in Brussels where many streets have both a French and a Dutch name: "Avenu du Port" and "Havenlaan", depending on the addressee's preferred language. (Google Maps shows both names alternately, just to be on the safe side.)

You can try to devise all kinds of clever tricks here, but are the sales reps. going to understand this?

Solution 3 - Database Design

That depends on what you want to do with it.

I've found it always easier to use addresses for other purposes (such as verification against USPS data or getting shipping rates from UPS/FEDEX) if they're separated.

Here's what I typically use for addresses:

  • Address Line 1
  • Address Line 2
  • Address Line 3
  • City
  • Region
  • Postal Code
  • County
  • Country

In Response to the edit: For most situations I don't see the use. The table I listed above has enough fields (and is generic enough) for most country's addresses.

Solution 4 - Database Design

Here's an anecdote for anyone who stumbles on this question:

I speak as a person who has lived and worked on a lot of continents (Europe, Asia, North America). In my experience, and the experience of the people I work with, it has been much easier for us to use systems that do the following:

  1. Provide three lines into which I will type one address. Pass these three lines on to your local postal service as I type them, verbatim. Let me use any character set I want; use UTF-8 or something better.
  2. If your system has business requirements that need me to specify particular information ( such as zip code, prefecture, state, etc. ), ask for that separately. By business requirements, I mean things like analytics; these bits of information should not be shared with your local postal service ( unless I also happened to write the same information into one of the three lines from Point 1, above ).
  3. Have a dropdown that asks me to specify the categorical location of address I provided in the lines of Point 1 above, perhaps Country.
  4. If you must parse the information I provide in the lines of Point 1, use my answer to Point 3 to select regex. Run that regex against the information in Point 1 to parse it. Try to fill the user interface elements of Point 2 using the output from your regex. If I correct that autofilled information--use the fact that I changed it to improve your regex. Similarly, as much as possible, give me an opportunity to review and correct the output of your regex: nobody knows better what I intended to communicate than me.

Systems built like this, I find, make my life easiest. Particularly when I'm sending mail to a postal system about which your firm has virtually no functional internal knowledge.

If your firm does have internal knowledge about particular postal systems, use my selection in Point 3 to inform which view you display to me. A lot of people know what the US postal system expects on packaging; if I select US in Point 3, feel free to make the view look appropriate for a US address. If I select a country about which your firm knows nothing--display a generic three lines and let me do the rest; don't force me to use ASCII.

And let's be real here--building a complete, encyclopedic database of all global postal systems ( public and private ) is a herculean task at best, if not an impossible one. There are, for example, postal systems in which only the local, last-mile carrier really knows where an address is located. Sometimes being able to pass notes to that carrier on the packaging is extremely useful. And mapping the local knowledge of every edge case carrier into your database is indeed an impossible task.

Just ask Gödel. ( And then ask yourself if you're attempting to use an axiomatic system to model a universe of discourse, give or take some sort of arithmetic like set theory or relational algebra. )

Solution 5 - Database Design

Address

As a polar opposite to the excellent answer @BenAlabaster has provided, you could simply have:

address       TEXT(300)
postal_code   VARCHAR(15)
country_code  VARCHAR(2)

Your client-side form layouts can still be as complex as you see fit (or use a multi-line input where the user can manually type their address). You can then add the line breaks in the address where necessary.

Country

Your country table would look as follows:

country_code  VARCHAR(2)
country_name  VARCHAR(255)

Additionally, you could have one of the following:

postal_code_required  TINYINT(1)
postal_code_regex     VARCHAR(255) NULL DEFAULT NULL

Then use the following lists to design your country table:

Solution 6 - Database Design

Comment of Ben Alabaster's Answer: To format addresses based on country, you could use a formatting table that has the ordering of the columns for each country as separate rows.

  • AddressFormat(CountryCode, FieldName, FieldOrder)

The field order can be coded to use complex grid layouts also.

There is no point in separating addresses by country. This will be chaotic as the number of countries increases and you will land in trouble if you want to find all the addresses of say, an international client. Having an Address Type suggested by Ben could also lead to ambiguities when you have an address that has both a building number and an apartment number. I could be in an apartment complex where each building has a different name. This is very common in India.

Solution 7 - Database Design

I use https://github.com/commerceguys/addressing library to format international addresses and they use these elements:

Country
Administrative area
Locality (City)
Dependent Locality (in: BR, CN, IR, MY, MX, NZ, PH, KR, ZA, TH)
Postal code
Sorting code
Address line 1
Address line 2
Organization
Recipient

This doen't help if you want to parse the street (name, house number, ...).

Btw. if you are looking for a multilanguage country list: https://github.com/umpirsky/country-list

Solution 8 - Database Design

The only way is to split them to:

Name varchar,
Title varchar,
StreetAddress varchar,
StreetAddressLine2 varchar,
zipCode varchar,
City varchar,
Province varchar,
Country lookup

since almost every country has it's own standard for having address data, and evey country has a different format of zipcodes.
You can have a small sample of problems in my post from a similiar question.

This should not make sense to separate addresses for every country, since there are countries where you have few address conventions. Some popular conventions include not having streets in small villages, only village name and number, while streets are in larger cities’ addresses. I have learned that in Hungary’s capital – Budapest, there are few streets having the same name (you distinct them by city’s district number), while other cities does not have such addresses (someone from Hungary may actually confirm if this is true). So the total number of address formats will be numer_of_countries multiplied by number of address formats in this country… Can be done with different tables, but it will be horrible work to do.

Solution 9 - Database Design

I know this is an extremely old topic that is already answered, but I thought that I'd throw my two cents in as well. It all depends on what your project goals and how you expect your target users to enter addresses. Ben's suggestion will allow you to parse addresses accurately, but on the other hand could make for a longer (and possibly more frustrating) user data entry process. Stephen Wrighton's suggestion is simpler, and could be easier for users to enter addresses as a result.

I've also seen some models that simply had an "Address" column that would capture a typical street number, type, street name, unit / apartment number, etc. all in one column, while keeping City, Country, Region, etc. within other columns. Similar to Stephen's model, except Address1, Address2, and Address3 all consolidated into one column.

My opinion is that the most flexible models tend to be those that are least restrictive, depending on your interpretation of flexible.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBobView Question on Stackoverflow
Solution 1 - Database DesignBenAlabasterView Answer on Stackoverflow
Solution 2 - Database DesignRubenView Answer on Stackoverflow
Solution 3 - Database DesignStephen WrightonView Answer on Stackoverflow
Solution 4 - Database DesignStudentsTeaView Answer on Stackoverflow
Solution 5 - Database Designrybo111View Answer on Stackoverflow
Solution 6 - Database DesignbkmView Answer on Stackoverflow
Solution 7 - Database DesignHarald ErnstView Answer on Stackoverflow
Solution 8 - Database Designsmok1View Answer on Stackoverflow
Solution 9 - Database DesignShan PlourdeView Answer on Stackoverflow