Note – this is NOT a data download. It is a highly technical post, with downloads that let GIS professionals using ESRI’s ArcGIS software embed enhanced geocoding rules into ESRI software to improve matching rates.
LA County currently is able to match over 99.5% of incoming addresses. Part of this has to do with the high quality of the reference file that we manage through the Countywide Address Management System, but a large part of this success rate has to do with something that many people take for granted – the geocode matching algorithms.
Some of this comes from my memory of conversations with Peter Fonda-Bonardi, so I will follow up with him to validate. I spoke about our programs and history a little while ago at the 1st International Geocoding Conference; I’ve attached the presentation materials here: Using GIS to improve Addressing (.pdf file)
In the mid-1980s, LA County hired Matt Jaro to design and build a fuzzy matching system to automate the matching of misspelled names. The problem was MediCal and MediCare patients who hadn’t registered correctly, which meant the County wasn’t able to get reimbursement for health services rendered. This has led to the recovery of millions of dollars in revenue for the County, based upon their standalone version of the software running on a UNIX platform.
The outcome of this was a matching program called Automatch developed by MatchWare technologies (here’s an article written by Matt Jaro). This software was then licensed by ESRI to improve its matching at the time. Yes – LA County’s matcher is inside of ArcMap (but is being phased out starting with ArcGIS 10).
The team responsible for developing the tool, led by Wayne Bannister, Peter Fonda-Bonardi, Victor Chen, and Yoko Myers of LA County’s Internal Services Department Urban Research Section, are intimately familiar with the complex mathematics and pattern rules that make up the matching systems, and have tweaked them to be especially accurate in dealing with the strange addresses in LA County. Some examples include (these all exist!):
- W Avenue Q
- Avenue 23
- The Old Road
- Outer Traffic Circle
- Los Coyotes Diagonal
- Avenue of the Stars
Download the Matching rules
The zip file contains two folders that match two folders in your ArcGIS installation location. Unzip it, then copy the files from each folder into the folder of the same name in your ArcGIS Desktop installation. In my case this was C:\Program Files\ArcGIS\Desktop10.0
- All files in the Geocode folder –> C:\Program Files\ArcGIS\Desktop10.0\Geocode
- All files in the Locators folder –> C:\Program Files\ArcGIS\Desktop10.0\Locators
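If you would rather script the copy than drag files around, here is a minimal Python sketch of the step above. The function name `copy_rule_files` and the unzip location in the usage comment are my own placeholders, not part of the download; the only things taken from this post are the two folder names and the install path. Writing into Program Files will probably require administrator rights.

```python
import shutil
from pathlib import Path


def copy_rule_files(unzipped_dir, arcgis_dir, folders=("Geocode", "Locators")):
    """Copy every file in each rule folder into the same-named install folder.

    Hypothetical helper: unzipped_dir is wherever you extracted the zip,
    arcgis_dir is your ArcGIS Desktop installation folder.
    """
    copied = []
    for folder in folders:
        src = Path(unzipped_dir) / folder
        dst = Path(arcgis_dir) / folder
        # Fail loudly rather than create folders in the wrong place.
        if not src.is_dir():
            raise FileNotFoundError(f"Missing source folder: {src}")
        if not dst.is_dir():
            raise FileNotFoundError(f"Missing install folder: {dst}")
        for item in sorted(src.iterdir()):
            if item.is_file():
                shutil.copy2(item, dst / item.name)
                copied.append(dst / item.name)
    return copied


# Example call (the first path is an assumption, adjust to your machine):
# copy_rule_files(r"C:\Temp\LACountyRules", r"C:\Program Files\ArcGIS\Desktop10.0")
```

The function refuses to run if either folder is missing, so a mistyped install path does not silently create a stray Geocode folder somewhere else.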
There are five boxes:
- Input Address data – this is your reference data file (could be TIGER, etc)
- Input Address Fields. You need to fill this out correctly.
- Add the FID or OBJECTID field – ArcMap needs a unique identifier.
- Add ALL of the fields that make up the address. Sometimes this is one field (TIGER’s FULLNAME), but if you have multiple fields (PRETYPE, PREDIR, etc) you will need to add them all, and in the correct order. The standardizer will concatenate them before splitting them according to LA County’s rules. DO NOT include address number, zipcode, or city fields.
- Address Locator Style – By completing Step 1, you should see three new locators starting with “LA County” in this list. You need to know the type of your data to select the correct one:
- LA County Points with Zone – this is for address points
- LA County Streets with Zone DIME – this is the most common format (TIGER is one – it invented this format). Use it if you have left and right house ranges (look for 4 columns with address numbers in them).
- LA County Streets with Zone NICKEL – very uncommon (but LA County uses it). Each record is a single range (left or right side); look for only 2 columns with address numbers.
- Output Address Fields – Click on “Select All” then uncheck the HouseNum, HouseSuffix, and ZIP fields. Generally these already exist as separate fields in your dataset – no need to output them again.
- Output Address Data – select where you want the file to be output.
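To see why field order matters in the Input Address Fields box, here is a rough Python sketch of the concatenation step only. It is not the standardizer's actual code, which I have not shown here; the function name `build_street_string` and the STNAME field name are made up for illustration, and the record uses one of the real street names listed earlier.

```python
def build_street_string(record, street_fields):
    """Join the street-name fields, in the order given, skipping blanks."""
    parts = []
    for field in street_fields:
        value = record.get(field)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


# Hypothetical reference record; house number and ZIP are deliberately
# left out of street_fields, as the instructions above require.
record = {"HOUSENUM": 100, "PREDIR": "W", "PRETYPE": "Avenue",
          "STNAME": "Q", "SUFTYPE": "", "ZIP": "93551"}
street_fields = ["PREDIR", "PRETYPE", "STNAME", "SUFTYPE"]

print(build_street_string(record, street_fields))  # W Avenue Q
```

Listing the fields in the wrong order (say, STNAME before PRETYPE) would produce "W Q Avenue", which is a different string for the rules to split.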
From the ArcToolbox “Geocoding Tools” you can get the “Create Address Locator” tool.
There are five boxes:
- Address Locator Style – same as #3 in the last step – it has to match the reference dataset
- Reference Data – select the output from the standardizer.
- Field Map – when you select the Reference Data, all of the fields should be filled in.
- You may see a blank next to the “*House Number” or “House From Left” field name – you need to select the field or fields that contain that data in your reference file.
- You need to go to the bottom of the list and check the Zone fields – Make sure they are filled in with the correct Zone information (this could be zipcode but could also be a city).
- We generally create one locator for zipcode, one locator for legal city, and one locator for postal city (if your reference data has these fields) – then tie them together with a Composite Locator (which I won’t discuss here).
- Output address locator – this is the name and location of the address locator that you will use for geocoding. Make sure that it shows the zone information (e.g. Points_Zip vs. Points_LegalCity) so you know what it contains.
Click OK - the address locator will be created. Have fun geocoding!