PROJECT DEMO: Top 5 Zip Codes

Top Five Zip Codes is a housing market prediction model that uses seasonal ARIMA time-series analysis and GridSearchCV to recommend the top 5 zip codes for purchasing a single-family home in Westchester County, New York.

Interactive Dashboard:

To see the dashboard in action, go to RealtyRabbit.


Business Case

Recommend the top 5 zip codes for a client interested in buying a single-family home in Westchester County, NY, within the next two years.

Required Parameters (from Client)

  1. Timeframe: Purchase of new home would take place within the next two years
  2. Budget: Purchase price not to exceed 800,000 USD
  3. Location: Zip Code search radius restricted to Westchester County, New York
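The required parameters above can be read as a simple filter over candidate zip codes. The following is a minimal sketch, not the project's actual code: the `ZipSnapshot` record, its field names, and the sample rows are hypothetical and only illustrate the budget and location checks.

```python
from dataclasses import dataclass

BUDGET_USD = 800_000  # client's maximum purchase price


@dataclass(frozen=True)
class ZipSnapshot:
    """Hypothetical record: latest mean home value for one zip code."""
    zip_code: str
    county: str
    state: str
    latest_value_usd: float


def meets_required_parameters(snap: ZipSnapshot, budget: float = BUDGET_USD) -> bool:
    """Keep only Westchester County, NY zip codes at or under the budget."""
    return (
        snap.county == "Westchester"
        and snap.state == "NY"
        and snap.latest_value_usd <= budget
    )


# Made-up rows with placeholder zip codes, for illustration only.
snapshots = [
    ZipSnapshot("00001", "Westchester", "NY", 650_000.0),
    ZipSnapshot("00002", "Westchester", "NY", 950_000.0),
    ZipSnapshot("00003", "Fairfield", "CT", 600_000.0),
]
candidates = [s.zip_code for s in snapshots if meets_required_parameters(s)]
print(candidates)
```

The two-year timeframe is not part of this filter; it would set the forecast horizon instead.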

Ideal Parameters (from Client)

Success Criteria for Model

  1. Maximum ROI (highest forecasted home value increase for lowest upfront cost)
  2. Confidence Intervals
  3. Risk mitigation: financial stability of homeowners and homes based on historic data
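One way to make the first criterion concrete is to express ROI as the forecasted increase in home value divided by the current value. This is a sketch under that assumption; the function names and the sample figures are hypothetical, and the project may define ROI differently.

```python
def forecast_roi(current_value: float, forecast_value: float) -> float:
    """Forecasted gain as a fraction of the upfront cost (assumed ROI definition)."""
    if current_value <= 0:
        raise ValueError("current_value must be positive")
    return (forecast_value - current_value) / current_value


def rank_by_roi(values_by_zip: dict, top_n: int = 5) -> list:
    """Return (zip_code, roi) pairs sorted from highest to lowest ROI.

    values_by_zip maps zip_code -> (current_value, forecast_value).
    """
    scored = [
        (zip_code, forecast_roi(current, forecast))
        for zip_code, (current, forecast) in values_by_zip.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up values with placeholder zip codes, for illustration only.
example = {
    "00001": (500_000.0, 550_000.0),
    "00002": (700_000.0, 735_000.0),
    "00003": (400_000.0, 460_000.0),
}
ranking = rank_by_roi(example, top_n=2)
print([zip_code for zip_code, _ in ranking])
```

A fuller version could also rank on the lower bound of the forecast's confidence interval, which would fold the second criterion into the same score.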


Forecast mean home values for each zip code using seasonal ARIMA time-series analysis.
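A common way to tune a seasonal ARIMA model is to loop over candidate `(p, d, q)` and seasonal `(P, D, Q, s)` orders and keep the fit with the lowest AIC. The sketch below uses that plain loop rather than GridSearchCV, and it is not the project's actual code. The helper names, the order range of 0 to 1, and the seasonal period of 12 (monthly data) are all assumptions. The scorer is passed in as a function, so the search itself has no dependency on a particular library; `statsmodels_aic` shows one possible scorer and requires statsmodels to be installed.

```python
import math
from itertools import product


def sarima_grid(max_order=1, seasonal_period=12):
    """Yield every (order, seasonal_order) pair with terms in 0..max_order."""
    terms = range(max_order + 1)
    for p, d, q in product(terms, repeat=3):
        for P, D, Q in product(terms, repeat=3):
            yield (p, d, q), (P, D, Q, seasonal_period)


def best_order(series, fit_aic, grid):
    """Return (aic, order, seasonal_order) with the lowest AIC, or None.

    fit_aic(series, order, seasonal_order) must return a number;
    combinations that raise or return NaN are skipped.
    """
    best = None
    for order, seasonal_order in grid:
        try:
            aic = fit_aic(series, order, seasonal_order)
        except Exception:
            continue  # some order combinations fail to fit
        if math.isnan(aic):
            continue
        if best is None or aic < best[0]:
            best = (aic, order, seasonal_order)
    return best


def statsmodels_aic(series, order, seasonal_order):
    """One possible scorer: fit SARIMAX from statsmodels and return its AIC."""
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    model = SARIMAX(
        series,
        order=order,
        seasonal_order=seasonal_order,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    return model.fit(disp=False).aic
```

With a home value series for one zip code in `series`, the search would be run as `best_order(series, statsmodels_aic, sarima_grid())`. The default grid has 64 combinations, so fitting every zip code can take a while.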

Commute Times

Since commute time to Grand Central Station is part of the client’s required criteria, I first looked up which towns and zip codes sit on which train lines. Metro-North Railroad runs three main lines out of Grand Central: Hudson, Harlem, and New Haven. The first question I wanted to answer was whether the average home prices for the zip codes along each line display any trends.
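Comparing the lines comes down to grouping zip codes by train line and averaging their home values. The following is a minimal sketch of that grouping for a single point in time. The function name, the zip-to-line mapping, and the values are made up for illustration; the real mapping has to be looked up by hand as described above.

```python
from collections import defaultdict
from statistics import mean


def mean_value_by_line(values_by_zip: dict, line_by_zip: dict) -> dict:
    """Average home value per train line; zip codes without a line are ignored."""
    grouped = defaultdict(list)
    for zip_code, value in values_by_zip.items():
        line = line_by_zip.get(zip_code)
        if line is not None:
            grouped[line].append(value)
    return {line: mean(values) for line, values in grouped.items()}


# Placeholder zip codes and made-up values, for illustration only.
line_by_zip = {"ZIP_A": "Hudson", "ZIP_B": "Hudson", "ZIP_C": "Harlem"}
values_by_zip = {"ZIP_A": 100.0, "ZIP_B": 300.0, "ZIP_C": 500.0, "ZIP_D": 700.0}
print(mean_value_by_line(values_by_zip, line_by_zip))
```

Running the same function once per month of data would give one series per line, which is the kind of input an area plot by train line needs.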

Mean Values by Train Line

Mean Values by Train Line Area Plot

New Haven Line

New Haven Line Zip Code Timeseries


Note that this does not include zip codes in Connecticut (which the New Haven line covers) since the client is only interested in towns in New York state.

Harlem Line

Harlem Line Zip Code Timeseries

Hudson Line

Hudson Line Zip Code Timeseries


The top five zip codes I selected based on the required criteria were 10549, 10573, 10604, 10605, and 10706:


timeseries 10549


timeseries 10573


timeseries 10604


timeseries 10605


timeseries 10706

Top Five Zip Codes in Westchester County

top five zipcodes timeseries


My client was keen on accounting for public school districts, which on initial inspection would have required a great deal of manual lookup and data entry. If there is an API or some other way to scrape this data from the web, I would incorporate school districts as an exogenous factor when making recommendations for a client. A higher rating is not always better for every buyer: someone might not prefer schools with a rating of 10, as these tend to be predominantly white. My client in particular was looking for decent school districts rated below 10 because she wants her child to grow up in a more ethnically diverse community. Accounting for such preferences is part of the future work of this project.


You can reach me at


This project uses the following license: MIT License.
