Apartment Cloud
Noah Callaway, Zac Fleischmann, Zak Nelson, Brandon Zahl
Aspirations / Reality
  Aspirations:
    Aggregate apartment listings from all across the internet to create a…
    …simple, one-stop apartment search
  Reality:
    Aggregate apartment listings from top sites (Washington state only)
    …mostly one-stop apartment search
    …mostly simple
Building It
  Brandon: site-specific extractors, statistics
  Noah: server configuration, front-end development
  Zac: site-specific extractors, advanced search
  Zak: crawler / aggregator, commute distance feature
Page Extraction Statistics
  Columns: Extractor Name, Files Crawled, Listings Found, Extraction Errors, % Error-Free
  Rows: Rent.com, ApartmentRatings.com, Craigslist.com, MyNewPlace.com
Extraction Accuracy Statistics
  Columns: Extractor Name, TP, TN, FP, FN, Precision, Recall, F-score
  Rows: Rent.com, ApartmentRatings.com, Craigslist, MyNewPlace.com
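As a reminder of how the accuracy columns relate to one another, here is a minimal sketch that derives precision, recall, and F-score from the raw counts. It assumes the F-score on the slide is the balanced F1 measure; the function name and the example counts are illustrative only and are not the project's measured results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and balanced F1 from confusion counts.

    tp: extracted fields that were correct (true positives)
    fp: extracted fields that were wrong (false positives)
    fn: fields present on the page that were missed (false negatives)
    """
    # Guard every division so an extractor with no output does not crash.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up counts for illustration: 8 correct, 2 wrong, 2 missed.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

True negatives (TN) do not enter any of the three formulas, which is why the function does not take them as an argument.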
Experiment Conclusion
  Much higher accuracy on the structured pages than on unstructured Craigslist pages
  Craigslist is a candidate for machine learning
  Machine learning would likely do worse on the other sites
What We Learned
  How to configure Amazon Web Services with a LAMP stack
  How to create a web application with AJAX
  How to use Jobo and Nutch for web crawling
  How to parse HTML for pertinent data
  The considerations of starting a web business
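To illustrate what parsing HTML for pertinent data can look like on a structured listing page, here is a small sketch using Python's standard `html.parser`. It is not the project's extractor: the class names (`listing-title`, `listing-rent`, `listing-address`), the sample markup, and the `ListingExtractor` and `extract_listing` names are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup; real listing sites use their own structure.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="listing-title">2BR near campus</h2>
  <span class="listing-rent">$950</span>
  <p class="listing-address">123 Example St, Seattle, WA</p>
</div>
"""


class ListingExtractor(HTMLParser):
    """Collects the text of elements whose CSS class marks a field of interest."""

    # Assumed mapping from CSS class to output field name.
    FIELDS = {
        "listing-title": "title",
        "listing-rent": "rent",
        "listing-address": "address",
    }

    def __init__(self):
        super().__init__()
        self.listing = {}
        self._current = None  # field we expect the next text node to fill

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._current = self.FIELDS[cls]

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.listing[self._current] = text
            self._current = None

    def handle_endtag(self, tag):
        # Stop waiting for text once any element closes.
        self._current = None


def extract_listing(html: str) -> dict:
    parser = ListingExtractor()
    parser.feed(html)
    parser.close()
    return parser.listing


print(extract_listing(SAMPLE_HTML))
```

A per-site mapping like `FIELDS` works when pages share a consistent template, which fits the observation that structured sites were much easier to extract from than free-form Craigslist posts.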
Unexpected Outcomes
  Amazon Web Services was slower than a $7/month virtual server
  Most of the large listing sites were surprisingly easy to extract data from
  Aggregating information from the web is legally tricky
Things We’d Do Differently
  Better version control
  More design before coding
  More quality control and testing
  More extensible extractors (maybe built on an existing HTML parser)
Demo