Los Angeles Metropolitan Area
While the sheer volume of data has exploded over the last decade (with some estimating that 90% of the world's data is less than two years old), better understanding has not always followed. It is increasingly clear that data curation, transformation, and modeling are key to successfully solving real problems with data. I use data to produce actionable insights for science and industry. From a first glimpse of galaxy morphologies in the early universe, to transforming drone regulations onto maps, to aggregating and modeling automotive pricing, I have successfully brought deeper understanding from rich and complex data sources.
We use data to determine the carbon footprint of your personal care products.
The retail industry and municipal waste facilities process millions of pounds of unused and unwanted products every year, much of which ends up in landfills or incinerators. At SmarterSorting, we seek to pull valuable items out of the waste stream and find useful homes for them. For instance, we believe that the best way to dispose of paint is to paint a wall. The best way to dispose of a damaged bag of fertilizer is to use it to plant something. To accomplish our goals, we must understand the chemical nature of these products and ensure that they are transported and stored safely and legally. Our data team captures multiple observations of hundreds of product facets for millions of products. We use this rich data set to determine the true nature of each product and categorize it for proper handling under national, state, and local regulations.
Principal Investigator, Local Group Infrared Cluster Survey (LoGICS), Caltech (2010-2012)
I have assembled a team from 7 institutions with the goal of tracking stellar evolution in star clusters. After several successful proposals, we have recently obtained our first data.

Research Scientist, Bootes Research Team, Caltech (2007-2012)
Led five publications on the near-to-far infrared properties of galaxies. Organized monthly telecons, set the agenda for future science and proposals, and presented results at conferences.
The AirMap Data team captures the world's drone regulations and encodes them onto human- and machine-readable maps. We also capture key elements of the physical world (topography, obstacles, events, and wireless signal strength) to power human-piloted and autonomous drone flights. Our rules engine evaluates drone flight plans for legality, safety, and efficiency. The AirMap data platform empowers airspace administrators, thousands of drone pilots, and hundreds of app developers worldwide to open the airspace for drone flight.
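To illustrate what evaluating a flight plan against encoded rules can look like, here is a minimal sketch. It is not AirMap's rules engine: the `FlightPlan` fields, the rule factories, the 120 m limit, and the bounding-box coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FlightPlan:
    """Hypothetical, heavily simplified flight plan: one point, one altitude."""
    max_altitude_m: float
    lat: float
    lon: float


# A rule returns None when satisfied, or a human-readable violation message.
Rule = Callable[[FlightPlan], Optional[str]]


def altitude_limit(limit_m: float) -> Rule:
    """Build a rule that flags plans flying above limit_m."""
    def rule(plan: FlightPlan) -> Optional[str]:
        if plan.max_altitude_m > limit_m:
            return f"altitude {plan.max_altitude_m} m exceeds {limit_m} m"
        return None
    return rule


def restricted_box(name: str, south: float, west: float,
                   north: float, east: float) -> Rule:
    """Build a rule that flags plans inside a lat/lon bounding box."""
    def rule(plan: FlightPlan) -> Optional[str]:
        if south <= plan.lat <= north and west <= plan.lon <= east:
            return f"inside restricted area {name}"
        return None
    return rule


def evaluate(plan: FlightPlan, rules: List[Rule]) -> List[str]:
    """Run every rule and collect the violations (empty list means no violations)."""
    return [v for v in (r(plan) for r in rules) if v is not None]


# Illustrative rule set; the values are assumptions, not real regulations.
RULES = [
    altitude_limit(120.0),
    restricted_box("demo-zone", 33.9, -118.5, 34.0, -118.3),
]
```

Calling `evaluate(FlightPlan(150.0, 33.95, -118.4), RULES)` returns two violation messages, while a plan at 100 m outside the box returns an empty list. Representing each regulation as a small function keeps rules independent, so new jurisdictions can be added without touching the evaluator.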
The public face of TrueCar is the price curve, transaction histogram, and price trend shown for every vehicle on the site, e.g., https://www.truecar.com/prices-new/toyota/camry-pricing/. I was both the tech lead and one of the primary big data developers for the new localized pricing shown on the site today.
● Used Hadoop HDFS to store and enrich three-plus years of vehicle transactions that we modeled in Avro format
● Used Java MapReduce to aggregate transactions across facets of time, space, and vehicle properties
● Implemented a linear regression model with pluggable features to calculate the final price estimates
● Published the pricing documents to Elasticsearch (ES) for consumption by APIs and analytics
● Built dashboards in Kibana to track transaction history, pricing model quality, and basic stats about the pricing documents
● Used the ES documents, Kibana dashboards, and Pig for data validation to surface issues that were missed in unit tests
● Legacy SAS/SQL version took ~17 hours to complete and was run once a week; the new pipeline completes in ~30 min and runs once a day
● Provides truly local pricing for the first time
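A linear regression with pluggable features can be sketched as follows. This is an illustrative toy, not the TrueCar model: the feature names (`msrp_k`, `days_on_lot`), the registry, and the sample numbers are assumptions, and the fit uses plain ordinary least squares via the normal equations.

```python
from typing import Callable, Dict, List, Sequence

Transaction = Dict[str, float]
Feature = Callable[[Transaction], float]

# Hypothetical feature registry: adding a feature means adding one entry.
FEATURES: Dict[str, Feature] = {
    "bias": lambda t: 1.0,
    "msrp_k": lambda t: t["msrp_k"],            # MSRP in thousands of dollars
    "days_on_lot": lambda t: t["days_on_lot"],
}


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(transactions: Sequence[Transaction], prices: Sequence[float],
        names: Sequence[str]) -> Dict[str, float]:
    """Ordinary least squares over the selected features: (X^T X) w = X^T y."""
    rows = [[FEATURES[n](t) for n in names] for t in transactions]
    k = len(names)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, prices)) for i in range(k)]
    return dict(zip(names, solve(xtx, xty)))


def predict(weights: Dict[str, float], t: Transaction) -> float:
    """Price estimate: weighted sum of the fitted features."""
    return sum(w * FEATURES[name](t) for name, w in weights.items())


# Made-up sample data generated from price_k = 0.5 + 0.9*msrp_k - 0.01*days.
SAMPLE = [
    {"msrp_k": 20.0, "days_on_lot": 10.0},
    {"msrp_k": 25.0, "days_on_lot": 30.0},
    {"msrp_k": 30.0, "days_on_lot": 5.0},
    {"msrp_k": 22.0, "days_on_lot": 60.0},
]
PRICES = [0.5 + 0.9 * t["msrp_k"] - 0.01 * t["days_on_lot"] for t in SAMPLE]
WEIGHTS = fit(SAMPLE, PRICES, ["bias", "msrp_k", "days_on_lot"])
```

Because the model only sees features through the registry, a feature can be swapped in or out by changing the list of names passed to `fit`, without editing the fitting code.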
During the summer of 2015, I mentored a group of six brilliant summer interns. This team used TrueCar's massive vehicle image collection and implemented advanced computer vision algorithms to answer the following questions: (1) where is the car, (2) is it the interior or exterior of the car, (3) at what angle is the car positioned, (4) is there an advertisement on the image, and finally (5) what is the make and model of the car.
● Applied GrabCut (OpenCV) to segment the image and pull out the vehicle
● Implemented SIFT and SURF algorithms to extract basic image features, e.g., gradients and angles
● Used a bag-of-words framework with k-means clustering to group and ultimately classify images as exterior full, exterior detail, or interior shots
● Used AWS GPUs and CNNs for specialized classifications
  o Start with a Google-trained image classification CNN
  o Retrain the final layer of the CNN using the large TrueCar imaging set
  o Classify position angle, accuracy ~90%
  o Classify make/model, accuracy ~85%, ~20% better than the best published academic result, largely because of the size of the training set
● Used Google Tesseract optical character recognition (OCR) and a library of words to determine if ads are on the images
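The bag-of-words step with k-means can be sketched as follows. This is a toy illustration under stated assumptions, not the interns' code: real SIFT or SURF descriptors are high-dimensional vectors, while the 2-D points, the initial centers, and the function names here are made up to keep the example self-contained. The final classifier that would consume the histograms is omitted.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centers: Sequence[Vector]) -> int:
    """Index of the closest center (the 'visual word' for descriptor v)."""
    return min(range(len(centers)), key=lambda i: sq_dist(v, centers[i]))


def kmeans(points: Sequence[Vector], init_centers: Sequence[Vector],
           iterations: int = 20) -> List[Vector]:
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centers = list(init_centers)
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def bow_histogram(descriptors: Sequence[Vector],
                  vocabulary: Sequence[Vector]) -> List[float]:
    """Normalized count of how often each visual word occurs in one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest(d, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


# Toy descriptors pooled from many images; two obvious clusters.
POOLED = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
VOCAB = kmeans(POOLED, init_centers=[(0.0, 0.0), (10.0, 10.0)])

# One image's descriptors become a fixed-length histogram over the vocabulary.
IMAGE_HIST = bow_histogram([(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (10.0, 10.0)], VOCAB)
```

Each image, regardless of how many descriptors it yields, is reduced to a histogram of the same length as the vocabulary, which is what makes it usable as input to a standard classifier.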