Seattle, Washington, United States
Ekata (formerly Whitepages Inc.) is a global leader in digital identity verification. Our solutions include Whitepages Premium for consumers and Whitepages Pro for businesses. Whitepages Premium gives subscribers access to U.S. public records to verify contact details, mobile numbers, bankruptcy history, criminal records, and more, to facilitate trusting interactions in today’s sharing economy. Whitepages Pro provides businesses with global identity verification solutions via enterprise-scale APIs and web tools that help companies distinguish legitimate customers from fraudulent ones.

Our foundation is data at massive scale. Our open search web properties serve 55 million visitors per month and account for more than 90% of free people searches in North America. To support these users, Whitepages has developed its own fully integrated, high-availability Identity Graph database, which houses more than 5 billion global identity records. These records have been curated and corroborated from hundreds of different sources and made available to our users to deliver unparalleled coverage, accuracy, and performance. Whether you are looking up an old friend, checking out someone you’ve met online, or powering a fraud solution for your enterprise, Whitepages can help you verify identities worldwide.

Combine all this with a dynamic, can-do culture, and Whitepages is a pretty awesome place to work for folks who want to have an impact. We are a small team with a passion for what we do, and we keep our employees at the center of our mission. We host weekly events, including catered lunches and happy hours, enjoy unlimited vacation, keep a fully stocked kitchen, and work in some great locations: we are headquartered in downtown Seattle, with offices in New York City and Budapest, Hungary. If this sounds like the kind of place you want to spend your days, visit us at http://about.whitepages.com/. Whitepages Inc. prides itself on being an equal-opportunity employer.
I was the principal software engineer for the data science team at nPario, where we built the analytics engine for nPario's SaaS consumer intelligence platform. The platform is powered by an MPP shared-nothing columnar database (formerly Yahoo's Everest), and our goal was to integrate the core analytic processing into the database as much as possible, so as to leverage its massively parallel query processing engine and its ability to scale. We built a scalable big-data intelligence and analytics engine that allows customers to better segment, analyze, and understand their audience. We used distributed machine learning and data mining algorithms, such as association rule mining and k-means, to build predictive models that infer interest and intent from large, sparse, and noisy data sets of customer behavior. We also used a text clustering algorithm (LDA) to improve audience segmentation for publishers. We initially prototyped proof-of-concept models in R and then implemented the algorithms in C++ as part of the analytic engine.

I was also the lead developer for nPario's Real-Time Audience Classification Engine (RACE). RACE, also known as Segment Server, works in tandem with the Tagging engine to do audience segmentation in pseudo real-time. The engine was written in C++ and used UNIX IPC domain sockets to communicate with the Java-based tagging server over JNI. It uses both models and historical data for segmentation, and I wrote several Bash and Python scripts for updating the data and models.

I implemented a couple of tools in Java for the application team to translate user security at the application level to the database level, which had its own row-level security mechanism. I also helped the application team standardize their build process around Maven, with Apache Archiva as the internal repository, working in conjunction with the Jenkins build system.