Research Overview

My research focuses on developing data-driven optimization methodologies that leverage advanced machine learning techniques. I mostly work with industry partners to solve complex business challenges by combining machine learning with established operations management techniques.

Below is a list of my working papers and publications:

Data-driven Consumer Debt Collection via Machine Learning and Approximate Dynamic Programming (working paper), SSRN link

Working paper, with Ruben van de Geer and Sandjai Bhulai

Summary: We apply machine learning and approximate dynamic programming to help a debt collection agency optimize its collection process. Using data recorded from its historical collection interactions and outcomes, we develop a method to intelligently select which debtors the agency should call on a given day. We implemented this method at an industry partner and conducted a controlled field experiment. The results show a relative increase of 14% in collected debt and a decrease of 22% in calling effort when using our method compared with the partner's current collection process.

Abstract: This paper presents a framework for the data-driven scheduling of outbound calls made by debt collectors. These phone calls are used to persuade debtors to settle their debt, or to negotiate payment arrangements in case debtors are willing but unable to repay. We determine on a daily basis which debtors should be called to maximize the amount of delinquent debt recovered in the long term, under the constraint that only a limited number of phone calls can be made each day. Our approach is to formulate a Markov decision process and, given its intractability, approximate the value function based on historical data through the use of state-of-the-art machine learning techniques. Specifically, we predict the likelihood with which a debtor in a particular state is going to settle their debt and use this as a proxy for the value function. Based on this value function approximation, we compute for each debtor the marginal value of making a call. This leads to a particularly straightforward optimization procedure: we prioritize the debtors that have the highest marginal value per phone call. We validate our proposed methodology in a controlled field experiment conducted with real debtors. The results show that our optimized policy substantially outperforms the current scheduling policy that has been used in business practice for many years. Most importantly, our policy collects more debt in less time while using substantially fewer resources, leading to a large increase in the amount of debt collected per phone call.
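The prioritization step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Debtor` fields, the two repayment probabilities (standing in for the output of a trained model), and the definition of marginal value as outstanding debt times the change in repayment probability are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Debtor:
    # All fields are hypothetical; the probabilities stand in for
    # predictions from a trained machine learning model.
    debtor_id: str
    outstanding: float            # amount of debt still owed
    p_repay_if_called: float      # predicted repayment probability with a call today
    p_repay_if_not_called: float  # predicted repayment probability without a call


def marginal_value(debtor: Debtor) -> float:
    """Assumed proxy: expected extra debt recovered by making one call."""
    uplift = debtor.p_repay_if_called - debtor.p_repay_if_not_called
    return debtor.outstanding * uplift


def select_calls(debtors: list[Debtor], capacity: int) -> list[str]:
    """Greedily pick up to `capacity` debtors with the highest positive marginal value."""
    ranked = sorted(debtors, key=marginal_value, reverse=True)
    return [d.debtor_id for d in ranked[:capacity] if marginal_value(d) > 0]


portfolio = [
    Debtor("A", 1000.0, 0.50, 0.40),
    Debtor("B", 200.0, 0.90, 0.30),
    Debtor("C", 500.0, 0.20, 0.25),
]
todays_calls = select_calls(portfolio, capacity=2)
```

Because each call consumes one unit of the daily capacity, sorting by marginal value per call and taking the top entries is all the optimization this sketch needs. Debtors for whom a call is predicted to hurt repayment are never selected.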

Keywords: Debt collection, approximate dynamic programming, machine learning

Presented at: POMS International Conference 2017 (Sydney, Australia), POMS-HK Conference 2018 (Hong Kong), StochMod 2018 (Lancaster, UK), INFORMS MSOM Conference 2018 (Dallas, TX), INFORMS Annual Meeting 2018 (Phoenix, AZ).

Blog post link: Data-Driven Debt Collection Using Machine Learning and Predictive Analytics

Optimal Contact Center Staffing and Scheduling with Machine Learning (working paper), Paper link

Working paper, with Siqiao Li and Ger Koole

Summary: We present a simulation-based machine learning framework to optimize staffing and scheduling for multi-skill call centers. A fundamental challenge in staffing and scheduling of service systems is meeting quality of service (QoS) targets at minimum cost. This challenge is particularly complex in modern call centers, which have multi-skill agents and multi-class customers with heterogeneous arrival rates. As a result, there are no closed-form expressions for QoS measurements, and simulation is required to accurately estimate the expected QoS of a staffing schedule. Simulations are computationally demanding, and reliable optimization procedures cannot meet the time demands of practical use. We develop a machine learning framework to approximate QoS expectations by predicting simulation outcomes, allowing us to quickly produce a look-up table of QoS for all candidate schedules. The QoS approximations are accurate to within 1-2 percent of the simulation results, even when the call center is large. We then implement a simple deterministic optimization procedure to obtain schedules that satisfy QoS targets at low cost. Using numerical experiments, we show that under reasonable time constraints our method improves upon the best schedule obtained via the Erlang-C model by 3.8% for the single-skill setting, and improves upon the best schedule obtained via simulation optimization by 4.3% for the multi-skill setting.
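For context on the single-skill baseline mentioned above, the following Python sketch shows the standard Erlang-C calculation for an M/M/n queue and a simple search for the smallest staffing level that meets a service-level target. It illustrates the textbook baseline only, not the paper's machine learning framework; the function names and the example parameters are assumptions.

```python
import math


def erlang_c(agents: int, offered_load: float) -> float:
    """Probability that an arriving call has to wait in an M/M/n queue."""
    if agents <= offered_load:
        return 1.0  # the queue is unstable, so every call waits
    # Erlang-B recursion, which avoids large factorials and powers.
    blocking = 1.0
    for k in range(1, agents + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return agents * blocking / (agents - offered_load * (1.0 - blocking))


def service_level(agents: int, arrival_rate: float,
                  mean_handle_time: float, wait_threshold: float) -> float:
    """Fraction of calls answered within `wait_threshold` (same time unit throughout)."""
    load = arrival_rate * mean_handle_time
    if agents <= load:
        return 0.0
    p_wait = erlang_c(agents, load)
    return 1.0 - p_wait * math.exp(-(agents - load) * wait_threshold / mean_handle_time)


def min_agents(arrival_rate: float, mean_handle_time: float,
               wait_threshold: float, target: float) -> int:
    """Smallest number of agents whose Erlang-C service level reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    agents = math.floor(arrival_rate * mean_handle_time) + 1
    while service_level(agents, arrival_rate, mean_handle_time, wait_threshold) < target:
        agents += 1
    return agents


# Hypothetical example: 2 calls per minute, 5-minute handle time,
# 80% of calls answered within 20 seconds (1/3 of a minute).
needed = min_agents(arrival_rate=2.0, mean_handle_time=5.0,
                    wait_threshold=1.0 / 3.0, target=0.8)
```

The closed form holds only for a single skill with exponential service times and no abandonment, which is why the multi-skill setting described in the summary has to rely on simulation.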

Keywords: Contact center scheduling, simulation, optimization, machine learning, service operations

Presented at: INFORMS International Conference on Service Science 2018 (Phoenix, AZ).

Multi-channel Conversion Attribution: A Machine Learning Approach (working paper), Paper link

Working paper, with Piet Peeperkorn and Maarten Soomer

Abstract: With the increasing prominence of digital marketing, the need for accurate and robust methods to measure the effect and value of digital marketing actions has become a high priority, especially when several channels affect customers simultaneously. With online tracking of customers, it is now possible to map out individual customer journeys, and a number of rule-based and data-driven models have been developed recently to address the “attribution” problem, namely the assignment of purchase or conversion credit to the marketing channels that guided the customer to conversion. Even though some of the existing models have been widely adopted by practitioners, they often suffer from low predictive power in practice and cannot adequately explain or justify the credit shares they assign to different marketing channels. In this paper we present a novel machine learning approach to the problem of attributing conversion credit. By incorporating customer behavior information that is highly effective in predicting whether a customer journey will result in a conversion, this approach achieves conversion prediction quality that significantly exceeds that of existing attribution models. Conversion credit is then assigned to each marketing channel based on its contribution to predicting conversion. Finally, we test this method on three real-life datasets and compare its conversion prediction and attribution outcomes to four existing attribution models.
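To make the attribution problem concrete, the sketch below implements two common rule-based heuristics, last-touch and linear attribution, in Python. These are given only as examples of the rule-based family the abstract refers to; the abstract does not say which four existing models are used in the comparison, and the journey format and channel names here are assumptions.

```python
from collections import defaultdict

# A journey is (ordered list of channels touched, whether it ended in a conversion).
Journey = tuple[list[str], bool]


def last_touch(journeys: list[Journey]) -> dict[str, float]:
    """Give all credit for each conversion to the final channel in the journey."""
    credit: dict[str, float] = defaultdict(float)
    for channels, converted in journeys:
        if converted and channels:
            credit[channels[-1]] += 1.0
    return dict(credit)


def linear(journeys: list[Journey]) -> dict[str, float]:
    """Split the credit for each conversion equally over all touches in the journey."""
    credit: dict[str, float] = defaultdict(float)
    for channels, converted in journeys:
        if converted and channels:
            share = 1.0 / len(channels)
            for channel in channels:
                credit[channel] += share
    return dict(credit)


# Hypothetical journeys for illustration.
journeys = [
    (["search", "email", "display"], True),
    (["email"], True),
    (["display", "search"], False),
]
last_touch_credit = last_touch(journeys)
linear_credit = linear(journeys)
```

Both heuristics ignore journeys that did not convert, so they cannot predict whether a given journey will convert. That is the limitation in predictive power the abstract points to.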

Keywords: Marketing, e-commerce, machine learning

Target journal: INFORMS Marketing Science

Revenue Management for Parking with Advanced Reservations

Working paper, with Ruben van de Geer and Arnoud den Boer

Summary: We develop a data-driven solution to optimize the pricing and blocking policy of advance reservations for a smart parking technology company. This problem differs from a standard revenue management problem because arrival times and lengths of stay are unknown and variable, which makes a dynamic programming formulation infeasible. We decouple the pricing and blocking policies and approach the problem in two stages. First, we construct an optimal blocking policy by using a machine learning model trained on historical transactions to predict the optimal time to block open parking spaces for expected reservation arrivals. This allows us to minimize the potential revenue lost by guaranteeing the reservation, while also providing a lower bound for the price. Subsequently, we use a choice model and randomized price experiments to estimate the demand function for advance reservations. Finally, we use a second machine learning model to predict the expected future revenue as a function of accepting a reservation request, which in combination with the estimated demand function allows us to optimally price parking reservations in real time.
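The final pricing step can be sketched in Python under strong simplifying assumptions. The sketch assumes a binary logit demand curve, a single number `displacement_cost` standing in for the predicted loss in future revenue from accepting the reservation, and a finite grid of candidate prices. None of these names or modeling choices come from the paper.

```python
import math


def accept_probability(price: float, intercept: float, price_sensitivity: float) -> float:
    """Assumed binary logit demand: probability a customer books at `price`."""
    return 1.0 / (1.0 + math.exp(-(intercept - price_sensitivity * price)))


def best_price(candidate_prices: list[float], intercept: float,
               price_sensitivity: float, displacement_cost: float,
               price_floor: float) -> float:
    """Pick the candidate price at or above the floor with the highest expected net gain."""
    feasible = [p for p in candidate_prices if p >= price_floor]
    if not feasible:
        raise ValueError("no candidate price meets the price floor")

    def expected_gain(price: float) -> float:
        # Booking probability times (price minus assumed future revenue displaced).
        return accept_probability(price, intercept, price_sensitivity) * (price - displacement_cost)

    return max(feasible, key=expected_gain)


# Hypothetical parameters for illustration.
prices = [float(p) for p in range(1, 21)]
quote = best_price(prices, intercept=4.0, price_sensitivity=0.5,
                   displacement_cost=2.0, price_floor=3.0)
```

In this sketch the price floor plays the role of the lower bound that the blocking policy provides. Raising the floor above the unconstrained optimum simply moves the quoted price up to the floor.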

Keywords: Revenue management, dynamic pricing, machine learning

Target journal: INFORMS Management Science

Presented at: INFORMS Revenue Management and Pricing Conference 2018 (Toronto, Canada).

Improving Display Advertising With Predictive Device Matching

Working paper, with Taco Wijnsma

Abstract: Retargeting is a highly effective strategy of targeting display advertisements to potential online customers who have already visited the advertiser's website. Controlled field experiments have estimated that retargeting campaigns can increase an online retailer's website visits and purchases by over 17% and 10%, respectively. Unfortunately, retargeting campaigns are limited in volume because poor online tracking fragments user information. In this paper we develop a machine learning framework to probabilistically match HTTP cookies to users, thereby solving the fragmented user problem and increasing the volume of retargeting advertisements that can be served by as much as 14.3%.

Keywords: Operations-marketing interface, display advertising, machine learning

Presented at: POMS-HK Conference 2018 (Hong Kong), Amsterdam Business School Marketing Brownbag Series (Amsterdam, Netherlands).

Research in Progress

  • Optimizing Long-term Job Matching for an Online Marketplace, with Ashish Kabra

  • Data-driven Fatigue Management for Multiple Sclerosis Patients

  • Dynamic Optimization of Email Promotional Campaigns

  • Social Media Bot Detection with Machine Learning, with Juan Echeverria

Peer-reviewed Publications

  • Li, S., Wang, Q., and Koole, G., Predicting Call Center Performance with Machine Learning, Proceedings of the INFORMS International Conference on Service Science, 2018.

  • Puterman, M. L. and Wang, Q., Optimal Design of the PGA Tour: Relegation and Promotion in Golf, Proceedings of MIT Sloan Sports Analytics Conference, 2011.

  • Puterman, M. L. and Wang, Q., Optimal Dynamic Clustering Through Relegation and Promotion: How to Design a Competitive Sports League, Quantitative Analysis in Sports, 7, issue 2, Article 7, 2010.