Department of Computer Science
Ithaca, NY 14853
I was a PhD
student at Cornell University from 2005-2010. My research focus is in Machine Learning and
During my time at Cornell, I have been blessed to have an amazing thesis committee, including Professor Carla Gomes, Thorsten Joachims, David Williamson, Bart Selman and Dexter Kozen
Here is my PhD dissertation titled Supervised Learning with Implicit Preferences and Constraints.
I graduated in August 2010 and joined Goldman Sachs thereafter to work in the inspiring world of Algorithmic Trading.
A resume (to be updated).
List of Refereed Conference and Journal Publications
18. Yunsong Guo, Carla Gomes, Learning optimal subsets with implicit user preferences, The 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009; a preliminary version also appeared in NIPS08 workshop on structured input and structured output.
We study the problem of learning an optimal subset from a larger ground set of items, where the optimality criterion is defined by an unknown preference function. We model the problem as a discriminative structural learning problem and solve it using a Structural Support Vector Machine (SSVM) that optimizes a "set accuracy" performance measure representing set similarities. Our approach departs from previous approaches since we do not explicitly learn a pre-defined preference function. Experimental results on both a synthetic problem domain and a real-world face image subset selection problem show that our method significantly outperforms previous learning approaches for such problems.
17. Yunsong Guo, Carla Gomes, Ranking structured documents: a large margin based approach for patent prior art search , The 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009
We propose an approach for automatically ranking structured documents applied to patent prior art search. Our model, SVM Patent Ranking (SVM_PR) incorporates margin constraints that directly capture the specificities of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features from several different general Information Retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, effectively handling hundreds of thousands of patent-pair judgements in a high dimensional feature space. Experiments on a homogeneous essential wireless patent dataset show that SVM_PR performs on average 30%-40% better than many other state-of-the-art general-purpose Information Retrieval methods in terms of the NDCG measure at different cut-off positions.
Nam Nguyen, Yunsong Guo,
Metric learning: a support vector
machine approach, The 18th European Conference on Machine
Learning (ECML/PKDD), 2008.
In this paper, we address the metric learning problem utilizing a margin-based approach. Our metric learning problem is formulated as a quadratic semi-definite programming problem (QSDP) with local neighborhood constraints, which is based on the Support Vector Machine (SVM) framework. The local neighborhood constraints ensure that examples of the same class are separated from examples of dierent classes by a margin. In addition to providing an efficient algorithm to solve the metric learning problem, extensive experiments on various data sets show that our algorithm is able to produce a new distance metric to improve the performance of the classical K-nearest neighbor (KNN) algorithm on the classication task. Our performance is always competitive and often significantly better than other state-of-the-art metric learning algorithms.
15. Yunsong Guo,
Yanzhi Li, Andrew Lim, Brian Rodrigues, Tariff
concessions in production sourcing, European
Journal of Operational Research (EJOR), 187(2) , 2008.
In this paper, we study a multi-stage production sourcing problem where tariff concessions can be exploited at the firm level using free trade agreements between countries. To solve the problem, an algorithm which embeds a very large-scale neighborhood (VSLN) search into a simulated annealing framework is developed. A numerical study is conducted to verify the effectiveness of the solution approach.
14. Yunsong Guo, Bart
to explain opaque machine learning models using inductive logic
programming, The 19th IEEE
international Conference on Tools with Artificial Intelligence
In this paper we developed an Inductive Logic Programming (ILP) based framework ExOpaque that is able to extract a set of Horn clauses from an arbitrary opaque machine learning model, to describe the behavior of the opaque model with high fidelity while maintaining the simplicity of the Horn clauses for human interpretations. In addition, traditional ILP systems often utilize a set of background knowledge to make accurate predictions. We also proposed a new method to use generated artificial training examples and high-accuracy opaque models to boost the prediction accuracy of an ILP system when background knowledge is absent and hard to obtain.
In this paper, we survey the current state-ofart models for structured learning problems, including Hidden Markov Model (HMM), Conditional Random Fields (CRF), Averaged Perceptron (AP), Structured SVMs (SVMstruct), Max Margin Markov Networks (M3N), and an integration of search and learning algorithm (SEARN). With all due tuning efforts of various parameters of each model, on the data sets we have applied the models to, we found that SVMstruct enjoys better performance compared with the others. In addition, we also propose a new method which we call the Structured Learning Ensemble (SLE) to combine these structured learning models. Empirical results show that our SLE algorithm provides more accurate solutions compared with the best results of the individual models.
12. Yunsong Guo,
Andrew Lim, Brian Rodrigues, Shanhu Yu, Machine scheduling performance with maintenance
and failure, Mathematical And Computer Modelling (MCM), 45(9-10), 2007.
In manufacturing control, machine scheduling research has mostly dealt with problems either without maintenance or with deterministic maintenance when no failure can occur. This can be unrealistic in practical settings. In this work, an experimental model is developed to evaluate the effect of corrective and preventive maintenance schemes on scheduling performance in the presence machine failure where the scheduling objective is to minimize schedule duration. We show that neither scheme is clearly superior, but that the applicability of each depends on several system parameters as well as the scheduling environment itself. Further, we show that parameter values can be chosen for which preventive maintenance does better than corrective maintenance. The results provided in this study can be useful to practitioners and to system or machine administrators in manufacturing and elsewhere.
11. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Liang Yang, Minimizing the makespan for unrelated
machines, International Journal on Artificial Intelligence Tools (IJAIT), 16(3), 2007.
In this paper, we study the unrelated parallel machine problem for minimizing the makespan, which is NP-hard. We used Simulated Annealing (SA) and Tabu Search (TS) with Neighborhood Search (NS) based on the structure of the problem. We also used a modied SA algorithm, which gives better results than the traditional SA and developed an eective heuristic for the problem: Squeaky Wheel Optimization (SWO) hybrid with TS. Experimental results average 2.52% from the lower bound and are within acceptable timescales improving current best results for the problem.
10. Yunsong Guo,
Andrew Lim, Brian Rodrigues, Jiqing
Tang, Using a lagrangian heuristic for a combinatorial auction problem, International
Journal on Artificial Intelligence Tools (IJAIT), 15(3), 2006.
Combinatorial auctions allow bidders to bid for items leading to more efficient allocations, but determining winners in auctions is NP-complete. In this work, a simple yet e®ective Lagrangian relaxation based heuristic algorithm is presented. Extensive computational experiments using standard benchmark data (CATS) as well as newly generated more realistic test sets were conducted which showed the heuristic was able to provide optimal solutions for most test cases and is within 1% from the optimums for the rest within very short times. Experiements comparing CPLEX 8.0, the fastest current algorithm, showed the heuristic was able to provide equally godd or better solutions often requring less than 1% of the time required by CPLEX 8.0.
9. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Yi Zhu, Carrier assignment models in transportation procurement, Journal of the Operational Research Society
(JORS), 57, 2006.
This paper extends carrier assignment models used in winner determination auctions for transportation procurement to include shipper non-price objectives and carrier transit point costs. The models are unlike traditional carrier assignment models which incorporate only carrier lane bids, and different from combinatorial auction models which focus on packets of lanes without considering transit point costs. We develop solutions, including metaheuristics, for the new models and through computational experimentation show that the algorithms work well and can be easily implemented.
8. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Yi Zhu, Heuristics for a bidding problem, Computers &
Operations Research (COR) , 33(8), 2006.
In this paper we study a bidding problem which can be modelled as a set packing problem. A simulated annealing heuristic with three local moves, including an embed- ded branch-and-bound move, is developed for the problem. We compared the heuristic with the CPLEX 8.0 solver and the current best non-exact method, Casanova, using the standard CATS benchmark and other realistic test sets. Results show that the heuristic outperforms CPLEX and Casanova.
7. Ping Chen, Yunsong Guo,
Andrew Lim, and Brian Rodrigues, Multiple crossdocks with inventory and time windows, Computers & Operations Research (COR),
Crossdocking studies have mostly been concerned with the physical layout of a crossdock or on a single crossdock. In this work, we study a network of crossdocks taking into consideration delivery and pickup time windows, warehouse capacities and inventory handling costs. Because of the complexity of the problem, local search techniques are developed and used with simulated annealing and tabu search heuristics. Extensive experiments were conducted and results show the heuristics outperform CPLEX, providing solutions in realistic timescales.
6. Yunsong Guo,
Yanzhi Li, Andrew Lim, Brian Rodrigues, A
multi-exchange heuristic to production
line design under free trade agreement, The 18th Australian
Joint Conference on Artificial Intelligence(AUAI), 2005; also appeared in Lecture Notes in Computer Science, Editors S. Zhang and R. Jarvis, Springer-Verlag, Heidelberg, Volume 3809 (2005), pp. 871 - 874
Production line design(PLD) can be viewed as a facility location problem in global supply chain. However, New cost factors such as tari® are introduced under free trade agreements which appeared about ten years ago and is becoming pervasive in global trading. A multinational manufacturer has factories all over the world to produce a product. When a demand arises from a new market, the manufacturer can fully utilize the existing resources and configure a most cost-e®ective production line to satisfy the new demand. Such a problem turns out to be NP-hard. We provide an integer programming model, which is hard to solve optimally while able to provide a lower bound. We then propose a multi-exchange heuristic embedded in simulated annealing to tackle the problem. Extensive experiments are conducted and comparisons are made with the lower bound and show that our algorithm performs very well.
5. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Jiqing Tang, Using a lagrangian heuristic for
a combinatorial auction problem, The 17th International Conference on
Tools with Artificial Intelligence (ICTAI),
In this paper, a combinatorial auction problem is modeled as a NP-complete set packing problem and a Lagrangian relaxation based heuristic algorithm is proposed. Extensive experiments are conducted using benchmark CATS test sets and more complex test sets. The algorithm provides optimal solutions for most test sets and is always 1% from the optimal solutions for all CATS test sets. Comparisons with CPLEX 8.0 are also provided, which show that the algorithm provides good solutions.
4. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Yi Zhu, A non-exact approach and experiment studies on the
combinatorial auction problem, The 38th
In this paper we formulate a combinatorial auction brokering problem as a set packing problem and apply a simulated annealing heuristic with hybrid local moves to solve the problem. We study the existing exact and nonexact approaches to the problem and analyze the performance of those approaches. We compared our heuristic with the leading exact method CPLEX 8.0 solver and another non-exact algorithms Casanova using both the CATS test sets and test cases believed more difficult than CATS. Results show that the method is competitive with CPLEX 8.0 and obtains near optimal solutions for the CATS cases and up to 15% and 40% better solutions compared with CPLEX and Casanova, respectively, when the other instances were used.
3. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Shanhu Yu, Minimizing total flow time in single machine
environment with release time: an experimental analysis, Computers
& Industrial Engineering (CIE), 47(2-3), 2004.
We study the problem of minimizing total flow time on a single machine with job release times. This problem is NP-complete for which is no constant ratio approximation algorithm. Our objective is to study experimentally how well, on average, the problem can be solved. The algorithm we use produces non-preemptive schedules converted from preemptive ones. We evaluate average solution quality for the problem to identify the characteristics of difficult instances. Results obtained are compared with those recently obtained by other researchers. Based on extensive experiments, we also develop an empirical model to predict solution quality using interpolation.
2. Yunsong Guo,
Andrew Lim, Brian Rodrigues
and Yi Zhu, Heuristics for a brokering set packing problem, The 8th
International Symposium on Artificial Intelligence and Mathematics
In this paper we formulate the combinatorial auction brokering problem as a Set Packing Problem (SPP) and apply a simulated annealing heuristic to solve SPP. Experimental results are compared with a recent heuristic algorithm and also with the CPLEX Integer Programming solver where we found that the simulated annealing approach outperforms the previous heuristic method and CPLEX obtaining results within smaller timescales.
1. Yunsong Guo,
Andrew Lim, Brian Rodrigues
and Yi Zhu, Transportation bid analysis optimization with shipper
input, The 15th International Conference on Tools with Artificial
Intelligence (ICTAI), 2003.
This paper extends carrier assignment models used in bid analysis for transportation procurement to incorporate shipper business considerations. These include restricting carrier numbers, favoring incumbents and performance considerations. We provide representative models and develop solutions for these which include the use of metaheuristics. Experimentation shows that our algorithms work well.