Yunsong Guo

PhD
Department of Computer Science
Cornell
University
Ithaca, NY 14853
I was a PhD
student at Cornell University from 2005-2010. My research focus is in Machine Learning and
Combinatorial
Optimization.
During my time at Cornell, I have been blessed to have an amazing thesis committee, including Professor Carla Gomes, Thorsten Joachims, David Williamson, Bart Selman and Dexter Kozen
Here is my PhD dissertation titled Supervised Learning with Implicit Preferences and Constraints.
I graduated in August 2010 and joined Goldman Sachs thereafter to work in the inspiring world of Algorithmic Trading.
A resume (to be updated).
List of Refereed Conference
and Journal Publications
18. Yunsong Guo, Carla Gomes, Learning optimal subsets with implicit user preferences, The 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009; a preliminary version also appeared in NIPS08 workshop on structured input and structured output.
[-] We study the problem of learning an optimal subset from a larger
ground set of items, where the optimality criterion is defined by an
unknown preference function. We model the problem as a
discriminative structural learning problem and solve it using a
Structural Support Vector Machine (SSVM) that optimizes a "set
accuracy" performance measure representing set similarities. Our
approach departs from previous approaches since we do not explicitly
learn a pre-defined preference function. Experimental results on
both a synthetic problem domain and a real-world face image subset
selection problem show that our method significantly outperforms
previous learning approaches for such problems.
17. Yunsong Guo, Carla Gomes, Ranking structured documents: a large margin based approach for patent prior art search , The 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009
[-] We propose an approach for automatically ranking structured
documents applied to patent prior art search. Our model, SVM Patent
Ranking (SVM_PR) incorporates margin constraints that directly
capture the specificities of patent citation ranking. Our approach
combines patent domain knowledge features with meta-score features
from several different general Information Retrieval methods. The
training algorithm is an extension of the Pegasos algorithm with
performance guarantees, effectively handling hundreds of thousands
of patent-pair judgements in a high dimensional feature space.
Experiments on a homogeneous essential wireless patent dataset show
that SVM_PR performs on average 30%-40% better than many other
state-of-the-art general-purpose Information Retrieval methods in
terms of the NDCG measure at different cut-off positions.
16.
Nam Nguyen, Yunsong Guo,
Metric learning: a support vector
machine approach, The 18th European Conference on Machine
Learning (ECML/PKDD), 2008.
[-] In this paper, we address the metric learning problem utilizing
a margin-based approach. Our metric learning problem is formulated
as a quadratic semi-definite programming problem (QSDP) with
local neighborhood constraints, which is based on the Support Vector
Machine (SVM) framework. The local neighborhood constraints ensure
that examples of the same class are separated from examples of dierent
classes by a margin. In addition to providing an efficient algorithm to
solve the metric learning problem, extensive experiments on various data
sets show that our algorithm is able to produce a new distance metric to
improve the performance of the classical K-nearest neighbor (KNN) algorithm
on the classication task. Our performance is always competitive
and often significantly better than other state-of-the-art metric learning
algorithms.
15. Yunsong Guo,
Yanzhi Li, Andrew Lim, Brian Rodrigues, Tariff
concessions in production sourcing, European
Journal of Operational Research (EJOR), 187(2) , 2008.
[-] In this paper, we study a multi-stage production sourcing problem where tariff concessions can be exploited at the firm level using free trade agreements between countries. To
solve the problem, an algorithm which embeds a very large-scale neighborhood (VSLN)
search into a simulated annealing framework is developed. A numerical study is conducted
to verify the effectiveness of the solution approach.
14. Yunsong Guo, Bart
Selman,
ExOpaque:
a framework
to explain opaque machine learning models using inductive logic
programming, The 19th IEEE
international Conference on Tools with Artificial Intelligence
(ICTAI), 2007.
[-] In this paper we developed an Inductive Logic Programming
(ILP) based framework ExOpaque that is able
to extract a set of Horn clauses from an arbitrary opaque
machine learning model, to describe the behavior of the
opaque model with high fidelity while maintaining the simplicity
of the Horn clauses for human interpretations. In
addition, traditional ILP systems often utilize a set of background
knowledge to make accurate predictions. We also
proposed a new method to use generated artificial training
examples and high-accuracy opaque models to boost
the prediction accuracy of an ILP system when background
knowledge is absent and hard to obtain.
13.
Nam
Nguyen, Yunsong Guo,
Comparison of sequence labeling algorithms and
extensions, The 24th International Conference on Machine Learning (ICML), 2007.
[-] In this paper, we survey the current state-ofart
models for structured learning problems, including
Hidden Markov Model (HMM), Conditional
Random Fields (CRF), Averaged Perceptron
(AP), Structured SVMs (SVMstruct),
Max Margin Markov Networks (M3N), and an
integration of search and learning algorithm
(SEARN). With all due tuning efforts of various
parameters of each model, on the data sets
we have applied the models to, we found that
SVMstruct enjoys better performance compared
with the others. In addition, we also propose a
new method which we call the Structured Learning
Ensemble (SLE) to combine these structured
learning models. Empirical results show that our
SLE algorithm provides more accurate solutions
compared with the best results of the individual
models.
12. Yunsong Guo,
Andrew Lim, Brian Rodrigues, Shanhu Yu, Machine scheduling performance with maintenance
and failure, Mathematical And Computer Modelling (MCM), 45(9-10), 2007.
[-] In manufacturing control, machine scheduling research has mostly dealt with problems either without maintenance or with deterministic maintenance when no failure can occur. This can be unrealistic in practical settings. In this work, an experimental model is developed to evaluate the effect of corrective and preventive maintenance schemes on scheduling performance in the presence machine failure where the scheduling objective is to minimize schedule duration. We show that neither scheme is clearly superior, but that the applicability of each depends on several system parameters as well as the scheduling environment itself. Further, we show that parameter values can be chosen for which preventive maintenance does better than corrective maintenance. The results provided in this study can be useful to practitioners and to system or machine administrators in manufacturing and elsewhere.
11. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Liang Yang, Minimizing the makespan for unrelated
parallel
machines, International Journal on Artificial Intelligence Tools (IJAIT), 16(3), 2007.
[-] In this paper, we study the unrelated parallel machine problem for minimizing the
makespan, which is NP-hard. We used Simulated Annealing (SA) and Tabu Search (TS)
with Neighborhood Search (NS) based on the structure of the problem. We also used a
modied SA algorithm, which gives better results than the traditional SA and developed
an eective heuristic for the problem: Squeaky Wheel Optimization (SWO) hybrid with
TS. Experimental results average 2.52% from the lower bound and are within acceptable
timescales improving current best results for the problem.
10. Yunsong Guo,
Andrew Lim, Brian Rodrigues, Jiqing
Tang, Using a lagrangian heuristic for a combinatorial auction problem, International
Journal on Artificial Intelligence Tools (IJAIT), 15(3), 2006.
[-] Combinatorial auctions allow bidders to bid for items leading to more efficient allocations,
but determining winners in auctions is NP-complete. In this work, a simple yet e®ective
Lagrangian relaxation based heuristic algorithm is presented. Extensive computational
experiments using standard benchmark data (CATS) as well as newly generated more
realistic test sets were conducted which showed the heuristic was able to provide optimal
solutions for most test cases and is within 1% from the optimums for the rest within
very short times. Experiements comparing CPLEX 8.0, the fastest current algorithm,
showed the heuristic was able to provide equally godd or better solutions often requring
less than 1% of the time required by CPLEX 8.0.
9. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Yi Zhu, Carrier assignment models in transportation procurement, Journal of the Operational Research Society
(JORS), 57, 2006.
[-] This paper extends carrier assignment models used in winner determination auctions
for transportation procurement to include shipper non-price objectives and carrier
transit point costs. The models are unlike traditional carrier assignment models
which incorporate only carrier lane bids, and different from combinatorial auction
models which focus on packets of lanes without considering transit point costs. We
develop solutions, including metaheuristics, for the new models and through computational
experimentation show that the algorithms work well and can be easily
implemented.
8. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Yi Zhu, Heuristics for a bidding problem, Computers &
Operations Research (COR) , 33(8), 2006.
[-] In this paper we study a bidding problem which can be modelled as a set packing
problem. A simulated annealing heuristic with three local moves, including an embed-
ded branch-and-bound move, is developed for the problem. We compared the heuristic
with the CPLEX 8.0 solver and the current best non-exact method, Casanova, using
the standard CATS benchmark and other realistic test sets. Results show that the
heuristic outperforms CPLEX and Casanova.
7. Ping Chen, Yunsong Guo,
Andrew Lim, and Brian Rodrigues, Multiple crossdocks with inventory and time windows, Computers & Operations Research (COR),
33(1), 2006.
[-] Crossdocking studies have mostly been concerned with the physical layout of a crossdock or on a single crossdock. In this work, we study a network of crossdocks
taking into consideration delivery and pickup time windows, warehouse capacities
and inventory handling costs. Because of the complexity of the problem, local
search techniques are developed and used with simulated annealing and tabu search
heuristics. Extensive experiments were conducted and results show the heuristics
outperform CPLEX, providing solutions in realistic timescales.
6. Yunsong Guo,
Yanzhi Li, Andrew Lim, Brian Rodrigues, A
multi-exchange heuristic to production
line design under free trade agreement, The 18th Australian
Joint Conference on Artificial Intelligence(AUAI), 2005; also appeared in Lecture Notes in Computer Science, Editors S. Zhang and R. Jarvis, Springer-Verlag, Heidelberg, Volume 3809 (2005), pp. 871 - 874
[-] Production line design(PLD) can be viewed as a facility location problem in global supply chain.
However, New cost factors such as tari® are introduced under free trade agreements which appeared
about ten years ago and is becoming pervasive in global trading. A multinational manufacturer has
factories all over the world to produce a product. When a demand arises from a new market, the
manufacturer can fully utilize the existing resources and configure a most cost-e®ective production
line to satisfy the new demand. Such a problem turns out to be NP-hard. We provide an integer
programming model, which is hard to solve optimally while able to provide a lower bound. We then
propose a multi-exchange heuristic embedded in simulated annealing to tackle the problem. Extensive
experiments are conducted and comparisons are made with the lower bound and show that our algorithm
performs very well.
5. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Jiqing Tang, Using a lagrangian heuristic for
a combinatorial auction problem, The 17th International Conference on
Tools with Artificial Intelligence (ICTAI),
2005.
[-] In this paper, a combinatorial auction problem is modeled
as a NP-complete set packing problem and a Lagrangian
relaxation based heuristic algorithm is proposed.
Extensive experiments are conducted using benchmark
CATS test sets and more complex test sets. The algorithm
provides optimal solutions for most test sets and is always
1% from the optimal solutions for all CATS test sets. Comparisons
with CPLEX 8.0 are also provided, which show
that the algorithm provides good solutions.
4. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Yi Zhu, A non-exact approach and experiment studies on the
combinatorial auction problem, The 38th
Hawaii
International Conference on System Sciences (HICSS), 2005.
[-] In this paper we formulate a combinatorial auction
brokering problem as a set packing problem and apply a
simulated annealing heuristic with hybrid local moves to
solve the problem. We study the existing exact and nonexact
approaches to the problem and analyze the performance
of those approaches. We compared our heuristic
with the leading exact method CPLEX 8.0 solver
and another non-exact algorithms Casanova using both
the CATS test sets and test cases believed more difficult
than CATS. Results show that the method is competitive
with CPLEX 8.0 and obtains near optimal solutions
for the CATS cases and up to 15% and 40% better
solutions compared with CPLEX and Casanova, respectively,
when the other instances were used.
3. Yunsong Guo,
Andrew Lim, Brian Rodrigues and Shanhu Yu, Minimizing total flow time in single machine
environment with release time: an experimental analysis, Computers
& Industrial Engineering (CIE), 47(2-3), 2004.
[-] We study the problem of minimizing total flow time on a single machine with job release times. This problem is
NP-complete for which is no constant ratio approximation algorithm. Our objective is to study experimentally how
well, on average, the problem can be solved. The algorithm we use produces non-preemptive schedules converted
from preemptive ones. We evaluate average solution quality for the problem to identify the characteristics of
difficult instances. Results obtained are compared with those recently obtained by other researchers. Based on
extensive experiments, we also develop an empirical model to predict solution quality using interpolation.
2. Yunsong Guo,
Andrew Lim, Brian Rodrigues
and Yi Zhu, Heuristics for a brokering set packing problem, The 8th
International Symposium on Artificial Intelligence and Mathematics
(AMAI),
2004.
[-] In this paper we formulate the combinatorial auction brokering problem as a Set Packing
Problem (SPP) and apply a simulated annealing heuristic to solve SPP. Experimental
results are compared with a recent heuristic algorithm and also with the CPLEX Integer
Programming solver where we found that the simulated annealing approach outperforms
the previous heuristic method and CPLEX obtaining results within smaller timescales.
1. Yunsong Guo,
Andrew Lim, Brian Rodrigues
and Yi Zhu, Transportation bid analysis optimization with shipper
input, The 15th International Conference on Tools with Artificial
Intelligence (ICTAI), 2003.
[-] This paper extends carrier assignment models used in bid analysis for transportation procurement to
incorporate shipper business considerations. These include restricting carrier numbers, favoring incumbents
and performance considerations. We provide representative models and develop solutions for these
which include the use of metaheuristics. Experimentation shows that our algorithms work well.
Updated Sep-09
