1 Introduction
In this paper we address the problem of generic and query-focused extractive summarization from collections of related documents, a task commonly known as multi-document summarization. We treat this task as monotone submodular function maximization (to be defined in Section 2).
This has a number of critical benefits.
On the one hand, there exists a simple greedy algorithm for monotone submodular function maximization where the summary solution obtained is guaranteed to be almost as good as the best possible solution according to an objective F. More precisely, the greedy algorithm is a constant factor approximation to the cardinality-constrained version of the problem, so that F(S) ≥ (1 − 1/e)F(Sopt) ≈ 0.632 F(Sopt).
This is particularly attractive since the quality of the solution does not depend on the size of the problem, so even very large problems do well.
It is also important to note that this is a worst-case bound, and in most cases the quality of the solution obtained will be much better than this bound suggests.
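The greedy procedure referred to above can be sketched as follows; the coverage objective and the data are hypothetical stand-ins for any monotone submodular F:

```python
# A minimal sketch of the greedy algorithm for cardinality-constrained
# monotone submodular maximization. The coverage objective below is a
# hypothetical stand-in for any monotone submodular F.

def coverage(S, sets):
    """F(S) = number of distinct elements covered by the chosen sets."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def greedy(sets, k):
    """Pick k sets, each time adding the one with the largest marginal gain.
    For monotone submodular F, the result is within (1 - 1/e) of optimal."""
    S = []
    for _ in range(k):
        best, best_gain = None, -1
        for i in range(len(sets)):
            if i in S:
                continue
            gain = coverage(S + [i], sets) - coverage(S, sets)
            if gain > best_gain:
                best, best_gain = i, gain
        S.append(best)
    return S

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
print(greedy(sets, 2))  # -> [2, 0]: largest set first, then the best complement
```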
Of course, none of this is useful if the objective function is inappropriate for the summarization task.
In this paper we argue that monotone nondecreasing submodular functions are an ideal class of functions to investigate for document summarization.
We show, moreover, that many existing methods for summarization correspond to submodular function optimization, a property not explicitly mentioned in those publications.
We take this as a testament to the value of submodular functions: if summarization algorithms are repeatedly developed that, without it being noticed, happen to be instances of submodular function optimization, this suggests that submodular functions are a natural fit.
On the other hand, other authors have started explicitly recognizing the value of submodular functions for summarization.
Submodular functions share many properties in common with convex functions, one of which is that they are closed under a number of common combination operations (sums, certain compositions, and so on).
These operations give us the tools necessary to design a powerful objective for document summarization that extends beyond any previous work.
We demonstrate this by carefully crafting a class of submodular functions we feel are ideal for extractive summarization, both generic and query-focused.
In doing so, we demonstrate better-than-existing performance on a number of standard summarization evaluation tasks, through to DUC-07.
We believe our results might act as a springboard for researchers in summarization to consider the problem of how to design submodular functions for the summarization task.
In Section 2 we provide brief background on submodular functions and their optimization.
Section 3 describes how the task of extractive summarization can be viewed as a problem of submodular function maximization.
We also show in this section that many standard methods for summarization are in fact already performing submodular function optimization.
In Section 4 we present our own submodular functions.
Section 5 presents results on both generic and query-focused summarization, showing, as far as we know, the best known ROUGE results through to DUC-07, the best known precision results, and the best recall results among systems that do not use a web search engine.
Section 6 discusses implications for future work.
2 Background on Submodularity
We are given a set V of objects and a set function F : 2^V → R that returns a real value for any subset S ⊆ V. We are interested in the subset of bounded size that maximizes the function, e.g., argmax_{S⊆V, |S|≤k} F(S).
In general, this maximization is hopelessly intractable, an unfortunate fact since it coincides with many important applications.
For example, F might correspond to the value or coverage of a set of sensor locations in an environment, and the goal is to find the best locations for a fixed number of sensors (et al., 2008).
If the function is monotone submodular, then the maximization is still NP-complete, but it has been shown that a greedy algorithm finds an approximate solution guaranteed to be within a constant factor (1 − 1/e) of the optimal, as mentioned in Section 1.
A version of this algorithm scales to very large data sets.
Submodular functions are those that satisfy the property of diminishing returns: for any A ⊆ B ⊆ V \ {v}, a submodular function F must satisfy F(A + v) − F(A) ≥ F(B + v) − F(B).
That is, the incremental value of v decreases as the context in which v is considered grows from A to B. An equivalent (and useful) definition is that for any A, B ⊆ V, we must have F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B).
If this is satisfied everywhere with equality, then the function is called modular, and in such case F(A) = c + Σ_{a∈A} m_a for an appropriately sized vector m of real values and a constant c.
A set function F is monotone nondecreasing if A ⊆ B implies F(A) ≤ F(B).
As is common in the literature, monotone nondecreasing submodular functions will simply be referred to as monotone submodular.
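Both definitions can be checked numerically on a small example; the coverage function below is a hypothetical but standard instance of a monotone submodular function:

```python
# Numeric check of the two equivalent definitions of submodularity,
# using set coverage as the example function (a hypothetical choice;
# any monotone submodular F would do).
from itertools import chain, combinations

def F(S, sets):
    """Coverage: number of distinct items covered by the subsets in S."""
    return len(set().union(*(sets[i] for i in S))) if S else 0

sets = [{1, 2}, {2, 3}, {3, 4, 5}]
V = range(len(sets))

def subsets(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Diminishing returns: F(A + v) - F(A) >= F(B + v) - F(B) for A ⊆ B, v ∉ B.
for B in subsets(V):
    for A in subsets(B):
        for v in V:
            if v in B:
                continue
            gain_A = F(set(A) | {v}, sets) - F(set(A), sets)
            gain_B = F(set(B) | {v}, sets) - F(set(B), sets)
            assert gain_A >= gain_B

# Equivalent form: F(A) + F(B) >= F(A ∪ B) + F(A ∩ B) for all A, B.
for A in subsets(V):
    for B in subsets(V):
        A_, B_ = set(A), set(B)
        assert F(A_, sets) + F(B_, sets) >= F(A_ | B_, sets) + F(A_ & B_, sets)

print("both definitions hold for coverage")
```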
Submodular functions have their roots in game theory, combinatorial optimization, and operations research.
More recently, submodular functions have started receiving attention in the machine learning and computer vision communities, and have recently been introduced to natural language processing for the tasks of document summarization and word alignment.
Submodular functions share a number of properties in common with convex and concave functions, including their wide applicability, their generality, their multiple options for representation, and their closure under a number of common operators (including mixtures and certain convolutions).
For example, if a collection of functions {F_i} is submodular, then so is their weighted sum F = Σ_i w_i F_i, where the w_i are nonnegative weights.
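The weighted-sum closure property can be illustrated with a small sketch mixing two hypothetical submodular objectives (a coverage term and a cluster-diversity term; all names and weights are illustrative):

```python
# Closure under nonnegative weighted sums: combining two hypothetical
# monotone submodular objectives yields another monotone submodular objective.

def coverage(S, sets):
    """Number of distinct items covered by the chosen subsets."""
    return len(set().union(*(sets[i] for i in S))) if S else 0

def clusters_touched(S, cluster_of):
    """Number of distinct clusters represented in S (also submodular)."""
    return len({cluster_of[i] for i in S})

def combined(S, sets, cluster_of, w1=1.0, w2=0.5):
    # w1, w2 >= 0, so the weighted sum is again monotone submodular
    return w1 * coverage(S, sets) + w2 * clusters_touched(S, cluster_of)

sets = [{1, 2}, {2, 3}, {4, 5}]
cluster_of = [0, 0, 1]
print(combined([0, 2], sets, cluster_of))  # -> 5.0 (1.0*4 + 0.5*2)
```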
It is not hard to show that submodular functions also have the following composition property with concave functions:
Theorem 1. Given functions F : 2^V → R and f : R → R, the composition F' = f ∘ F (i.e., F'(S) = f(F(S))) is nondecreasing submodular if f is nondecreasing concave and F is nondecreasing submodular.
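As a quick numeric sanity check of Theorem 1, one can verify diminishing returns for the composition of a concave function with a (here even modular) set function; the weights below are hypothetical:

```python
# Numeric illustration of Theorem 1: composing a nondecreasing concave f
# with a monotone submodular (here modular) F yields a submodular function.
import math
from itertools import chain, combinations

w = [3.0, 1.0, 4.0, 2.0]          # modular F(S) = sum of weights in S
F = lambda S: sum(w[i] for i in S)
G = lambda S: math.sqrt(F(S))     # f(x) = sqrt(x): nondecreasing, concave

V = range(len(w))
def subsets(it):
    s = list(it)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Verify diminishing returns for the composition G = f ∘ F.
for B in subsets(V):
    for A in subsets(B):
        for v in V:
            if v in B:
                continue
            gA = G(set(A) | {v}) - G(set(A))
            gB = G(set(B) | {v}) - G(set(B))
            assert gA >= gB - 1e-12   # tolerance for float rounding
print("sqrt of a modular function is submodular on this example")
```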
This property will be quite useful when defining submodular functions for document summarization.

3 Submodularity in Summarization
3.1 Summarization with a knapsack constraint
Let the ground set V represent all the sentences (or other linguistic units) in a document (or collection of documents, in the multi-document summarization case).
The task of extractive document summarization is to select a subset S ⊆ V to represent the entirety (the ground set V).
There are typically constraints on S, however.
Naturally, we should have |S| ≪ |V|, as S is a summary and should be small.
In standard summarization tasks, the summary is usually required to be length-limited.
Constraints on S can therefore naturally be modeled as a knapsack constraint: Σ_{i∈S} c_i ≤ b, where c_i is the cost of selecting unit i (e.g., the number of words in the sentence) and b is our budget.
If we use a monotone submodular set function F to measure the quality of the summary set S, the summarization problem can then be formalized as the following combinatorial optimization problem:
Problem 1. argmax_{S⊆V} F(S) subject to Σ_{i∈S} c_i ≤ b.
Since this is a generalization of the cardinality-constrained case, it also constitutes an NP-hard problem.
In this case, a greedy algorithm with partial enumeration can solve Problem 1 with a constant approximation factor (1 − 1/e) if F is monotone submodular (2004).
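A minimal sketch of the cost-benefit greedy step for the knapsack-constrained problem (the partial-enumeration step required for the formal guarantee is omitted, and the objective, costs, and data are illustrative):

```python
# Cost-benefit greedy for a knapsack-constrained submodular objective:
# at each step pick the unit with the best gain-to-cost ratio that still
# fits the budget. A hypothetical coverage objective stands in for F.

def coverage(S, sets):
    """Number of distinct items covered by the chosen subsets."""
    return len(set().union(*(sets[i] for i in S))) if S else 0

def knapsack_greedy(sets, costs, budget):
    S, spent = [], 0
    while True:
        best, best_ratio = None, 0.0
        for i in range(len(sets)):
            if i in S or spent + costs[i] > budget:
                continue
            gain = coverage(S + [i], sets) - coverage(S, sets)
            ratio = gain / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:   # nothing with positive gain fits the budget
            return S
        S.append(best)
        spent += costs[best]

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
costs = [3, 1, 5, 2]   # e.g., sentence lengths in words
print(knapsack_greedy(sets, costs, budget=6))  # -> [1, 3, 0]
```

Note how the budget changes the solution: the cheap small sets can beat the large expensive set once gains are scaled by cost.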
