F.V.Tkachov  [2nd ed.] A theory of jet definition 2000-Jan-10 02:44 Page 1 of 45





A THEORY OF JET DEFINITION


Fyodor V. Tkachov

Institute for Nuclear Research
of Russian Academy of Sciences
Moscow 117312 Russia





A systematic framework for jet definition is developed from first principles of physical
measurement, quantum field theory, and QCD.
A jet definition is found which:

• is theoretically optimal with regard to both minimization of detector errors and inversion of hadronization;
• is similar to a cone algorithm with dynamically negotiated jet shapes and positions found via shape observables that generalize the thrust to any number of axes;
• involves no ad hoc conventions;
• allows a fast computer implementation.

The framework offers an array of options for systematic construction of quasi-optimal observables for
specific applications.



The second edition:

• clarifies, expands and solidifies the arguments behind the formal derivation;
• strengthens the consistency of the derivation at the last step, which fine-tunes the final criterion, so that the jet search algorithm is now much simpler, faster and more robust [7];
• uncovers new options not available in conventional schemes.

Eliminated:

• the algorithmically cumbersome linear restriction on missing energy; it is now treated additively with a cumulative upper bound (6.21).

Added:

• a general theory of optimal observables (2.7–2.35);
• an analysis of inversion of hadronization (5.10);
• the relation to cone algorithms and thrust (8.11, 8.14);
• a model-independent tool to quantify hadronization (the soft distribution; 8.19);
• the option of multiple jet configurations (9).



Related materials (code etc.) are available at http://www.inr.ac.ru/~ftkachov/projects/jets/index.htm





Introduction 1

Jet finding algorithms are a key tool in high-energy physics [1], and the problem of quantitative description of the structure of multi-hadron final states remains at the focus of physicists' attention (cf. e.g. [2]).

This paper continues the systematic investigation of quantitative description of multijet structure from first principles of physical measurements and quantum field theory undertaken in [3]–[5]. Our purpose here is to complete the analysis of [4] in regard of jet algorithms.[a] We are going to develop a systematic theory of jet definition and derive a jet finding criterion -- the so-called optimal jet definition (OJD) -- summarized in Sec. 7.16.

[a] The 2→1 recombination version of the optimal jet definition was discussed in [6]; see Sec. 10.11 of the present paper. It is interesting to observe how the popularity of recombination schemes (which of course is due to their simplicity) led astray the study of jet algorithms within the framework of [4] which per se provides no motivation for considering 2→1 recombinations. This is not the first time that I realize, on ex post facto examination, that the hardest part in the solving of seemingly intractable problems is invariably to escape the psychological traps created by quasi-solutions.

It is optimal in a well-defined sense -- the sense which is ignored in the conventional deliberations about jet algorithms. Namely, the new principle on which the presented theory of jet definition is based is that the configuration of jets must inherit maximum physical information from the original event (Sec. 5.6).

Now the first difficulty (besides realizing its importance) is to give that axiom a systematic quantitative form. This is what the first part of the present work (Sections 2–4) deals with. In the second part (Sections 5–7) we derive OJD which is summarized in Sec. 7.16. The third part (Sections 8–11) investigates the definition.

A more detailed description of the content is given in Sec. 1.5.

The focus in this paper is on the analytical theory of the criterion and the underlying principles. Its software implementation is discussed in a separate publication [7]. A detailed numerical investigation of OJD requires a separate project.

Also beyond the scope of the present work are complete formal proofs of the background propositions of Section 3 (this especially concerns the arguments in Sec. 4.1): the purpose here is to uncover and clearly formulate the assumptions involved and to devise a formulaic way to talk about jet finding with hand-waving minimized. What may appear as fancy mathematical formulations is primarily intended as an invitation to mathematical physicists to fill in the remaining gaps.

Notations agree with [4] but the present paper is self-contained in this respect.

The theoretical attitude which permeates this work is that jets are not partons but a data processing tool motivated by the partonic structure of QCD dynamics at high energies [4]. Such a shift of emphasis allows one to remove artificial restrictions in the design of data processing algorithms.

In view of the importance of the subject and the prevailing prejudice that the definition of jets is a matter of subjective preference (which, as the theory developed in the present paper shows, may not be quite true), I prefer to tax the reader's patience by explicitly stating trivial things (and even putting them in boxes) rather than leave an important axiom out of the picture.

Numbering and formatting 1.1

The two-level numbering conventions of [4] are adopted to facilitate searches of cross-referenced items: sub(sub)sections, equations, figures, tables, and textual propositions are numbered consecutively within sections. Section headings are set in bold type. Sub- and subsubsection headings differ only by formatting (solid and dotted underlining, respectively).

Underlined italic type indicates an important term being defined. The underlining helps the eye to find definitions in the body of the text. The meaning of such terms in the context of our theory is usually narrowed compared with the conventional usage.

Double boxes enclose conceptually important propositions, which may be numbered.    1.2

Simple solid boxes contain formulas and propositions which are part of the optimal jet definition (OJD) and related algorithmic options.    1.3

Dotted boxes denote important formulas and propositions.    1.4

• Bullets indicate further options, important asides, etc.

The reader is invited to begin to read this text by browsing through it using boxes and bullets as visual clues.

Plan 1.5

The paper can be roughly divided into three parts. The first part (Sections 2–4) is preparatory and devoted to a clarification of some general issues pertaining to data processing procedures of which jet algorithms are a part. The second part (Sections 5–7) is devoted to the derivation of the optimal jet definition. The third part (Sections 8–11) investigates the OJD.

Section 2 is devoted to a clarification of the relevant issues of mathematical statistics (this perhaps should have been done already in [4]). The reasoning is general and practically no specifics of high-energy physics are invoked. We introduce the notion of (quasi-) optimal observables for measurements of fundamental parameters such as $\alpha_S$, $M_W$, etc. (Sec. 2.7). Such observables allow one in principle to reach the best possible precision for fundamental parameters. The resulting practical prescriptions (Sec. 2.25) improve upon the usual signal-vs.-noise considerations with an important new ingredient here being the notion of regularization of discontinuities (Sec. 2.52). The prescriptions of Section 2 motivate new ways for the use of jet algorithms some of which are described in Section 9.

Section 3 is essentially a clarification of the arguments of [4] in the light of the results of Section 1. It discusses the "kinematical" properties of observables (their so-called C-continuity [4]) which ensure their optimal sensitivity to errors and amenability to theoretical studies. C-continuity is described using a special distance among events (Sec. 3.23). The arguments here culminate in a quantitative description of the event's physical information content (Sec. 3.42) which serves





as a formal starting point for a subsequent derivation of kinematical jet definition.

Section 4 investigates the specific structure of QCD probability densities. The purpose is to clarify the logical connection between the notions of C-continuity and IR safety (the former turns out to be a non-perturbative reformulation of the latter). This solidifies the conjecture of [8] concerning the possibility to confront perturbative calculations with hadronic data for IR safe observables (cf. Eq. 4.2). Then a formal description of hadronization is introduced (Eq. 4.12) to prepare ground for a subsequent study of dynamical aspects of our jet definition. A formal construction of optimal observables which takes into account the hadronization model (Eq. 4.20) provides a reference point for the constructions of observables based on jet algorithms. The conventional scheme for that is discussed in Sec. 4.28.

Section 5 discusses jet definitions. First in Sec. 5.1 the implicit conventional definition of an ideal jet algorithm is investigated. Then Sec. 5.6 introduces a definition of jets rooted in the formalism of the preceding sections. We then exhibit its connection with the concept of inversion of hadronization (Sec. 5.10). Then a quantitative version of the jet definition is described (Sec. 5.17). It is based on inequalities of a factorized form (Eq. 5.18) which estimate the loss of physical information content in the transition from events to jet configurations. We discuss how different jet algorithms can be compared on the basis of how well they conserve the information (Sec. 5.26). Then a universal dynamics-agnostic variant of jet definition is introduced (Sec. 5.27), and in Sec. 5.31 we explain how it can be modified to include dynamical information.

Section 6 is technical and devoted to the derivation of the factorized estimate. The main trick is the so-called recombination matrix (Sec. 6.1); finding the configuration of jets is equivalent to finding that matrix. The matrix can be regarded as a cumulative variant of the entire sequence of 2→1 recombinations in the conventional recombination jet finding scheme (cf. also Sec. 10.11) but now all particles are, so to say, recombined into jets democratically. (In this respect, OJD is equivalent to a prescription for determining the order of recombinations.)

In Section 7, the remaining ambiguities are fixed in such a way as to ensure a maximal computational convenience, momentum conservation, and Lorentz covariance. We consider both the spherical kinematics (c.m.s. annihilation of $e^+e^-$ pairs into hadrons) and hadron collisions kinematics (a boost-invariant formulation).

Section 8 clarifies the mechanism of the obtained jet definition and establishes its connection with shape observables of the conventional type (Sec. 8.11). Then we present simple analytical arguments which show that OJD is essentially a cone algorithm with dynamically determined positions and shapes of jet cones (Sec. 8.14).

An important tool we obtain as a by-product is the so-called soft distribution (Sec. 8.19). It allows one to quantify the mechanism of hadronization in a model-independent fashion.

Section 9 considers the issues for a discussion of which the conventional schemes offer no framework whatever, namely, the problem of non-uniqueness of jet configurations which in the case of OJD takes the form of the problem of multiple minima. The options naturally offered by the developed theory allow one to go beyond the restrictions of the conventional data processing scheme based on jet algorithms (4.38).

Section 10 compares OJD with the conventional cone and recombination schemes. We discuss the vicious circle in the conventional jet definitions (no principle to fix the initial cone configuration/order of recombinations; 10.2). Also derived is a curious variant of OJD (Eq. 10.7) which corresponds to the original cone algorithm of [8] rewritten in terms of IR safe shape observables. In Sec. 10.11 we discuss the connection of OJD with the conventional 2→1 recombination criteria. A general conclusion is that the mechanism of OJD is rather similar to the conventional cone algorithms.

Section 11 summarizes our findings.


Optimal observables, continuity and regularizations 2

For a meaningful discussion of jet algorithms it is essential to regard them as a special case of general data processing procedures. With that in mind, below are listed some basic facts of mathematical statistics which emerged as necessary for a systematic clarification of the issue of jet definition. Although the high-energy physics background affected the terminology and emphasis of the presentation below, it deals essentially with elementary notions of parameter estimation. However, the important prescription we arrive at in Sec. 2.25 seems to be missing from textbooks.

Some generalities 2.1

One deals with a random variable P whose instances (specific values) are called events. Throughout most of this section, the nature of events P can be anything: they can be random points on the real axis or random measures on the unit sphere.

One always deals with a finite collection of experimentally observed events $\{P_i\}_i$. In the context of applications of interest to us, events are obtained via rather complex measurement procedures, so that their probability distribution $\pi(P)$ reflects experimental imperfections.

Experimental imperfections are of two kinds to be called, respectively, statistical errors which are due to the finite number of events in the event sample, and detector errors i.e. distortions of individual events by measurement devices. Of course, the two cannot be strictly separated because detector errors may cause some events not to be seen at all but this is not important for our purposes.

Theory provides a model for $\pi(P)$ controlled by a small number of fundamental parameters such as the Standard Model's $\alpha_S$, $M_W$, etc.

Theoretical knowledge may also involve imperfections, e.g. the necessity to describe hadronic data in terms of quarks and gluons in perturbative quantum chromodynamics (pQCD).

Any data processing has, in the final respect, two purposes. One is to test the hypothesis of correctness of the underlying theoretical model, which we will not discuss. The other purpose is to extract the values of $\alpha_S$, $M_W$, ... from given $\{P_i\}_i$ and $\pi(P)$. This can be represented as follows:

$$\left.\begin{array}{l}\pi(P)\\ \{P_i\}_i\end{array}\right\} \xrightarrow{\ \text{data processing algorithm}\ } \alpha_S,\ M_W,\ \ldots \tag{2.2}$$
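As a minimal sketch of what a "data processing algorithm" in the scheme 2.2 may look like, consider a toy case that is not taken from the present paper: events are assumed to be real numbers distributed with the density $\pi(P) = \lambda \exp(-\lambda P)$, and the only "fundamental parameter" is $\lambda$. The function names and the choice of the fit (matching the sample mean to the model mean $1/\lambda$) are illustrative assumptions.

```python
import random

def sample_events(lam, n, rng):
    """Toy events {P_i}: n real numbers drawn from pi(P) = lam * exp(-lam * P)."""
    return [rng.expovariate(lam) for _ in range(n)]

def data_processing(events):
    """Toy 'data processing algorithm': estimate lam by equating the
    sample mean of P to the model mean 1/lam (an assumed, simple choice)."""
    mean = sum(events) / len(events)
    return 1.0 / mean

rng = random.Random(12345)          # fixed seed for reproducibility
true_lam = 2.0                      # hypothetical 'true' parameter value
events = sample_events(true_lam, 100_000, rng)
lam_hat = data_processing(events)   # extracted parameter value
print(lam_hat)
```

With this many events the extracted value should agree with the input value to within a fraction of a percent, which is the statistical error expected for the assumed sample size.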





It is convenient to call the collection of events $\{P_i\}_i$ raw physical information. On the other hand, to obtain the parameters on the r.h.s., one has to interpret data in terms of a specific model, so such parameters are conveniently called interpreted physical information.

The scheme 2.2 represents the much studied basic problem of mathematical statistics [9], [10]. However, we would like to regard it in the light of specifics of the formalism of quantum field theory where a central role is played by quantum operators whose average values over ensembles of events are the quantum observables. In the language of mathematical statistics, this means that we are going to place emphasis on the generalized method of moments.

So we wish to consider the general scheme in which the transformation 2.2 is accomplished by choosing suitable functions $f(P)$ defined on events, and then finding the parameters by equating their theoretical average values,

$$\langle f\rangle \equiv \langle f\rangle_{\rm th} = \int dP\,\pi(P)\,f(P), \tag{2.3}$$

where $\pi$ is supposed to be known so that this can be computed for any values of fundamental parameters, with the corresponding experimental values:

$$\langle f\rangle_{\rm exp} = \frac{1}{N}\sum_i f(P_i). \tag{2.4}$$

The scheme 2.2 becomes:

$$\left.\begin{array}{l}\pi(P) \xrightarrow{\ \text{observable } f\ } \langle f\rangle_{\rm th}\\[4pt] \{P_i\}_i \xrightarrow{\ \text{observable } f\ } \langle f\rangle_{\rm exp}\end{array}\right\} \xrightarrow{\ \text{fit}\ } \alpha_S,\ M_W,\ \ldots \tag{2.5}$$

In terms of mathematical statistics, the weight $f$ is a generalized moment. In the context of quantum field theory, to such functions there correspond quantum operators in terms of which the entire theory is formulated. We will be using the quantum-theoretic term observable for such functions, and call $\langle f\rangle$ its observable value.

The values of all possible observables $\langle f\rangle$ will be called processed physical information, which is a model-independent concept to be contrasted with the model-dependent notion of interpreted physical information (fundamental parameters).

With processed physical information, one simply deals with all possible functions on events. Their general properties such as continuity play an important role in the analysis of sensitivity of observables to experimental and theoretical imperfections. Such properties can be called kinematical because they depend only on the general structure of detector errors and of the underlying formalism (quantum field theory), and can be studied in a model-independent manner (Section 3).

All conventional data processing procedures (involving event selection, jet algorithms, histograms, etc.) are special cases of the scheme 2.5. In practice the fits of theoretical predictions to experimental data often involve many observables (e.g. each bin of a histogram represents one numeric-valued observable). Such collections can be regarded simply as observables that take non-numeric values (in the simplest interpretation, arrays, perhaps multidimensional; in a more sophisticated interpretation, the values may be functional objects).

For explicitness' sake, here is an obvious but key axiom:

The best observables $f(P)$ are those which yield the best precision for fundamental parameters.    2.6

It turns out that there is a general prescription to construct such observables in a systematic fashion.

Optimal observables 2.7

Suppose one needs to extract the value of the fundamental parameter $M$ on which depends the probability distribution $\pi(P)$.[b] We are going to study ways to choose an observable $f$ so as to determine $M$ to maximum precision. First we will obtain an ideal explicit formula for such an optimal observable (Eq. 2.17). The formula itself is essentially a translation of the method of maximum likelihood into the language of moments but our derivation is somewhat unconventional and it allows us to go further and study effects of small deviations from optimality (Eq. 2.22; it seems to be a new result), and then arrive at a prescription for a systematic practical construction of quasi-optimal observables (Sec. 2.25). The prescription seems to be both important and new.[c]

[b] We assume that all mechanisms of distortion of observed events are included into the probability distribution $\pi(P)$. The problem of coping with insufficient knowledge of detector errors that distort individual events is discussed in Sec. 2.48.

[c] New to the extent that I've seen no trace in the literature of its being known to either theorists or experimentalists.

In the context of precision measurements one can assume the magnitude of errors to be small. Under this assumption, one can relate variations in the values of $M$ with variations in the values of $\langle f\rangle$ as follows:

$$\delta M = \left(\frac{\partial\langle f\rangle}{\partial M}\right)^{-1} \delta\langle f\rangle, \tag{2.8}$$

where the derivative is applied only to the probability distribution ($M$ is unknown, so even though the solution, $f_{\rm opt}$, will depend on $M$, any such dependence is coincidental and therefore "frozen" in this calculation):

$$\frac{\partial\langle f\rangle}{\partial M} = \int dP\, f(P)\,\frac{\partial\pi(P)}{\partial M}. \tag{2.9}$$

The axiom 2.6 translates into the requirement of minimizing the expression 2.8 by an appropriate choice of $f$. Then $\delta\langle f\rangle = N^{-1/2}\sqrt{\operatorname{Var} f}$, where:

$$\operatorname{Var} f = \int dP\,\pi(P)\,\bigl(f(P) - \langle f\rangle\bigr)^2 \equiv \langle f^2\rangle - \langle f\rangle^2. \tag{2.10}$$

In terms of variances, Eq. 2.8 becomes:

$$\operatorname{Var} M = \left(\frac{\partial\langle f\rangle}{\partial M}\right)^{-2} \operatorname{Var} f. \tag{2.11}$$

We want to minimize this by a suitable choice of $f$. A necessary condition for a minimum can be written in terms of functional derivatives:[d]

$$\frac{\delta}{\delta f(P)} \operatorname{Var} M = 0. \tag{2.12}$$

[d] An interesting mathematical exercise of casting the following reasoning (the functional derivatives, $\delta$-functionals, etc.) into a rigorous form is left to interested mathematical parties. For practical purposes it is sufficient to note that the range of validity of the prescriptions we obtain is practically the same as for the maximum likelihood method; see Sec. 2.32.

Substitute Eq. 2.11 into 2.12 and use the following relations:


$$\frac{\delta\langle f\rangle}{\delta f(P)} = \pi(P), \qquad \frac{\delta\langle f^2\rangle}{\delta f(P)} = 2 f(P)\,\pi(P), \qquad \frac{\delta}{\delta f(P)}\,\frac{\partial\langle f\rangle}{\partial M} = \frac{\partial\pi(P)}{\partial M}. \tag{2.13}$$

After some simple algebra one obtains:

$$f(P) = \langle f\rangle + \mathrm{const}\times\frac{\partial\ln\pi(P)}{\partial M}, \tag{2.14}$$

where the constant is independent of $P$. The constant plays no role since $f$ is defined by this reasoning only up to a constant factor. Noticing that

$$\int dP\,\pi(P)\,\frac{\partial\ln\pi(P)}{\partial M} = \frac{\partial}{\partial M}\int dP\,\pi(P) = \frac{\partial}{\partial M}\,1 = 0, \tag{2.15}$$

we arrive at the following general family of solutions:

$$f(P) = C_1\,\frac{\partial\ln\pi(P)}{\partial M} + C_2, \tag{2.16}$$

where $C_i$ are independent of $P$ but may depend on $M$. For convenience of formal investigation we will usually deal with the following member of the family 2.16:

$$f_{\rm opt}(P) = \frac{\partial\ln\pi(P)}{\partial M}. \tag{2.17}$$

Then Eq. 2.15 is essentially the same as

$$\langle f_{\rm opt}\rangle = 0. \tag{2.18}$$

• As a practical prescription, one may drop multiplicative and additive $P$-independent constants from $f_{\rm opt}(P)$ without violating optimality of the observable. However, Eq. 2.18 may then be violated, and the relations such as 2.22 would then have to be modified accordingly.

The solution 2.17 is a local quadratic minimum 2.19

Consider 2.11 as a functional of $f$, $\operatorname{Var} M[f]$. Assume $\varphi$ is a function of events such that $\langle\varphi^2\rangle < \infty$. We are going to evaluate the functional Taylor expansion of $\operatorname{Var} M[f_{\rm opt} + \varphi]$ with respect to $\varphi$ through quadratic terms:

$$\operatorname{Var} M[f_{\rm opt} + \varphi] = \operatorname{Var} M[f_{\rm opt}] + \frac{1}{2}\int dP\,dQ\;\varphi(P)\left[\frac{\delta^2 \operatorname{Var} M[f]}{\delta f(P)\,\delta f(Q)}\right]_{f = f_{\rm opt}}\varphi(Q) + \ldots \tag{2.20}$$

It is sufficient to use functional derivatives and relations such as 2.13 and

$$\frac{\delta f(Q)}{\delta f(P)} = \delta(P,Q), \qquad \int dP\,\delta(P,Q)\,\varphi(P) = \varphi(Q). \tag{2.21}$$

We obtain the following result which appears to be new:

$$\operatorname{Var} M[f_{\rm opt} + \varphi] = \frac{1}{\langle f_{\rm opt}^2\rangle} + \frac{1}{\langle f_{\rm opt}^2\rangle^{3}}\Bigl\{\langle f_{\rm opt}^2\rangle\,\langle\bar\varphi^2\rangle - \langle f_{\rm opt}\,\bar\varphi\rangle^2\Bigr\} + \ldots \tag{2.22}$$

where $\bar\varphi = \varphi - \langle\varphi\rangle$.

Non-negativity of the factor in curly braces follows from the standard Schwarz inequality.

• The first term on the r.h.s. of 2.22, $\langle f_{\rm opt}^2\rangle^{-1}$, is the absolute minimum for the variance of $M$ as established by the fundamental Rao-Cramer inequality [9], [10]. The latter is valid for all $\varphi$ and therefore is somewhat stronger than the result 2.22 which we have obtained only for sufficiently small $\varphi$. However, Eq. 2.22 gives a simple explicit estimate for the deviation from optimality and so makes possible the practical prescriptions of Sec. 2.25.

The quantity

$$I_{\rm opt} = \langle f_{\rm opt}^2\rangle \tag{2.23}$$

is closely related to Fisher's information [9], [10].

More generally, it will be convenient to talk about informativeness $I_f$ of an observable $f$ with respect to the parameter $M$, defined by

$$I_f = \bigl(\operatorname{Var} M[f]\bigr)^{-1}. \tag{2.24}$$

The smaller the error of the value of $M$ extracted using $f$, the larger the informativeness of $f$. Then $I_{\rm opt}$ is simply the informativeness of $f_{\rm opt}$.

Note that Fisher's information is an attribute of data whereas the informativeness is a property of an observable.

It is also possible to talk about an optimal observable from a restricted class of observables. An example of such restriction is considered in Sec. 4.52.

Quasi-optimal observables 2.25

The fact that the solution 2.17 is the point of a quadratic minimum means that any observable $f_{\rm quasi}$ which is close to 2.17 would be practically as good as the optimal solution (we will call such observables quasi-optimal). A quantitative measure of closeness is given by comparing the $O(1)$ and $O(\varphi^2)$ terms on the r.h.s. of 2.22:

$$\frac{\langle f_{\rm opt}^2\rangle\,\langle\bar\varphi^2\rangle - \langle f_{\rm opt}\,\bar\varphi\rangle^2}{\langle f_{\rm opt}^2\rangle^{2}} \ll 1, \tag{2.26}$$

where $\bar\varphi = f_{\rm quasi} - \langle f_{\rm quasi}\rangle - f_{\rm opt}$.

The subtracted term in the numerator can be dropped, which only overestimates the l.h.s. and is safe. Assuming for simplicity of formulas that $\langle f_{\rm quasi}\rangle = 0$, the criterion 2.26 takes the following simple form:

$$\bigl\langle (f_{\rm quasi} - f_{\rm opt})^2\bigr\rangle \ll \langle f_{\rm opt}^2\rangle. \tag{2.27}$$

Here is the representation in terms of integrals:

$$\int dP\,\pi(P)\,\bigl[f_{\rm quasi}(P) - f_{\rm opt}(P)\bigr]^2 \ll \int dP\,\pi(P)\,f_{\rm opt}^2(P). \tag{2.28}$$

The criterion 2.27 may be more useful in the practical construction of $f_{\rm quasi}$, and since the latter would tend to oscillate around $f_{\rm opt}$ causing $\langle f_{\rm opt}\,\bar\varphi\rangle$ to be suppressed, the difference between 2.27 and 2.26 may be negligible.

As a rule of thumb, one would aim to minimize the bracketed expression on the l.h.s. of 2.28 for each (or "most") $P$:

$$\bigl[f_{\rm quasi}(P) - f_{\rm opt}(P)\bigr]^2 \ll f_{\rm opt}^2(P). \tag{2.29}$$
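The expansion 2.22 and the closeness measure 2.26 can be checked numerically on a toy model. The sketch below assumes (this is not an example from the text) a unit-width Gaussian density $\pi(P)$ whose mean $M$ is the parameter, evaluated at $M = 0$, so that Eq. 2.17 gives $f_{\rm opt}(P) = P$, and a hypothetical small deviation $\varphi(P) = \epsilon P^3$. The averages are evaluated with a simple midpoint rule, and all function names are illustrative.

```python
import math

def gauss(p):
    """Assumed toy density pi(P): unit-width Gaussian centred at M = 0."""
    return math.exp(-0.5 * p * p) / math.sqrt(2.0 * math.pi)

def average(g, lo=-12.0, hi=12.0, n=24000):
    """<g> = integral of pi(P) g(P) dP, evaluated by the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(gauss(lo + (k + 0.5) * h) * g(lo + (k + 0.5) * h)
                   for k in range(n))

def f_opt(p):
    """Eq. 2.17 for the toy density at M = 0: d ln pi / dM = P."""
    return p

def var_m(f):
    """Eq. 2.11: Var M[f] = Var f / (d<f>/dM)^2, where Eq. 2.9 combined
    with Eq. 2.17 gives d<f>/dM = <f f_opt>."""
    mean = average(f)
    var_f = average(lambda p: (f(p) - mean) ** 2)
    slope = average(lambda p: f(p) * f_opt(p))
    return var_f / slope ** 2

eps = 0.01                                   # hypothetical small amplitude

def phi(p):
    """Hypothetical deviation from the optimal observable."""
    return eps * p ** 3

phi_mean = average(phi)
a = average(lambda p: f_opt(p) ** 2)                    # <f_opt^2>
c = average(lambda p: (phi(p) - phi_mean) ** 2)         # <phibar^2>
b = average(lambda p: f_opt(p) * (phi(p) - phi_mean))   # <f_opt phibar>

exact = var_m(lambda p: f_opt(p) + phi(p))         # no expansion involved
second_order = 1.0 / a + (a * c - b * b) / a ** 3  # r.h.s. of Eq. 2.22
closeness = (a * c - b * b) / a ** 2               # l.h.s. of Eq. 2.26
print(exact, second_order, closeness)
```

For $\epsilon = 0.01$ the second-order estimate is close to 1.0006, the exact value lies slightly below it (the difference is of order $\epsilon^3$), and neither falls below the minimum $\langle f_{\rm opt}^2\rangle^{-1} = 1$ of this toy model.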





One can talk about non-optimality of observables (i.e. their lower informativeness compared with the optimal observable) and also about sources of non-optimality. These have a simple interpretation in the case of quasi-optimal observables as the deviations of $f_{\rm quasi}(P)$ from $f_{\rm opt}(P)$ which give sizeable contributions to the integral in 2.27. The simplest example is when $f_{\rm opt}$ is a continuous smoothly varying function whereas $f_{\rm quasi}$ is a piecewise constant approximation. Then $f_{\rm quasi}$ would usually deviate most from $f_{\rm opt}$ near the discontinuities which, therefore, are naturally identified as sources of non-optimality.

It is practically sufficient to take Eq. 2.17 at some value $M = M_0$ close to the true one (which is unknown anyway). This is usually possible in the case of precision measurements. One could also perform an iterative procedure for $M$ starting from $M_0$, then replacing $M_0$ with the value newly found, etc. -- a procedure closely related to the optimization in the maximum likelihood method.

So the method of quasi-optimal observables is as follows:

(1) construct an observable $f_{\rm quasi}$ using 2.17 as a guide so that $f_{\rm quasi}$ is close to $f_{\rm opt}$ in the integral sense of Eq. 2.26;
(2) find $M$ by fitting $\langle f_{\rm quasi}\rangle_{\rm th}$ against $\langle f_{\rm quasi}\rangle_{\rm exp}$;
(3) estimate the error for $M$ via 2.11;
(4) $f_{\rm quasi}$ may depend on $M$; to find $M$ one can then optionally use an iterative procedure starting from some value $M_0$ close to the true one.
    2.30

Furthermore, it is possible to use an approximate shape for the r.h.s. of 2.17 such as given by a few terms of a perturbative expansion. In terms of quantum-field-theoretic perturbation theory this means that it may be sufficient to construct $f_{\rm quasi}$ on the basis of the expressions for probability distribution (matrix elements squared) obtained in the lowest PT order in which the dependence on the parameter manifests itself: theoretical updates of radiative corrections need not be reflected in the quasi-optimal observables. It may also be convenient to use a piecewise linear $f_{\rm quasi}$ or even piecewise constant. The latter option actually corresponds to conventional procedures based on cuts (cf. Sec. 2.35); however, using piecewise linear approximations for $f_{\rm quasi}$ should yield noticeably better results without incurring noticeable algorithmic complications.

If the dimensionality of the space of events is not large then it may be possible to construct a suitable $f_{\rm quasi}$ in a brute force fashion, i.e. build a multi-dimensional interpolation formula for $\pi(P)$ (via an adaptive routine similar to those used e.g. in [11]) for two or more values of $M$ near the value of interest, and perform the differentiation in $M$ numerically.

Also, one can use different expressions for $f_{\rm quasi}$: e.g. perform a few first iterations with a simple shape for faster calculations and then switch to a more sophisticated interpolation formula for best precision.

Several parameters 2.31

With several parameters to be extracted from data there are the usual ambiguities due to reparametrizations but one can always define an observable per parameter according to 2.17. Then the informativeness 2.24 is a matrix (as is Fisher's information).

Since the covariance matrix of (quasi-) optimal observables is known (or can be computed from data), the mapping of the corresponding error ellipsoids for different confidence levels from the space of values of the observables into the space of parameters is straightforward.

Connection with maximum likelihood 2.32

The prescription 2.17 is closely related to the standard method of maximum likelihood that prescribes to estimate $M$ by the value which maximizes the likelihood function:

$$\sum_i \ln\pi(P_i), \tag{2.33}$$

where summation runs over all events from the sample. The necessary condition for the maximum of 2.33 is

$$\frac{\partial}{\partial M}\sum_i \ln\pi(P_i) = \sum_i \frac{\partial\ln\pi(P_i)}{\partial M} \propto \langle f_{\rm opt}\rangle_{\rm exp} = 0. \tag{2.34}$$

This agrees with 2.17 thanks to 2.18.

So the formula 2.17 can be regarded as a translation of the method of maximum likelihood (which is known to yield the theoretically best estimate for $M$ [9], [10]) into the language of the generalized method of moments.

Equivalents of the formula 2.17 can be found at intermediate stages of examples of derivations of estimators for parameters of standard (e.g. normal) probability distributions according to the maximum likelihood method.[e]

[e] Rather surprisingly, none of a dozen or so textbooks and monographs on mathematical statistics that I checked (including a comprehensive practical guide [9] and a comprehensive mathematical treatment [10]) explicitly formulated the prescription in terms of the method of moments.

The method of quasi-optimal observables is expected to yield results on a par with the maximum likelihood method (because of their close relation; see Sec. 2.25) but it has the following advantages:
(i) applicability to situations with millions of events;
(ii) a greater flexibility in the case of complicated $\pi(P)$.
In such situations a direct maximization of the likelihood function 2.33 is unfeasible.

Connection of Eq. 2.17 with event selection 2.35

As a simple consistency check, note that Eq. 2.17 agrees with the simplest procedures of event selection used to isolate the signal and suppress backgrounds.

For instance, suppose that most sensitivity of $\pi(P)$ to $M$ (i.e. the derivative $\partial\pi(P)/\partial M$ is largest) is localized in some region $\Omega$ of events (e.g. due to a superselection rule or if $M$ is the mass of a particle that predominantly decays into a certain number of jets). Then $f_{\rm opt}(P)$ vanishes outside $\Omega$:

$$f_{\rm opt}(P) = 0 \quad \text{if } P \notin \Omega. \tag{2.36}$$

A popular procedure in such a situation is to introduce a selection criterion (a cut):

$$P \text{ satisfies the selection criterion} \;\Leftrightarrow\; P \in \Omega, \tag{2.37}$$

and to compute the fraction of events from that region, i.e. the observable defined by

$$f_{\rm crude}(P) = \theta(P \text{ satisfies the selection criterion}), \tag{2.38}$$

where the $\theta$-function is defined according to

$$\theta(\mathrm{TRUE}) = 1; \qquad \theta(\mathrm{FALSE}) = 0. \tag{2.39}$$
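To illustrate how much informativeness (Eq. 2.24) a cut may cost compared with smoother observables, here is a toy comparison. It assumes, purely for illustration, a unit-width Gaussian $\pi(P)$ whose mean $M$ is the parameter (evaluated at $M = 0$), a $\theta$-function observable of the form 2.38 with the hypothetical selection criterion $P > 0$, a piecewise-linear observable clipped at $|P| = 1$, and $f_{\rm opt}(P) = P$ from Eq. 2.17. None of these choices comes from the text; they are made only because the results are easy to verify analytically.

```python
import math

def gauss(p):
    """Assumed toy density pi(P): unit-width Gaussian centred at M = 0."""
    return math.exp(-0.5 * p * p) / math.sqrt(2.0 * math.pi)

def average(g, lo=-12.0, hi=12.0, n=24000):
    """<g> = integral of pi(P) g(P) dP, evaluated by the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(gauss(lo + (k + 0.5) * h) * g(lo + (k + 0.5) * h)
                   for k in range(n))

def informativeness(f):
    """I_f = 1 / Var M[f] = (d<f>/dM)^2 / Var f (Eqs. 2.11 and 2.24).
    For the toy density, d<f>/dM = <f * P> because d ln pi / dM = P at M = 0."""
    mean = average(f)
    var_f = average(lambda p: (f(p) - mean) ** 2)
    slope = average(lambda p: f(p) * p)
    return slope ** 2 / var_f

def f_opt(p):
    """Optimal observable, Eq. 2.17, for the toy density."""
    return p

def f_cut(p):
    """Theta-function of a hypothetical selection criterion P > 0 (cf. Eq. 2.38)."""
    return 1.0 if p > 0.0 else 0.0

def f_linear(p):
    """Hypothetical piecewise-linear observable: f_opt clipped at |P| = 1."""
    return max(-1.0, min(1.0, p))

i_opt = informativeness(f_opt)
i_cut = informativeness(f_cut)
i_lin = informativeness(f_linear)
print(i_cut / i_opt, i_lin / i_opt)
```

In this toy model the cut retains a fraction $2/\pi \approx 0.64$ of the optimal informativeness, whereas the piecewise-linear observable retains about 0.90, in line with the remark that piecewise-linear shapes improve upon cuts at little algorithmic cost.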





In other words, with the observable 2.38 one simply ignores all non-trivial dependence of
$f_{\rm opt}$ on P inside the selected region.
Furthermore, if in some other region the magnitude of $\pi(P)$ is large and not offset by its
sensitivity to M (the situation of a "large background") then one introduces another selection
criterion similar to 2.38, and so on. The net effect is that the observable takes the form

    $f_{\rm crude}(P) = \prod_i \theta(P \text{ satisfies } i\text{-th selection criterion}) \times \ldots$    2.40

In general, $f_{\rm crude}$ may also contain a factor other than a $\theta$-function (shown with
dots above). For instance, in the case of a histogram for some differential distribution of
events, each bin corresponds to an observable of the form 2.40 (the last selection criterion is
whether or not a value computed for the event belongs to the bin). Then the non-trivial factor
(shown with dots) may take e.g. integer values such as the number of dijets from P that fall into
the bin corresponding to an interval of invariant masses (with each bin representing one
observable).
• An immediate practical prescription from the above concerns intermediate regions where either
the sensitivity of $\pi(P)$ to M is not small enough or $\pi(P)$ is not large enough. Then one
should make f interpolate between 0 and 1 over such intermediate regions. It is clear from
Eqs. 2.27-2.29 that such a procedure would increase informativeness of the observable. Simple
prescriptions for that are considered in Sec. 2.52. The numerical effect here can be
non-negligible (Sec. 2.62).

Optimal observables and the $\chi^2$ method    2.41

The popular $\chi^2$ method makes a fit with a number of non-optimal observables (bins of a
histogram). The histogramming implies a loss of information but the method is universal and
implemented in standard software routines. On the other hand, the choice of $f_{\rm quasi}$
requires a problem-specific effort but then the loss of information can be made negligible by a
suitable adjustment of $f_{\rm quasi}$.
The balance is, as usual, between the quality of custom solutions and the readiness of universal
ones. However, once quasi-optimal observables are found, the quality of maximum likelihood method
seems to become available at a lower computational cost.

The two methods are best regarded as complementary: One could first employ the $\chi^2$ method to
verify the shape of the probability distribution and obtain the value of $M_0$ to be used as a
starting point in the method of quasi-optimal observables in order to obtain the best final
estimate for M.
A theoretical importance of the optimal observables is that the explicit (even if formal)
expressions for optimal observables (cf. 4.20) shed light on the problem of optimal construction
of complex data processing algorithms (such as jet finding algorithms). The concept of optimal
observables offers specific guidelines for construction and comparison of such algorithms by
simply regarding them as a tool for construction of quasi-optimal observables.

Example. The Breit-Wigner shape    2.42

Let P be random real numbers distributed according to

    $\pi(P) = \frac{1}{\pi} \times \frac{1}{(M-P)^2 + \Gamma^2}$ .    2.43

There are two parameters here, and with more than one parameter in the problem there are the
usual ambiguities due to reparametrizations. However, one still can define an observable per each
parameter according to 2.17.
For 2.43, one obtains:

    $f_{M,{\rm opt}}(P) = \partial_M \ln \pi(P) = -\frac{2(M-P)}{(M-P)^2 + \Gamma^2}$ ;    2.44

    $f_{\Gamma,{\rm opt}}(P) = \partial_\Gamma \ln \pi(P) \;\to\; \frac{\Gamma}{(M-P)^2 + \Gamma^2}$ .    2.45

(Recall that there is an arbitrariness in the definition of optimal observables as described by
2.16. The arbitrariness can be used to simplify and conveniently normalize the optimal
observables, as done in 2.45.)
The above two weights happen to be uncorrelated:

    $\langle f_{M,{\rm opt}} \, f_{\Gamma,{\rm opt}} \rangle = 0$ .    2.46

It is interesting to observe how $f_{M,{\rm opt}}$ emphasizes contributions of the slopes of the
bump -- exactly where the magnitude of $\pi(P)$ is most sensitive to variations of M -- whereas
taking contributions from the two slopes with a different sign maximizes the sensitivity to the
signal (i.e. information on M). At the same time it suppresses contributions from the middle part
of the bump which generates mostly noise as far as M is concerned.
• Unlike theoretical matrix elements which must include all known small corrections (cf. the
programs for precision calculations of LEP1 processes [13]), the observables such as 2.44, 2.45
need not incorporate, say, loop corrections to $\pi$ although inclusion of some such information
might be useful (e.g. by introducing simple shapes via linear splines, etc.; cf. comments after
2.30).
• CONNECTION WITH THE TECHNIQUES OF WAVELETS [12]. The form of 2.44 is reminiscent of a typical
wavelet, which indicates that applying a wavelet filter to theoretical predictions and
experimental formulas instead of the conventional binning prior to using the $\chi^2$ method
would improve results. Since software implementations of the wavelet-based methods are available
(e.g. on the Web), this could be a way to approach the quality of optimal observables via
software routines as universal as those implementing the $\chi^2$ method.

Continuity of observables    2.47

To directly use the prescription 2.17 may not be possible because of insufficient information
about $\pi$. On the other hand, it is reasonable to ask what are the general properties of
optimal observables which ensure the best control of uncertainties. With such a knowledge one
could ensure that the pragmatically constructed observables at least possess those properties.
For instance, one could start with an ad hoc observable, identify sources of its non-optimality
(Eq. 2.29 and remarks thereafter) and modify the observable to mitigate their effect.
From the above reasoning it follows that continuous observables are less sensitive to statistical
and detector errors.

There are several reasons why one should prefer continuous observables:
(i) Optimal observables $f_{\rm opt}$ inherit continuity properties of $\pi(P)$. In the problems
we consider the latter is always a continuous function of the particles' parameters.
(ii) The variance 2.10 is smaller for the more continuous and slower varying functions f(P). It
tends to be larger for f(P) which have jumps or vary fast.

(iii) The error suppression effect of replacing a discontinuous observable by a continuous one
(the so-called regularization) can be significant (Sec. 2.62).
(iv) Fluctuations induced by detector errors (distortions of the individual events $P_i$) are
best dampened in final results if the observables possess special continuity properties. Let us
briefly discuss this.

Taking into account detector errors    2.48

In reality the experimentally observed events $P_i$ contain distortions due to detector errors.
This is expressed by a convolution:

    $\pi(P) = \int {\rm d}P' \, \pi_{\rm ideal}(P') \, D(P', P)$ ,    2.49

where $D(P', P)$ is the probability for the detector installation to see the event P if it is
actually P'. Then the optimal observables are built from 2.49 and must inherit the smearing
induced by D.
The difficulty here is that it may be hard to take into account the exact form of D. Then the
least one can say is that the smeared probability distribution 2.49 -- and hence the optimal
observables -- are continuous.
This is not very informative in the simple case where P are, say, random points on the real axis.
However, in high energy physics events P contain a fluctuating number of particles, each
described by at least three numbers (energy, $\theta$, $\varphi$), so that one deals with O(1000)
degrees of freedom, i.e. the dimensionality of the space of events is practically infinite. In
infinitely-dimensional spaces radically different notions of convergence/continuity^f are
possible (cf. the uniform convergence and integral convergences such as $L_2$ for functions on the
real axis), and the significance of the different available options is often missed, so it may
not be easy to make the correct choice. In jet-related problems, the relevant continuity is the
so-called C-continuity (Sec. 3.18). However, for practical purposes it may be useful to keep in
mind the following rule of thumb:

    If continuity turns out to be important, then any (non-pathological) kind of continuity is
    better than step-like discontinuities.    2.50

So, measurements based on conventional event selection procedures can often be improved via
replacements of hard cuts by continuously varying observables. The simplest prescription for that
is described below (Sec. 2.52). It is rather universal and insensitive to the specific nature of
events and cuts one deals with in a particular application.

f We use the terms convergence or continuity in place of the standard mathematical term topology
as more suggestive and to avoid confusion with "topology of event".

On the concept of regularization    2.51

Such a prescription is a special instance of the general concept of regularization (see [14] for
a systematic treatment and history). A regularization is needed whenever there is a priori
information about the exact solution (such as its continuity) which is not reflected in the
approximations one's method yields. This can happen either when one uses crude heuristics (such
as event selection procedures) or when one uses theoretical methods which are likely to yield
singular solutions (such as those encountered in pQCD; cf. the discussion around Eq. 4.4).
Generally speaking, the regularization is a projection of the candidate solutions to the subspace
where the exact solution is supposed to reside.
Regularizations may take very different forms depending on the specifics of the problem. One
example is the Fejér summation of Fourier series for continuous functions. Here the algorithmic
simplicity of the method of Fourier expansion comes into conflict with the continuity of the
solution, and it proves easier first to sacrifice continuity in order to take advantage of the
power of Fourier method, and then recur to a special trick such as the Fejér resummation to
ensure a uniform convergence to the continuous solution.
Another example is the histogramming of events which, technically, is a transformation of a sum
of $\delta$-functions corresponding to individual events into an ordinary function. In this case
only a singular approximation (a finite set of events) for a continuous function (the probability
distribution) is provided by Nature as a matter of principle.

Empirical regularization of cuts    2.52

The simplest procedure to transform a typical observable corresponding to an event selection
procedure, Eq. 2.40, into a continuous function consists in replacing the step functions with
simple piecewise linear continuous functions. The simplest way is to regularize each
$\theta$-function in 2.40 individually:

    $f_{\rm reg}(P) = \prod_i \theta_i^{\rm reg}(P) \times \ldots$ .    2.53

This can be accomplished as follows.
Each selection criterion in 2.40 can be reduced to the following generic form

    $\varphi(P) > c_{\rm cut}$ ,    2.54

where the l.h.s. is a continuous function of the event. For instance, this could be a cut on the
total energy of the observed event, then $\varphi(P)$ is the total energy.
Instead of a single parameter $c_{\rm cut}$, one now chooses a regularization interval specified
by two values

    $c_{\rm lo} < c_{\rm cut} < c_{\rm hi}$ ,    $r \equiv c_{\rm hi} - c_{\rm lo} > 0$ .    2.55

Here r is the so-called regularization parameter.
The simplest option is a symmetric choice:

    $r = 2(c_{\rm hi} - c_{\rm cut}) = 2(c_{\rm cut} - c_{\rm lo})$ .    2.56

One defines:

    $\theta^{\rm reg}(P) = \begin{cases} 1 & \text{if } \varphi(P) \ge c_{\rm hi}, \\ 0 & \text{if } \varphi(P) \le c_{\rm lo}, \\ \dfrac{\varphi(P) - c_{\rm lo}}{r} & \text{if } c_{\rm lo} < \varphi(P) < c_{\rm hi}. \end{cases}$    2.57

The linear form is chosen solely from considerations of simplicity. One could also use any other
continuous (usually monotonic) shape which interpolates between the same values at the endpoints
of the regularization interval. This is a useful option when r is large.
For different selection criteria participating in 2.40, 2.53 the regularization interval and the
shape of $\theta^{\rm reg}$ can be chosen independently.
The most important parameter which controls the shape of $\theta^{\rm reg}$ and therefore
suppression of errors is r.
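The prescription 2.53-2.57 is simple enough to sketch in a few lines of Python. The following is only an illustration under stated assumptions, not code from the paper: the function names (`theta_reg`, `symmetric_interval`, `event_weight`) and the dictionary event record with an `E_tot` field are hypothetical, and the extra factor shown with dots in 2.53 is omitted.

```python
from typing import Callable, Sequence, Tuple


def theta_reg(value: float, c_lo: float, c_hi: float) -> float:
    """Piecewise-linear regularized step of Eq. 2.57: 0 below c_lo, 1 above c_hi."""
    if c_hi <= c_lo:
        raise ValueError("need c_lo < c_hi, i.e. r = c_hi - c_lo > 0 (Eq. 2.55)")
    if value >= c_hi:
        return 1.0
    if value <= c_lo:
        return 0.0
    return (value - c_lo) / (c_hi - c_lo)


def symmetric_interval(c_cut: float, r: float) -> Tuple[float, float]:
    """Symmetric choice of Eq. 2.56: an interval of length r centred on c_cut."""
    return c_cut - 0.5 * r, c_cut + 0.5 * r


# A cut is (phi, c_lo, c_hi): a continuous function of the event plus its interval.
Cut = Tuple[Callable[[dict], float], float, float]


def event_weight(event: dict, cuts: Sequence[Cut]) -> float:
    """Product of regularized cuts, Eq. 2.53 (without the factor shown by dots)."""
    weight = 1.0
    for phi, c_lo, c_hi in cuts:
        weight *= theta_reg(phi(event), c_lo, c_hi)
        if weight == 0.0:  # the event fails a cut outright and can be dropped
            break
    return weight


def total_energy(event: dict) -> float:
    """Hypothetical cut variable: the total energy stored in the event record."""
    return event["E_tot"]


if __name__ == "__main__":
    c_lo, c_hi = symmetric_interval(c_cut=10.0, r=4.0)  # the interval [8, 12]
    cuts = [(total_energy, c_lo, c_hi)]
    for energy in (7.0, 9.0, 10.0, 11.0, 13.0):
        print(energy, event_weight({"E_tot": energy}, cuts))
```

With the interval [8, 12] used above, events with total energy 9, 10 and 11 receive the weights 0.25, 0.5 and 0.75 instead of being rejected or accepted outright.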


In the context of jet-related measurements one aims to achieve C-continuity of observables. Then
$\theta^{\rm reg}$ and $f_{\rm reg}$ would also be C-continuous if such is $\varphi(P)$ in 2.54.
• WARNING. It is possible, as a psychological crutch, to interpret the resulting weights as
probabilities that the events carry the characteristics one uses for event selection. (Such an
interpretation was mentioned in [4] in a footnote.) However, one should be explicitly warned
against introducing a stochastic decision-making according to 2.57 (a procedure of this kind was
suggested in [15]). Such additional stochasticity would only be an additional source of
fluctuations and therefore increase variance (as can be easily verified in a formal manner), and
thus not only defeat the purpose of regularization but exacerbate the problem.

Choosing the regularization interval    2.58

A lower bound on useful values of r is set by detector errors. Let $\sigma_{\rm meas}$ be the
usual sigma for the errors induced in the values of $\varphi(P)$ by the distortions of P due to
detector errors. The effect of error suppression is negligible unless

    $r \gtrsim 2\sigma_{\rm meas}$ .    2.59

For larger r the suppression factor increases as $O(r^{1/2})$ (see sec. 2.6 of [4]). The
suppression effect for statistical errors is also greater for larger r, and so r should be chosen
as large as possible, in general. The best guidance here is Eq. 2.17. A high precision in the
choice of regularization interval is not required. However, for large r one may wish to choose
more complex shapes than 2.57 (e.g. consisting of several linear pieces glued together).

The regularization interval may also be restricted by other considerations, especially for large
r (e.g. onset of a different physical mechanism which causes a large background). In such cases
one may opt for more asymmetry than in 2.56.
Some idea about a potentially possible magnitude of suppression effects can be obtained from
Sec. 2.62.

Algorithmic aspect    2.60

The simplest first step towards a systematic use of regularization is to introduce a special
4-byte real field for the weight, for each event. The field is initialized to 1. As the event
passes selection stages, the weight is modified according to 2.57 and 2.53. If the weight becomes
zero at some selection stage, the event is dropped as usual. In the end the selected events'
weights are summed up instead of the simple counting of events. Similarly modified should be
observables built from selected events:

    $\langle f \rangle_{\rm reg} = N^{-1} \sum_i w_i \, f(P_i)$ .    2.61

In particular, what used to be an event fraction now becomes $N^{-1} \sum_i w_i$. The usual
procedure corresponds to $w_i = 1$.
The algorithm described by 2.57 requires only a universal few-lines subroutine.

Generic examples    2.62

Some idea about the effect of regularization on the sensitivity of observables to statistical
errors is given by the following one-dimensional examples. However, the conclusions have a more
general validity (see below).
Let P be a point x from the segment [0, 1]. Compare $f_1(P) = \theta(x > 1/2)$ (a hard cut) and
$f_2(P) = x$ (a continuous analog). The suppression of relative errors is given by the ratio of
the two factors $\sigma_i = \sqrt{{\rm Var}\, f_i} \; \langle f_i \rangle^{-1}$. One obtains:

    $\frac{\sigma_1}{\sigma_2} \approx 1.7$ for $\pi(P) = 1$;    $\frac{\sigma_1}{\sigma_2} \approx 1.6$ for $\pi(P) = 2x$.    2.63

The effect of error suppression is significant here for all probability distributions which can
be approximated by linear polynomials -- in fact significant enough to transform a $3\sigma$
discrepancy into a $5\sigma$ effect.
Although in more complex cases the suppression effect may be less than in 2.63, the above numbers
are not as far as one might suppose from reality. Indeed, it is in general possible to change
variables appropriately and integrate out inessential components to reduce a generic
multidimensional case to the one-dimensional. A realistic example is discussed in Sec. 4.68.

More on regularization of cuts    2.64

It appears necessary to state that there is absolutely no physical wisdom in preferring event
selection (equivalent to dichotomic observables) over continuous weights. So, in view of the very
general mechanism of error suppression in the case of continuous observables and the simplicity
of regularization prescriptions, one has to explain not why one should regularize the cuts but
why one does not do so.
Recall in this respect that the commonly used statistical methods such as histogramming
originally emerged in the context of applications such as demography and agriculture, not
high-precision particle physics experimentation where the proliferation of cuts in data
processing elevates them to the level of a first-order algorithmic/mathematical phenomenon. For
instance, the procedures for smoothing conventional histograms found in standard numerical
packages are not the same as building histograms with regularized bins: the former entail a loss
of numerical information, the latter enhance it by suppressing errors. (See e.g. sec. 12.9 of
[4]. A closely related mathematical technique is the wavelet analysis [12].)
Perhaps it ought to be considered an element of basic culture in data processing that an event
should always be accompanied by a real weight. (There might be advantages in allowing the lower
levels of the detector facility to yield events with weights not equal to 1 from the very
beginning.) Computer memory is cheap enough that extra four bytes per event should not be a
burden, and one can always revert to dichotomic weights -- but one never quite knows what one
loses precision-wise when one sequentially applies a dozen hard cuts to one's events, losing a
few % of precision at each hard cut. Modest as the bang here may be, on a per buck basis it is
certainly greater than with any hardware upgrade.
The widespread way of thinking in terms of "event selection" as a primary tool of data processing
is based on a mental attitude which could be explained by:
• The limitations of computer resources in the past -- a factor which seems to be much alleviated
thanks to Moore's law.
• The fact that standard textbooks teach probability in the spirit of the Kolmogorov axiomatics
in terms of subsets and the corresponding probabilities. For such axiomatics, the issues of
continuity in the cases when random events occur in a continuum, are extraneous.
• The penchant for thinking physics in terms of "regions of phase space" rather than continuously
varying observables. This has some foundation in the cases when the events can be tagged somehow
[e.g. in the case of (approximate) superselection rules] but not in QCD situations typically
encountered in problems involving jet counting. Identifying the most interesting region of phase
space is a useful heuristic but ought to be regarded as only a first step in the construction of
the observable.
Interestingly, a similar way of thinking in terms of "regions" proved to be detrimental for the
theory of Feynman diagrams (see the comments in the E-print posting of [16]). Apparently, the
association of "regions" with "physics" is a piece of mythology deeply
rooted in the intellectual culture of the high-energy physics community. It may perhaps be partly
connected to a subconscious rejection of the quantum mechanical notion that it is impossible as a
matter of principle to tell which hole an interfering electron passed through.
This issue also seems to be psychologically related to the common insistence that Monte Carlo
event generators produce events with unit weights. Even if one uses event selection, the most
basic observables are probabilities, so neither individual events nor their (integer) number but
only event fractions are the fundamental, simplest estimators of the corresponding probabilities.
But then it is totally irrelevant whether the estimate is obtained by counting events or their
fractional weights -- the result will be fractional anyway.
Fractional weights accompanying events in the process of event selection would nicely mesh with
other related experimental notions (such as the probabilities for a detected particle to be an
electron) and with fractional weights accompanying MC-generated events.
Theoretical estimates of the event fraction fluctuations for a given corner of phase space seem
anyway to be best done by evaluating the variance of the corresponding weight function (because
then adaptation techniques in integration routines may be used for greater computational
efficiency).
Note that (pseudo) events with fractional weights occur naturally when one attempts to restore
the partonic event (see Section 9). This is in fact similar to how experimentalists restore
observed events from detector signals.
In short, the absolutization of event selection blinds one to some useful options in data
processing.


Observables in QCD. Kinematical aspects    3

In the preceding section we have introduced the notion of (quasi-) optimal observables for
precision measurements of fundamental parameters. Such observables allow one to approach the
theoretically possible precision for the parameters with a given event sample. We found that
optimal observables are given by an explicit formula in terms of the probability density
$\pi(P)$ (Eq. 2.17). In QCD, however, one may have a Monte Carlo event generator with a dependence
on fundamental parameters built in, but no algorithm to evaluate $\pi(P)$ for a given event P. In
such a situation it is reasonable to construct observables incrementally by combining as many
properties from the optimal ones as possible.
There are two types of such properties: kinematical and dynamical. The kinematical properties
reflect requirements of two kinds: experimental (appropriate continuity to suppress sensitivity
to statistical fluctuations and detector errors in data) and theoretical (conformance to
structural properties of quantum field theory in general and QCD in particular, in order to
enhance quality of theoretical predictions). Dynamical properties reflect the specific behavior
of $\pi(P)$ such as predominant production of certain types of events. In general, the result
2.17 incorporates both kinematical and dynamical restrictions, with the former playing the role
of a fine-tuning for the latter. However, the specifics of QCD dynamics (a fast variation of
$\pi(P)$ between the points in the space of P which are close in the sense of C-continuity; see
[4]) enhances the role of kinematical considerations (see the example in Sec. 4.68).
A systematic study of QCD observables from a kinematical viewpoint (continuity and sensitivity to
errors, and compatibility with quantum field theory) was performed in [3]-[5].
In this section we review the findings of [3]-[5].

Notations. Representations of events    3.1

The beam axis is z and it corresponds to the 3rd component of 3-vectors. The polar angle $\theta$
is measured from the beam axis, and the azimuthal angle $\varphi$ is defined accordingly.

Let $E^{\rm cal}$ be the calorimetric energy -- the number measured by a calorimetric cell. It is
usually interpreted as the time-like component of the 4-momentum of the particle which hit the
cell. It is assumed sufficient to treat all particles as massless, so that their energies are not
distinguished from absolute values of their 3-momenta.

In jet studies one deals with two physical situations in which slightly different kinematical
aspects are emphasized. This is reflected in how jets are looked at:
When studying processes with c.m.s. jet production (mostly $e^+e^-$ annihilation), spherical
symmetry is emphasized, and so one works within spherical kinematics, dealing with points of unit
sphere represented either by the pair of angles $\theta$, $\varphi$ or by unit 3-vectors denoted
as $\hat p$, $\hat q$, etc.
When studying hadron collisions, the colliding partons' rest frame is unknown so that invariance
with respect to boosts along the beam axis has to be maintained. Then one works within
cylindrical kinematics and introduces the so-called transverse energy

    $E^{\perp} \equiv E^{\rm cal} \sin\theta \equiv \sqrt{p_1^2 + p_2^2}$ ,    3.2

and pseudorapidity

    $\eta = \ln \cot(\theta/2)$ ,    $-\infty < \eta < +\infty$ .    3.3

Then a massless 4-momentum $p = (E^{\rm cal}, p_1, p_2, p_3)$ is represented as

    $p = E^{\perp} (\cosh\eta, \cos\varphi, \sin\varphi, \sinh\eta)$ .    3.4

Boosts along the beam axis correspond to shifts of $\eta$.

Particles and events    3.5

Let P be the event as seen by an ideal calorimetric detector installation. Then P is a collection
of "particles" which can be just calorimetric cells lit up by the event. Particles in the event
will be enumerated using the labels a, b. The a-th particle/cell is represented by its energy
$E_a$ and direction $\hat p_a$. Formally:

    $P = \{ E_a, \hat p_a \}_{a = 1 \ldots N(P)}$ ,    3.6

where N(P) is the total number of particles in P.
It is convenient to allow particles with zero energy in 3.6. This corresponds to the fact that a
low-energy particle may not light up the cell it hits.
In what follows we will be talking about partonic events, hadronic events, jet configurations,
etc. They are all objects of the same type 3.6.
The meaning of the energies $E_a$ depends on the chosen kinematics:

    $E_a = \begin{cases} E_a^{\rm cal} & \text{spherical kinematics;} \\ E_a^{\perp} & \text{cylindrical kinematics.} \end{cases}$    3.7

The directions $\hat p$ can be represented in different ways (e.g. by $\theta$ and $\varphi$; by
a unit 3-vector; etc.), but all the reasoning until Sec. 7 is independent of the representation.
All we need is the
usual angular distance between two directions, $|\hat p - \hat q|$, which is defined
unambiguously.

• For definiteness, we will always be talking about spherical kinematics in what follows. Then
${\rm d}\hat p$ is an infinitesimal element of the surface of the unit sphere. The final
prescriptions for jet definition will be formulated independently of this assumption.

Events as measures    3.8

We are actually interested in events as seen by a purely calorimetric detector installation, i.e.
energy flows. Energy flow is insensitive to fragmentations of any particle of the event into any
number of collinear fragments directed the same as the parent particle and carrying the same
total energy. However, the representation 3.6 is defective in this respect in that it is not
explicitly fragmentation-invariant.

The following representation of events-as-energy-flows was found to respect physical requirements
to the maximal degree (see [4] and the reasoning below):

    $P \Leftrightarrow \sum_a E_a \, \delta(\hat p, \hat p_a) \equiv P(\hat p)$ .    3.9

Here the $\delta$-functions obey the usual rules of integration over the unit sphere:

    $\int_{\text{unit sphere}} {\rm d}\hat p \; \delta(\hat p, \hat p_a) \, d(\hat p) = d(\hat p_a)$    3.10

for any continuous function on the unit sphere $d(\hat p)$.
In mathematical terms, the object 3.9 is a measure on the unit sphere. By definition, it acquires
a numerical meaning after integrations with continuous functions:

    $\langle P, d \rangle \equiv \int_{\text{unit sphere}} {\rm d}\hat p \; P(\hat p) \, d(\hat p) = \sum_a E_a \, d(\hat p_a)$ .    3.11

In other words, Eq. 3.9 is essentially a convenient shorthand notation for the collection of
values 3.11 for all such $d(\hat p)$:

    $P \Leftrightarrow \{ \langle P, d \rangle \}_{d(\hat p) \text{ are all continuous functions on unit sphere}}$    3.12

• The expression 3.9 is explicitly fragmentation invariant, as are Eq. 3.11 and the r.h.s. of
3.12.

Calorimetric detector cells    3.13

Elementary calorimetric cells are naturally represented by $d(\hat p)$ corresponding to their
idealized angular acceptance functions: such $d(\hat p)$ takes the value 1 inside some small
angular region, and continuously falls off to zero outside that region, so that if E, $\hat p$
are the particle's energy and direction then the energy detected by the cell is $E \, d(\hat p)$
(the closer to the cell's boundary the particle hits the cell, the less the fraction of the
energy registered by the cell). Then the energy which the cell d sees when confronted with the
event P is given by 3.11.

In view of this interpretation, it becomes physically transparent why the event-as-energy-flow is
equivalent to the collection of values 3.12. In practice one deals with a finite collection of
calorimetric modules $d_a$, and with the corresponding finite collection of numbers
$\langle P, d_a \rangle$ for each event P. These numbers constitute the experimentally measured
approximation to the ideal information content of P:

    $P_{\rm exp} = \{ \langle P, d_a \rangle \}_a$ .    3.14

If the angular size of $d_a$ (= the size of the angular region in which $d_a \ne 0$) is
sufficiently small, $d_a$ is represented by a direction $\hat p_a$, and we come back to 3.6.

This proves that the representation of event-as-energy-flow by the collection 3.12 is equivalent
to the conventional "particle" representation 3.6 -- with one important improvement: unlike the
numbers which constitute 3.6, each number in the collection 3.12 is fragmentation invariant.

• In what follows we will interpret events in the sense of 3.9 and 3.12 (the latter is just a
shorthand notation for the former), treating the representation 3.6 as a bookkeeping device.

The domain $\mathcal{P}$    3.15

It is convenient to impose the following restriction on events:

    $\sum_a E_a \le 1$ .    3.16

This is because the events' energies are bounded by a constant in any experiment, and the
structure of energy flow is independent of the event's total energy, at the basic level of
sophistication.

It would be sufficient to have the equality in the above restriction. The inequality is allowed
only because of the formal convenience resulting from the use of the linear structure in the
space of events represented as measures 3.9.
The following collection of events will be the arena of much of the subsequent mathematical
action:

    $\mathcal{P}$ = all events P which satisfy Eq. 3.16.    3.17

C-continuous observables    3.18

We are dealing with observables f(P) defined on events P from the domain $\mathcal{P}$. We saw in
Sec. 2.48 that smearings due to detector errors cause the probability distribution of observed
events and, therefore, optimal observables 2.17 to possess special continuity properties which we
are now going to study.

Note that the same notion of C-continuous observables will reemerge from analysis of predictive
power of pQCD (see comments after 4.2). This is because the choice of calorimetric detectors for
measurements is determined by the limitations of predictive power of pQCD in regard of hadronic
events [8]. Therefore, C-continuity is a fundamental notion in the theory of jet observables.
Before we turn to precise formulations, note the following. Any function f(P) when considered on
events with exactly N particles, becomes an ordinary function of N composite arguments:

    $f(P) \equiv f_N\big( \{E_1, \hat p_1\}, \ldots, \{E_N, \hat p_N\} \big)$ .    3.19

Then f(P) as a whole is a sequence of such component functions $f_N$, N = 1, ... . Such a
representation in terms of component functions is natural from the viewpoint of perturbative QCD
where one deals with a small number of particles in each order of perturbation theory (cf. [17]).
However, similarly to how the naïve representation of events 3.6 is insufficient in that it is
not explicitly fragmenta-
tion invariant and thus potentially misleading in the construction of data processing algorithms,
so the corresponding representation of observables 3.19 may also be insufficient.
In particular, it would be hard to formulate C-continuity in terms of 3.19.

C-convergence of events    3.20

To define continuity of a function f(P) one first has to establish the notion of convergence of
its arguments, in our case the events P. The issue is non-trivial here because our events P run
over the infinitely-dimensional domain $\mathcal{P}$, and in infinitely-dimensional spaces many
radically non-equivalent notions of convergence are possible. So, when does a sequence of events
$P_n$ converge to an event P?
For instance, a naïve convergence defined on the basis of 3.6 would be to require convergence of
all numerical "components" of 3.6. Namely, one would require that $N(P_n) \to N(P)$, which would
mean that $N(P_n) = N(P)$ for all sufficiently large n. Then one would require that the energy
and direction of each of the particles from $P_n$ converged to the energy and direction of some
particle in P. However, this is clearly inadequate because an event consisting of one narrow
cluster of particles which gets narrower as $n \to \infty$ may converge in an intuitive physical
sense to a one-particle event even if the distribution of energies between particles in $P_n$
wildly fluctuates with changing n.
To obtain a correct answer one should realize that convergences such as the one being discussed
are simply a mathematical way to describe the general structure of one's measurement devices, so
that the corresponding continuity of observables would ensure their stability with respect to
detector errors.

In our case the correct choice is the so-called C-convergence.^{g, h} Its definition is directly
connected to how calorimetric detector cells see events:

    The sequence of events $P_n$ is said to C-converge to P if $P_n$ in the limit of
    $n \to \infty$ become indistinguishable from P for any calorimetric detector cell d, i.e.

        $\langle P_n, d \rangle \to \langle P, d \rangle$    3.21

    in the usual numerical sense for each continuous function $d(\hat p)$ defined on the unit
    sphere.

One could use here special d corresponding to realistic detector cells and described in Sec. 3.13
but the extension by linearity to arbitrary continuous functions is convenient and does neither
restrict nor relax the definition.

An elementary measurement device (a calorimetric cell in our case) yields a non-empty interval of
real numbers (r', r") for each instance of measurement. Consider the subset of all events which
could have produced the same interval (which without loss of generality can be taken to be open)
and denote it $O_{d, r', r''}$.
A complex detector installation consists of a finite number of such devices, and each instance of
measurements actually registers a subset of events which corresponds to the intersection of
$O_{d, r', r''}$ for all elementary devices which constitute the detector.
The sets $O_{d, r', r''}$ constitute the so-called subbase which uniquely determines a topology in
the space of events.
The convergence described by 3.21 is equivalent to the topology obtained in this way for
elementary measurement devices described by 3.11 (cf. Sec. 3.13).

Distance to quantify similarity of events    3.23

It may be helpful to point out a single numeric measure of distance between events P which would
correspond to C-convergence. The distance is fully constructive (although a bit cumbersome) and
corresponds to the intuitive notion of similarity of two events at various angular resolutions.
Define:

    $\Phi(x) = \begin{cases} \exp(-x^{-2}) & \text{for } x < 1, \\ 0 & \text{for } x > 1; \end{cases}$    3.24

    $d_{R, \hat q}(\hat p) = \Phi\big( \theta_{\hat p, \hat q} / R \big)$ ;    3.25

This describes an ideal calorimetric cell of radius R centered at $\hat q$. (It would have been
sufficient for each R to restrict $\hat q$ to a finite grid of points so that each point of the
unit sphere is no farther than R/2 from the nearest point of the grid.)
The following expression is interpreted as the distance between P and Q at the angular resolution
R:

    ${\rm dist}_R(P, Q) = \max_{\hat q} \big| \langle P, d_{R, \hat q} \rangle - \langle Q, d_{R, \hat q} \rangle \big| = \max_{\hat q} \big| \langle P - Q, d_{R, \hat q} \rangle \big|$ .    3.26

It is bounded by 1 if both events belong to $\mathcal{P}$.
To obtain a measure of distance for all angular resolutions, simply take a sum over increasingly
better resolutions $R_n \to 0$:

    ${\rm Dist}(P, Q) = \sum_{n = 1, 2, \ldots} \alpha_n \, {\rm dist}_{R_n}(P, Q)$ .    3.27

The sequence $R_n$ is otherwise arbitrary, e.g. $R_n = 2^{-n}$.
The convergence 3.21 can be described in a more conven- The sum of positive coefficients n must be finite. We nor-
tional fashion using an appropriately chosen measure of dis- malize them so that
tance between events (Sec. 3.23). = 1 . 3.28
n K n
=1,2
Formulation in terms of open sets 3.22
This ensures the following normalization of Dist:
The above formulation is equivalent to the following one Dist b ,
P Qg 1 for any P and Q from P . 3.29
phrased in a canonical mathematical language. For simplicity
we ignore statistical fluctuations of the errors; our purpose is Verbally, each next term in the sum 3.27 describes the dif-
only to show how the basic structure of detector errors uniquely ference between P and Q at a higher angular resolution. The
determines the topology (convergence) in the space of events. rate of decrease of n as n controls sensitivity of 3.27 to
the differences between P and Q at higher angular resolutions.
g C from "calorimetric"; we will also use the verb to C-converge, etc. For instance, = 2-n
n .
h In terms of pure mathematics, the C-convergence is an instance of the so-
called -weak topology in the space of linear functionals [18].
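The construction 3.26-3.29 can be tried out numerically. The following is only a minimal sketch under stated assumptions, not the paper's implementation: events are lists of (energy fraction, unit direction) pairs, the cell profile `cell` is a simple tent function standing in for the profile 3.24, the maximum over cell centres is approximated by a coarse Fibonacci grid supplemented with the particle directions themselves, and the sum 3.27 is truncated at `n_max` with weights and resolutions both equal to 2^-n. All function names are hypothetical.

```python
import math

def unit(theta, phi):
    """Unit vector for polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angle(u, v):
    """Angular separation of two unit vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def cell(x):
    """ASSUMED cell profile (tent): 1 at the cell centre, 0 beyond radius 1."""
    return max(0.0, 1.0 - x)

def fibonacci_grid(m):
    """m roughly uniform points on the unit sphere (assumed grid of cell centres)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(m):
        z = 1.0 - 2.0 * (k + 0.5) / m
        r = math.sqrt(max(0.0, 1.0 - z * z))
        points.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return points

def response(event, q, R):
    """<P, d> = sum_a E_a d(p_a) for the cell of radius R centred at q."""
    return sum(E * cell(angle(p, q) / R) for E, p in event)

def dist_R(P, Q, R, centres):
    """Distance at resolution R: max over cell centres of |<P,d> - <Q,d>|."""
    return max(abs(response(P, q, R) - response(Q, q, R)) for q in centres)

def Dist(P, Q, n_max=4, grid_size=400):
    """Truncated sum over n of 2**-n * dist_R with R = 2**-n (assumed choices)."""
    centres = fibonacci_grid(grid_size) + [p for _, p in P] + [p for _, p in Q]
    return sum(2.0 ** -n * dist_R(P, Q, 2.0 ** -n, centres)
               for n in range(1, n_max + 1))

# Two-particle event, the same event with one particle split collinearly,
# and an event with one particle moved by 0.5 rad.
P = [(0.5, unit(0.3, 0.0)), (0.5, unit(2.0, 1.0))]
P_split = [(0.2, unit(0.3, 0.0)), (0.3, unit(0.3, 0.0)), (0.5, unit(2.0, 1.0))]
P_moved = [(0.5, unit(0.3, 0.0)), (0.5, unit(2.5, 1.0))]
print(Dist(P, P_split), Dist(P, P_moved))
```

In this sketch the collinear split leaves the distance at zero up to rounding, whereas moving a particle gives a clearly non-zero value that stays below 1.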





The decreasing sensitivity of the expression 3.27 to correlations between P and Q at smaller angular
distances nicely reflects the decreasing physical importance of such correlations.

The usual definition of convergent sequences based on this measure of distance in $\mathcal{P}$,

$$ \mathrm{Dist}(P_n, P) \to 0 , \tag{3.30} $$

is equivalent to the C-convergence, Eq. 3.21.

• The above definition of Dist resembles constructions of the wavelet analysis [12] with $\Phi(x)$
corresponding to the mother wavelet. This is not the only place where the logical patterns of the
wavelet analysis come to the surface in our theory (cf. the comments after 2.46).

• A mathematician would note that the closure of $\mathcal{P}$ is compact with respect to the
C-convergence. This is a special case of the Banach-Alaoglu theorem [18]. This is important for the
study of the structure of C-continuous observables (Sec. 3.35).

• Although one may be psychologically more comfortable with the definition of convergence in the
space of events in terms of a single numeric measure of distance 3.30 rather than the seemingly more
amorphous definition 3.21, the latter is deeper and is actually simpler. The possibility to express
the convergence in terms of one distance 3.30 is accidental and its form exhibits too many
inessential details. Eq. 3.21, on the other hand, goes to the heart of the matter by directly
reflecting the structure of multimodule detectors and leading to the profound identification 3.43.
The usefulness of the entire logical pattern rooted in the definition 3.21 is demonstrated by the
derivation of jet definition in Section 6 -- it is not clear what heuristics one would have been
guided by should one decide to work in terms of the distance 3.30.

C-continuity of observables 3.31

The formal definition is as follows:

    An observable f(P) defined on events from $\mathcal{P}$ is C-continuous if

    $$ f(P_n) \to f(P) \tag{3.32} $$

    whenever $P_n \to P$ in the sense of C-convergence (3.21 or 3.30).

Qualitatively, C-continuity is the same as stability with respect to distortions of energy flow
deemed physically less significant in jet-related measurements (such as due to minor rearrangements
of detector cells, several particles hitting the same cell, detector errors, etc.). Such distortions
may cause the numbers which constitute 3.6 (e.g. the observed number of particles) to jump
erratically, whereas the values of C-continuous shape observables would exhibit continuous
variations.

C-continuity and fragmentation invariance 3.33

Since the definition of C-convergence is entirely in terms of the fragmentation-invariant
representation of events 3.9, a function f(P) that is C-continuous is automatically fragmentation
invariant (if Q differs from P by exactly collinear fragmentations then Dist(Q, P) = 0, so Eq. 3.32
implies that f(Q) = f(P)).

Furthermore, each of the component functions $f_N$ (see 3.19) is continuous in all its arguments.
However, the latter property is not sufficient to ensure C-continuity of f(P) (see sec. 6.9 of [4]).
This is essentially because C-continuity imposes restrictions on the allowed rate of variation of
all component functions $f_N$ simultaneously. From the viewpoint of pQCD, such a requirement connects
all orders of perturbation theory, and therefore is inherently non-perturbative.

    C-continuity is a combination of fragmentation invariance and a special continuity in
    particles' parameters, formulated without reference to the structure of perturbative partonic
    states.                                                                                  3.34

The usual shape observables such as thrust and the jet number discriminators (as well as the classes
of observables described in [4]) are C-continuous whereas the thrust axis is not. Nor are the number
of jets and individual jets' parameters C-continuous -- irrespective of the jet definition adopted.

On the other hand, the prescriptions of Section 9 eliminate (most) C-discontinuities from
observables constructed on the basis of jet configurations found by the optimal jet definition
introduced in this paper.

• Concerning the regularization prescriptions of Sec. 2.52, we note that if the l.h.s. of 2.54 is
C-continuous (which is often the case in practical situations) then so is 2.57.

Structure of the space of C-continuous functions 3.35

The simplest example of C-continuous functions is immediately deduced from the definitions 3.32 and
3.21. Suppose $f(\hat{p})$ is continuous everywhere on the unit sphere. Then the function f(P)
defined on events^i according to

$$ f(P) = \langle P, f \rangle = \sum_a E_a f(\hat{p}_a) \tag{3.36} $$

(cf. 3.11) is C-continuous by definition. Such f(P) will be called basic shape observables. They
will be further discussed in Sec. 3.43.

Furthermore, arbitrary C-continuous functions can be approximated by algebraic combinations of the
basic shape observables in a fashion similar to how arbitrary continuous functions on, say, unit
cube can be approximated by ordinary polynomials.^j This analogy is illustrated by the following
table:

    vector $P = (P_1, \ldots)$                 event P
    unit cube $0 \le P_i \le 1$                the domain $\mathcal{P}$ (3.17)
    continuity                                 C-continuity
    linear functions $\sum_i c_i P_i$          basic shape observables (Eq. 3.36)
    products of linear functions (monomials)   (multi-)energy correlators (Eq. 3.40)
    continuous functions f(P)                  C-continuous observables f(P)
                                               (generalized shape observables)
                                                                                             3.37

^i Note a convenient abuse of notation: both the angular function and the corresponding shape
observable are denoted by the same symbol f. Interpretation depends on the type of arguments.
^j A well-known theorem due to Weierstrass. Its generalization needed for our purposes is known as
the Stone-Weierstrass theorem [18]. A mathematician would easily supply the details which
physicists, however, won't care about because they don't lead to useful algorithms.
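The contrast between the particle-level numbers of 3.6 and a basic shape observable 3.36 can be illustrated with a short sketch. It assumes an event stored as a list of (energy fraction, polar angle, azimuth) triples; the angular function `cos2_to_z` and the helper `split` are hypothetical choices made only for this illustration.

```python
import math

def direction(theta, phi):
    """Unit vector for polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def basic_shape(event, f):
    """<P, f> = sum_a E_a f(p_a), cf. Eq. 3.36."""
    return sum(E * f(direction(theta, phi)) for E, theta, phi in event)

def cos2_to_z(p):
    """Hypothetical angular function, continuous on the unit sphere."""
    return p[2] ** 2

def split(event, index, fraction, d_theta=0.0):
    """Replace one particle by two fragments sharing its energy.

    d_theta = 0 gives an exactly collinear fragmentation; a small
    non-zero d_theta tilts the second fragment slightly.
    """
    E, theta, phi = event[index]
    fragments = [(fraction * E, theta, phi),
                 ((1.0 - fraction) * E, theta + d_theta, phi)]
    return event[:index] + fragments + event[index + 1:]

event = [(0.6, 0.4, 0.0), (0.4, 2.5, 1.0)]
exact = split(event, 0, 0.3)                # collinear fragmentation
near = split(event, 0, 0.3, d_theta=0.01)   # almost collinear fragmentation

# The particle count jumps from 2 to 3 ...
print(len(event), len(exact), len(near))
# ... while the shape observable is unchanged or changes only slightly.
print(basic_shape(event, cos2_to_z),
      basic_shape(exact, cos2_to_z),
      basic_shape(near, cos2_to_z))
```

The number of particles changes discontinuously under the split, whereas the value of the observable does not change for the collinear case and moves only a little for the nearly collinear one.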





The approximation meant here is in the usual uniform sense, i.e. for any $\varepsilon > 0$, an
arbitrary C-continuous function f(P) can be approximated by a sum of energy correlators f'(P) so
that:

$$ \sup_{P \in \mathcal{P}} \big| f(P) - f'(P) \big| < \varepsilon . \tag{3.38} $$

The two classes of observables shown in the right column (basic shape observables and energy
correlators) play special roles from the viewpoint of the underlying physical formalism. We have
already seen that the basic shape observables 3.36 are singled out by their relation to the
structure of elementary detector modules (we will return to this in Sec. 3.42). Let us now discuss
the energy correlators.

Energy correlators 3.39

These have the form

$$ f(P) = \sum_{a_1 \ldots a_n} E_{a_1} \cdots E_{a_n} \, f_n(\hat{p}_{a_1}, \ldots, \hat{p}_{a_n}) , \tag{3.40} $$

where $f_n$ is a symmetric continuous function of n arguments. Basic shape observables are special
cases corresponding to n = 1.

The component functions 3.19 and the correlators 3.40 can be regarded as different bases in terms of
which to express general C-continuous observables. One function $f_n$ in 3.40 corresponds to an
infinite sequence of component functions 3.19. On the other hand, Eq. 3.40, unlike 3.19, is
automatically fragmentation invariant.

Furthermore, the correlators 3.40 are singled out for two theoretical reasons which reflect the
fundamental structures of, respectively, quantum field theory and QCD. This has far-reaching
consequences.

First, such correlators naturally fit into the general structure of quantum field theory where the
apparatus of multiparticle correlators is intimately related to the fundamental formalism of Fock
space and is central in quantum field theory and statistical mechanics [19] because it allows one to
systematically describe systems with a fluctuating number of particles (as is the case e.g. with
multiparticle events in high-energy physics experiments).

Second, the energy correlators 3.40 are directly expressed in terms of the fundamental
energy-momentum tensor [5] (we do not need explicit expressions here). This allows one to directly
address the well-known problem that predictions of pQCD are formulated in terms of quark and gluon
fields whereas experimental data deal with the observed hadronic degrees of freedom. Indeed, the
energy-momentum tensor is determined solely by the space-time symmetries of QCD. It is thus
independent of a particular operator basis used to represent the theory (quark and gluon fields, or
hadronic fields) and so absorbs all the unknown complexity of confinement and hadronization.
Therefore, observables which are expressible in terms of the energy-momentum tensor can be computed
either in terms of hadronic degrees of freedom or from perturbative quarks and gluons. For such
observables, the criterion of infrared safety (cancellation of singular logarithms, etc.) reduces to
verification of existence of the energy-momentum tensor in QCD as an operator object. We will return
to this in Sec. 4.1, and here only note that the described way of reasoning clarifies the conjecture
of [8] that observables for which pQCD predictions make sense are those for which infrared and
collinear singularities cancel thus ensuring their insensitivity to non-perturbative physics. Such a
cancellation is guaranteed for the energy correlators 3.40 (provided the angular function $f_n$
satisfies some additional regularity restrictions; see Sec. 4.1) whereas the general C-continuous
functions are approximated by sums of energy correlators in such a way that the properties of
fragmentation invariance etc. required for such cancellations are not compromised.

The direct connections of the energy correlators 3.40 with QFT and QCD reflect the physical nature
of the phenomena concerned and ensure their superior amenability to theoretical studies (cf. the
abundance and quality of theoretical calculations for the simplest shape observable, thrust [20],
and the method of a systematic study of power corrections outlined in [16]).

Generalized shape observables 3.41

A few comments are in order concerning generalized shape observables. These are essentially
arbitrary C-continuous functions. They are obtained from energy correlators using algebraic
operations and appropriate limiting procedures which do not violate the property of C-continuity
(theoretically, it is sufficient to ensure a uniform convergence on $\mathcal{P}$ in the sense of
3.38). Roughly speaking, such operations are applied to observables as a whole (i.e. after averaging
over all events) and they should not allow arbitrary growth of the rate of variation of the
component functions 3.19 for $N \to \infty$. An example of a correct limiting procedure is the
minimization over the thrust direction involved in the definition of thrust; cf. 4.68. For an
example of an illegal sum see sec. 6.9 of [4].

It is clear a priori that if quantum field theory is a fundamental mechanics governing the phenomena
observed in high energy physics, then it should be possible to express any truly observable
phenomena (unlike artifacts such as instabilities) in a QFT-compatible language, i.e. via
observables that can be approximated by energy correlators. This was the original rationale behind
the theory of [3]-[5].

Unfortunately, even in one dimension the simplest polynomial approximations in the spirit of the
Weierstrass theorem (polynomial interpolation formulas) are seldom sufficient: spline approximations
built by gluing local polynomials are generally more useful. This is even more so in infinitely many
dimensions (as is the case with $\mathcal{P}$), whence the need for special tricks such as jet
algorithms. An array of prescriptions that allow one to simulate conventional jet-based observables
such as dijet mass distributions in the language of C-continuous observables was described in [4].
Such prescriptions altogether circumvent representation of events in terms of jets.

However, the purpose of the examples presented in [4] was primarily to demonstrate mathematical
mechanisms ensuring that information extracted via generalized shape observables contains features
that can be directly related to the conventional procedures (such as $\delta$-spikes in the
so-called spectral discriminators corresponding to multi-jet substates). For practical purposes, it
may be more convenient to start from the conventional observables and try to eliminate
C-discontinuities which spoil optimality of observables. Prescriptions for doing so are described
later on in this paper.

In what follows we will be using the term C-continuous observables as less ambiguous than
generalized shape observables.
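An energy correlator of the form 3.40 can be evaluated by brute force for small events. The sketch below assumes events stored as lists of (energy fraction, unit direction) pairs; the function names and the two-point angular function `one_minus_cos` are hypothetical choices for illustration, and the cost grows as the n-th power of the number of particles, so this is not meant as an efficient implementation.

```python
import itertools
import math

def energy_correlator(event, f_n, n):
    """Sum over a_1..a_n of E_{a_1}...E_{a_n} f_n(p_{a_1}, ..., p_{a_n}), cf. Eq. 3.40."""
    total = 0.0
    for combo in itertools.product(event, repeat=n):
        weight = math.prod(E for E, _ in combo)
        total += weight * f_n(*(p for _, p in combo))
    return total

def one_minus_cos(p, q):
    """Hypothetical symmetric continuous two-point angular function."""
    return 1.0 - sum(a * b for a, b in zip(p, q))

def z_squared(p):
    """Hypothetical one-point function: n = 1 gives a basic shape observable."""
    return p[2] ** 2

up, down = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)

# Two back-to-back particles, and the same event after one particle
# fragments into two exactly collinear pieces.
back_to_back = [(0.5, up), (0.5, down)]
fragmented = [(0.2, up), (0.3, up), (0.5, down)]

print(energy_correlator(back_to_back, one_minus_cos, 2))
print(energy_correlator(fragmented, one_minus_cos, 2))
print(energy_correlator(back_to_back, z_squared, 1))
```

Because each fragment enters with its own energy as a weight, the two-point sums for the original and the collinearly fragmented event agree up to rounding, which is the fragmentation invariance noted above for Eq. 3.40.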





Basic shape observables and physical information 3.42

So, one can represent an event either in terms of particles, Eq. 3.6, or in terms of values of basic
shape observables, cf. 3.12. In the absence of detector errors and other imperfections, the two
representations are numerically equivalent.

However, the structure of detector errors is an essential part of physics, and in this respect the
two representations differ: the numbers which constitute the r.h.s. of 3.12 individually possess the
correct stability properties with respect to small distortions of the event -- distortions of the
kind specific to the type of measurements we deal with. The numbers which constitute Eq. 3.6, on the
contrary, do not possess this property.

For clarity's sake, suppose the event has two sufficiently energetic particles a and b whose
directions are close. Then replacing the pair a, b with one particle c whose 3-momentum is the sum
of the 3-momenta of a and b is deemed to distort the calorimetric physical information carried by
the event only a little (the less the difference between $\hat{p}_a$ and $\hat{p}_b$, the less the
distortion). The individual numbers which constitute the r.h.s. of 3.6 do not have this property:
they can exhibit non-negligible chaotic fluctuations even if the physical information content of the
event varies negligibly.

A simple analogy may further help to understand the role of continuity: imagine a ruler marked
randomly instead of the standard ordered numbering. Representing length by using a number obtained
from such a ruler would be not dissimilar to representing the event via 3.6: it would be sufficient
for book-keeping purposes, but it would require great care in construction of data processing
algorithms such as computation of volumes, prices, etc.

Similarly, whereas the representation 3.6 is convenient for book-keeping purposes, one should avoid
relying on its form in the design of data processing algorithms. Such algorithms should in general
respect additional restrictions not reflected in 3.6, namely, the restriction of C-continuity. The
difficulties encountered by the experts in jet definition (such as a lack of fragmentation
invariance of some suggestions related to jet algorithms) are often artifacts due to a failure to
reason about jets and energy flows in terms which correctly reflect the physical nature of the
problem.

Note in this respect that all the seemingly abstract notions which we introduced (events as measures
on the unit sphere, C-continuity, etc.) are essentially only notations, i.e. formulaic expressions
of what is.

In fact, these notions are neither more abstract nor more difficult than, say, the differential
calculus. But they are usually taught as "advanced" topics in the abstract courses of functional
analysis without link to applications, which earns them a bad reputation among physicists. Then when
these notions are actually encountered there is a psychological tendency to reject them as too
abstract to be useful in practical physics.

We are ready to take a philosophical look at Eq. 3.12:

    Ideal physical information content of the event P is identified with the collection of values
    of all basic shape observables, i.e. with the r.h.s. of Eq. 3.12.                         3.43

The adjective "ideal" reminds us that in practice only a finite subset of the collection is used, as
in 3.14. One should feel no more psychological discomfort with such a collection than with, say, a
transcendental number such as $\pi$ which is completely specified by an infinite number of digits
but in practice represented by their finite sequences.

By identifying the information content of the event P with the collection of expressions 3.36, not
only have we not deviated from the experimental reality but we have actually returned closer to it
compared with the r.h.s. of 3.6 (if only by allowing finite angular resolutions), at the same time
giving it a systematic form which is convenient for the derivation of the jet definition
(Section 5).

Observables in QCD. Dynamical aspects 4

Now we turn to dynamical (i.e. QCD-specific) considerations in the construction of optimal
observables according to Eq. 2.17 in the context of hadronic events produced in high energy physics
experiments. The big problem is that such events contain O(100) particles described by 3 degrees of
freedom each. On the other hand, the underlying physics is controlled by a few Standard Model
parameters, whereas all the complexity of hadronic events is supposed to be generated by the QCD
Lagrangian that contains only one coupling $\alpha_S$ and quark and gluon fields most of which can
be regarded as massless.

This means that from the viewpoint of studies of both the Standard Model and QCD Lagrangian most of
the observed degrees of freedom are physically not important. In the language of the theory of
optimal observables (Sec. 2.7), one could say that the optimal observables for extraction of the
Standard Model parameters are mostly sensitive to a few degrees of freedom which the conventional
wisdom identifies with the representation of events in terms of jets.

The chain of reasoning presented below is intended to make more explicit, and thus help to clarify,
the argumentation of the theory of jets including the part about inversion of hadronization. Much of
the argumentation is familiar but phrased in a more formal language to facilitate a systematic
investigation.

Since we are interested in issues such as hadronization, the perturbation theory discussed below
only concerns QCD. Electroweak effects are assumed to be taken into account in the theoretical
amplitudes as necessary.

The basic conjecture of the QCD theory of jets 4.1

The conjecture of Sterman and Weinberg [8] is that the property of infrared safety of observables
ensures their calculability within the framework of pQCD. We would like to express it in a formal
fashion and to connect the notion of IR safety with C-continuity (Sec. 3.31).

In the final respect, one needs (quasi-) optimal observables (Sec. 2.25) to extract the values of
fundamental parameters such as the mass of the W boson from hadronic data. Because of a large
dimensionality of observed hadronic events P one needs some specific structural information about
the ideal probability density $\pi(P)$ of their production (ideal = not taking into account detector
errors). Such information is obtained from pQCD which deals, however, not with hadronic but quark
and gluon degrees of freedom.

The conclusions of [8], [5] (also see Sec. 3.39 above) can be summarized as follows. For any
C-continuous observable f(P) it is correct to compute theoretical predictions within the framework
of pQCD. Formally:





$$ \int dP \, \pi(P) \, f(P) = \int dp \, \pi^{(N)}_{\mathrm{pQCD}}(p) \, f(p) + O\big(\alpha_S^{N+1}\big) . \tag{4.2} $$

Here $\pi(P)$ represents the exact probability density so that the l.h.s. is what experiments would
see given ideal detectors and infinite statistics. The variable p on the r.h.s. represents
perturbative quark and gluon final states. $\pi^{(N)}_{\mathrm{pQCD}}$ is the corresponding
probability density computed within the shown precision in $\alpha_S$ from pQCD; it is a sum of
contributions proportional to $\alpha_S^n$, $n = 0, \ldots, N$.

(i) The mathematical structure of events in the two expressions is essentially the same from the
viewpoint of data processing (Eq. 3.6), the difference being in the number of particles (O(100) for
P and O(1) for p).

(ii) The restriction of C-continuity is important for the validity of 4.2 in so far as fragmentation
invariance and regularity properties (a continuity and related stronger regularity restrictions; cf.
Sec. 4.3) have to be formulated non-perturbatively for the non-perturbative expression on the l.h.s.

(iii) QUARK-HADRON DUALITY. The proposition that it is possible to replace a sum over hadronic
states by the corresponding partonic sum is known as the hypothesis of quark-hadron duality. The
scenario of derivation of 4.2 described in Sec. 3.39 (see also Sec. 4.3) circumvents such direct
replacement via an intermediate representation in which only an average over the initial state of a
product of energy-momentum tensor densities is involved.

However, there is also the dynamical aspect, namely, that the perturbation theory would actually
work in a numerically satisfactory fashion. This cannot be explained by reference to the
energy-momentum tensor per se but is made possible by such a representation as it allows application
of the usual renormalization group argument exactly as in the case of total cross sections.

Still, a reference to renormalization group is insufficient inasmuch as the convergence of the
expansion on the r.h.s. of 4.2 depends on the behavior of the observable f. The easiest way to see
this effect is by looking at the so-called power-suppressed corrections that are parametrized in
terms of coefficients not predictable from perturbation theory.^k A little experience with
asymptotic expansions of integrals of perturbation theory^l makes it obvious that such corrections
are proportional to angular derivatives of the observables f in 4.2: for f which vary too fast at
too many points of the phase space the perturbative expansion would not work. This confirms the
notion that perturbation theory cannot predict small-scale angular correlations in observed events.

A scenario of formal verification of Eq. 4.2 4.3

(Readers not interested in formal aspects may skip the technical details below and go directly to
Sec. 4.9.)

As long as the construction of perturbation theory is performed in an axiomatic fashion rather than
derived from non-perturbatively formulated fundamental equations (as is done in conventional
treatments [19]), any proof of 4.2 is bound to be only a more or less plausible scenario because the
non-perturbative l.h.s. is, essentially, a theoretical fiction. So if it were possible to accurately
verify 4.2, one would have to do so roughly as follows.

The first step would be to establish 4.2 for f that are energy correlators, Eq. 3.40. One would
start with a non-perturbatively defined expression (the l.h.s. of 4.2), represent it in terms of
correlators of the energy-momentum tensor densities as explained in [5], and then develop an
expansion in $\alpha_S$, ending up with the r.h.s. of 4.2.

A technical subtlety is that owing to the singularities of pQCD the integrals on the r.h.s. of 4.2
are well defined only for f that obey somewhat stronger regularity restrictions than a mere
continuity.

The simplest illustration for this can be borrowed from pQCD where a typical object in theoretical
answers is the so-called +-distribution, $(1 - x)^{-1}_{+}$ (see e.g. [24]; here x is the parton
fraction but the analytical mechanism being demonstrated is completely general). This distribution
is defined by its integration properties:

$$ \int_0^1 dx \, (1 - x)^{-1}_{+} \, f(x) = \int_0^1 dx \, \frac{f(x) - f(1)}{1 - x} . \tag{4.4} $$

For the r.h.s. to be a well-defined integral, it is not sufficient that f(x) is merely continuous,
i.e. $f(x) \to f(1)$; one must also assume that f(x) approaches f(1) sufficiently fast, e.g.

$$ f(x) - f(1) = O(|1 - x|) . \tag{4.5} $$

This is satisfied e.g. if f has continuous first derivatives.

The technical regularity restrictions on observables f(P) required for the r.h.s. of 4.2 to be
well-defined are multi-dimensional analogs of the restriction 4.5. For practical purposes it is
sufficient to require e.g. that the angular functions $f_n$ in 3.40 have continuous first
derivatives.^m (Ref. [17] formulated the restrictions in a slightly more general form of the Hölder
condition -- but in the language of the component functions 3.19.)

That this regularity restriction does not become more stringent in higher orders of perturbation
theory follows from the fact that the severity of neither soft nor collinear singularities in QCD
increases in higher orders of perturbation theory (cf. [17], [25]; this property is related to
renormalizability of QCD). But even if it did, it would not be an obstacle for the theory: one would
only have to require that observables are smooth (i.e. belong to the class $C^\infty$).

The second step would be to extend Eq. 4.2 to more general observables than finite sums of energy
correlators. This -- as is usual in situations of this sort -- would be accomplished by a limiting
procedure with respect to f which would commute with the limit $\alpha_S \to 0$. To this end, one
has to rewrite the mentioned regularity conditions in a non-perturbative form. For instance, an
analog of 4.5 could be

$$ \big| f(P) - f(P') \big| \le K_f \, \mathrm{Dist}(P, P') \ \text{ for any } P, P' \in \mathcal{P} , \tag{4.6} $$

where Dist is defined in 3.27. One sees that if f is an energy correlator 3.40 then Eq. 4.6 implies
that the angular function $f_n$ satisfies an analog of 4.5. Then one would define the norm

$$ \| f \| = \max_{P \in \mathcal{P}} | f(P) | + K_f , \tag{4.7} $$

and define the space $C'(\mathcal{P})$ as the corresponding closure of the subspace spanned by
energy correlators satisfying 4.6. This would be similar to the standard functional class $C^1$.
Recall that functions from the class $C^1$ can be uniformly approximated by polynomials together
with their first derivatives. Observables from $C'(\mathcal{P})$ can similarly be approximated by
finite sums of energy correlators except that our formulation is in terms of 4.6 instead of
derivatives in the argument P for purely technical reasons.^n

^k Cf. the studies of such corrections in the theory of QCD sum rules [21].
^l Cf. a systematic scenario described in [16] based on the expansion method of the so-called
asymptotic operation [22], [23] which is directly formulated in terms of $\delta$-functional
counterterms, so that corrections suppressed by powers of the total energy involve derivatives of
$\delta$-functions (power counting mechanisms ensure that higher power-suppressed corrections are
accompanied by more derivatives on $\delta$-functions). After integrations with f, the derivatives
are switched from $\delta$-functions to f.
^m A similar technical assumption -- existence of continuous derivatives of f through second
order -- will be made in the derivation of the key bound for jet definition in Sec. 6.10.

Finally, one would need inequalities of the form

$$ \Big| \int dP \, A(P) \, f(P) \Big| \le C_A \, \| f \| \tag{4.8} $$

for $A = \pi$, $\pi^{(N)}_{\mathrm{pQCD}}$, $\pi - \pi^{(N)}_{\mathrm{pQCD}}$. In the first case
($A = \pi$) an even weaker inequality is expected to be true (with the norm defined without the
second term on the r.h.s. of 4.7). In the second case the inequality is essentially equivalent to
the proposition that the soft and collinear singularities in individual diagrams of pQCD are never
more severe than logarithmic.^o In the third case one would also have to verify that
$C_A = O(\alpha_S^{N+1})$ (such a proposition is unlikely not to be true).

All in all, there does not seem to exist any analytical mechanism which might invalidate any of the
listed propositions because of the intrinsic analytical naturalness of the described scheme.
Although some technical details (e.g. the description of regularity conditions) might need to be
made more precise, the basic requirement of C-continuity fits into the general scheme of things so
tightly and naturally, from the viewpoints of both physics and mathematics, that it seems unlikely
that it could require a modification.

The mechanism behind Eq. 4.2 4.9

Let us now try to understand the structural reasons behind Eq. 4.2.

The reasoning below will be more transparent if one bears in mind that discussing C-continuous
functions defined on $\mathcal{P}$ is rather similar to discussing ordinary continuous real
functions f(x) defined on the simplex S, the part of the euclidean space $R^n$ described by
$x_i \ge 0$, $\sum_i x_i \le 1$ (the latter restriction is analogous to 3.16). The distance 3.27 is
similar in general properties to the usual euclidean distance in $R^n$ although the explicit
expression is rather different. However, it is exactly this difference that masks the dissimilarity
of the infinite-dimensional space of measures on the unit sphere from the ordinary euclidean space
$R^n$ and thus makes possible the analogy between the events $P \in \mathcal{P}$ and the vectors
$x \in S$.

The C-continuous observable f in Eq. 4.2 is in principle arbitrary and so can probe any small region
in $\mathcal{P}$ (smallness can be measured using the distance 3.27). Consider a region of
$\mathcal{P}$ such that:

(i) the region is small, i.e. any two events from it differ by slightly

$$ f(P) \approx f(Q) \ \text{ for any } P \text{ from the region and for any C-continuous } f . \tag{4.10} $$

The region contains events with an arbitrarily large number of particles but the number of particles
is always positive and so limited from below by a minimum value. Choose any Q from the region so
that its number of particles is equal to the minimum value. (This need not fix Q uniquely.) Then the
condition (ii) implies that $\pi^{(N)}_{\mathrm{pQCD}}(Q)$ is significant, i.e. that Q cannot contain
more than a few particles if perturbation theory works because emission of each additional particle
is then suppressed by an additional factor $\alpha_S$.

That events from the region are close to Q in the sense of the C-convergence (as measured e.g. by
the distance 3.27) means that such events consist of a few more or less narrow energetic sprays of
particles (each spray roughly corresponding to a particle of Q) and perhaps some soft background,
i.e. randomly directed particles which together carry a small fraction of the event's energy.

Formal model of hadronization 4.11

One adopts the following theoretical model for the probability distribution of events P which is
built into any Monte Carlo event generator:

$$ \pi(P) \approx \int dp \, \pi^{(n)}_{\mathrm{pQCD}}(p) \times H^{(n)}(p, P) , \tag{4.12} $$

where $H^{(n)}(p, P)$ is the probability for the parton event p to develop into the observed event P.

• The approximate equality in 4.12 is meant to indicate that it is not a theorem that the
probability $\pi(P)$ can be exactly represented in such a convolution form. (With O(100) free
parameters in $H^{(n)}$ the error can be made very small, of course.)

Note the following normalization restriction:

$$ \int dP \, H^{(n)}(p, P) \equiv 1 . \tag{4.13} $$

The hadronization kernel $H^{(n)}$ must depend on n if the r.h.s. of 4.12 as a whole is to represent
the exact non-perturbative answer. In practice n is fixed and small.^q

• Higher perturbative terms are to be added to $\pi^{(n)}_{\mathrm{pQCD}}$, whereas $H^{(n)}$ --
which is supposed to express the effect of the entire sum of missing terms -- acts on $\pi^{(n)}$ multiplica-
acollinear fragmentations into/recombinations of, any number pQCD
of particles. Formally, one can say that the distance Dist be- tively. It would be interesting to clarify this point in a system-
tween any two events from is small (i.e. << 1). atic manner.

(ii) Events from are produced with a relatively significant Represent the l.h.s. of 4.2 in terms of 4.12:
probability formally given by z dP (P)(
P is from )
.
dP (P) P
( ) = dp (n) p
( ) dP (n)
f H ( p, P) f (P
z z z
pQCD ) . 4.14
The condition (i) means that for any fixed event Q from ,
one would have: This agrees with the r.h.s. of 4.2 if

dP (n) (p, P) P
( ) p
z = ( ) + n+
H f f O( 1) . 4.15
S


The approximate equality here is supposed to be valid for any
n For instance, note the curious fact that elements of the tangent space to P C-continuous function f . For this reason, Eq. 4.15 can be con-
at any point P are distributions on the unit sphere. In other words, the tan- veniently represented as follows:
gent space (a natural habitat of the differentials dP) is different from the
complete linear space to which it is tangent. One would probably have to
develop a differential calculus for functions on P and reformulate 4.6 in
terms of the space C 1(P ) if QCD were non-renormalizable because then p At this point we don't discuss how the approximation error depends on f ,

one might need to require smoothness (the property C ) of all the func- etc. See Sec. 5.17.
tions f (P) involved. q Strictly speaking, the convolution 4.12 ought to be performed at the level
o This has the same power-counting reasons behind it as the renormaliza- of quantum amplitudes rather than probabilities but we ignore such details
bility of pQCD. here.





    H^{(n)}(p, P) = \delta(p, P) + O(\alpha_S^{n+1}) ,        4.16

where the δ-function on the r.h.s. is defined in the usual way with respect to integration over p
or P.
Eqs. 4.15, 4.16 are simply a convenient formulaic expression of the verbal statement that observed
events generated from the partonic event p mostly consist of narrow jets that resemble the parent
partons.
Taking into account the normalization 4.13 and the fact that f is arbitrary (apart from the general
restriction of C-continuity), one deduces from 4.15 that for P typically generated from p according
to H^(n), one has

    f(P) = f(p) + O(\alpha_S^{n+1}) .        4.17

This is another form of the proposition that P is similar to p. The meaning of similarity is
established by the restriction of C-continuity on the functions f allowed in 4.17.
If one defines a configuration of jets Q to roughly correspond to the partonic event p then
Eq. 4.17 implies that Q should satisfy the relation f(Q) ≈ f(P). We will see it again in Sec. 5.6.

Sensitivity to hadronization and C-continuity 4.18

We saw in Sec. 2.48 that optimal observables are C-continuous as a result of the smearing caused by
detector errors described by 2.49. C-continuity made observables less sensitive to such errors. In
the present dynamical context, we note that hadronization is described by a similar convolution
4.12, and for C-continuous observables fluctuations induced by the stochastic hadronization are
suppressed too.
Actually, Eq. 4.2 means that C-continuity makes observables insensitive (within the precision of
perturbative approximation) to the hadronization effects which transform the perturbative
π^(N)_pQCD into the hadronic π(P). Remember that the mentioned precision of perturbative
approximation depends on the magnitude of derivatives of the observable.

Formal construction of optimal observables 4.19

Suppose we wish to measure a fundamental parameter M such as the mass of the W boson. All the
dependence on such a parameter is localized within PT (perturbation theory). Then we can combine
2.17 and 4.12 and write down a formal expression for the corresponding optimal observable:

    f_{opt}(P) = \partial_M \ln \pi_{th}(P)
               = \frac{\int dp \, \partial_M \pi^{(n)}_{pQCD}(p) \times H^{(n)}(p, P)}
                      {\int dp' \, \pi^{(n)}_{pQCD}(p') \times H^{(n)}(p', P)} .        4.20

The philosophical importance of this expression is that it corresponds to the fundamental
Rao-Cramer limit on the attainable precision for the values of M extracted from a given data set
(recall Sec. 2.7 and the comments after 2.22). Therefore Eq. 4.20 is an ideal starting point for
deliberations about any data processing algorithms (including jet algorithms) geared towards
specific precision measurement applications.

The key difficulty is that neither the probability distribution 4.12 nor the formula 4.20 can be
evaluated for a given P due to the huge number of degrees of freedom in P (in reality, theoretical
versions of π(P) as a whole are materialized only in the form of Monte Carlo event generators).
This means that a careful choice of parametrization of events is needed before the construction of
good approximations to f_opt becomes possible.
With a suitable parametrization, Eq. 4.20 could be used in a brute force fashion: one would map the
events into a multi-dimensional domain of the chosen parameters (say, q), build a multi-dimensional
interpolation formula for π(P(q)) (via an adaptive routine similar to those used e.g. in [11]) for
two or more values of M near the value of interest, and perform the differentiation in M
numerically. The resulting multidimensional interpolation formula would represent the optimal
observable mapped to q and could be used for the processing of experimental events to complement
the standard χ² method based on histogramming (recall the comments in Sec. 2.41).
The difficulty is to find a parametrization that would not involve a significant loss of
information about M.
Usually employed are parametrizations obtained by describing the events P in terms of a few jets,
which is made possible by the specific structure of 4.20, namely:
(i) Eq. 4.16 means that observed events P are close (in the sense of C-continuity as measured e.g.
by the distance 3.27) to their parent parton events p.
(ii) The dimensionality of p is small.
Finding such a p for each observed event P amounts to an approximate inversion of hadronization.
This will be further discussed in Sec. 4.28. Here we would like to take a slightly different view
on the problem.
If one could restore p from P uniquely, then the optimal observable would be identified with its
perturbative version:

    f^{(n)}_{opt}(p) = \partial_M \ln \pi^{(n)}_{pQCD}(p) .        4.21

However, the perturbative probability density π_pQCD(p) contains singular expressions (generalized
functions such as the one represented by Eq. 4.4) that are not positive-definite beyond the leading
order (footnote r). This means that the perturbative expression π_pQCD(p) cannot be immediately
interpreted as a probability density. As a result, the derivation of optimal observables described
in Sec. 2.7 is inapplicable. In other words, the expression 4.21 is formal beyond the tree
approximation of pQCD.
Nevertheless, it is not impossible to use Eq. 4.21 for the construction of quasi-optimal
observables provided one could find a natural way to extend it (or some simplified version of it)
to all events P by C-continuity. (Remember that the formal nature of p and P is the same.) Such an
extension can sometimes be accomplished in such a way that the problem of restoring p from P does
not occur. Here is an example.

Constructing observables via extension by C-continuity. Precision measurements of α_S 4.22

Consider measurements of the strong coupling α_S in the process e+e- → hadrons. We are going to
show how the concept of optimal observables could have been employed to obtain shape observables
that best suit this purpose.

Footnote r: If π(x) > 0 and π(x) = π_0 + π_1 x + π_2 x^2 + ... then π_0 > 0 (if it is non-zero)
but the sign of π_1 etc. may be arbitrary.
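
The brute-force use of Eq. 4.20 described above (evaluate the probability density in the chosen parameters q for two or more values of M near the value of interest, then differentiate in M numerically) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the events are parametrized by a single number q, and `toy_density` (a Gaussian centred at M) is a hypothetical stand-in for the interpolated π(P(q)), not a density taken from the text.

```python
import math


def toy_density(q, M, width=1.0):
    """Hypothetical stand-in for pi(P(q)): a unit-normalized Gaussian in q centred at M."""
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((q - M) / width) ** 2)


def optimal_observable(density, q, M, dM=1e-3):
    """Approximate f_opt(q) = d/dM ln density(q; M) by a central difference in M."""
    log_up = math.log(density(q, M + dM))
    log_down = math.log(density(q, M - dM))
    return (log_up - log_down) / (2.0 * dM)


# Weights assigned to three events; M0 is the value of M near which one measures.
M0 = 80.0
weights = [optimal_observable(toy_density, q, M0) for q in (78.0, 80.0, 83.0)]
```

For this toy density the exact result is (q - M) / width^2, so events far from the peak receive the largest weights in absolute value. In a realistic application the density would come from a multi-dimensional interpolation over generated events rather than from a closed formula.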





At a very crude level of reasoning, the probability density can be represented as a direct sum

    \pi_{pQCD}(p) = \{ \pi_2(p) \} \oplus \{ \alpha_S \, \pi_3(p) \} \oplus \{ \alpha_S^2 \, \pi_4(p) \} \oplus \ldots ,        4.23

where each term corresponds to the n-particle sector of the space of p. Each π_n also contains
O(α_S) corrections.
Use the prescription 4.21 with M → α_S, multiply the r.h.s. by α_S (because f_opt is defined up to
a constant), and drop higher-order terms in each sector, which is not prohibited by the
prescriptions of Sec. 2.25. Then one obtains:

    f_{pragm}(P) \sim \{ 0 \}_2 \oplus \{ 1 \}_3 \oplus \{ 2 \}_4 \oplus \ldots        4.24

In other words, a minimal requirement is that the observables should vanish on 2-particle events.
This is exactly the requirement which was used in [3], [4] to derive the so-called jet-number
discriminators J_m[P] by assuming the simplest analytical form for f_quasi (a 3-particle
correlator). The simplest C-continuous expression corresponding to the above requirement then is

    f_{quasi}(P) = J_3[P] .        4.25

refs. therein). In our notations (we assume that the event's total energy is normalized to 1), the
explicit expression is

    1 - T(P) = 1 - \max \sum_a E_a |\cos\theta_a| = \min \sum_a E_a ( 1 - |\cos\theta_a| ) ,        4.27

where θ_a is the angle between the a-th particle's direction and an axis (the thrust axis). The
optimizations are performed with respect to the directions of the thrust axis that determines
orientation of the angular function as a whole but does not affect the magnitude of derivatives,
which ensures C-continuity of 1 - T(P).
In terms of the classification of Sec. 3.35, the definition 4.27 belongs to the class of
generalized shape observables because it involves an optimization procedure on top of a basic shape
observable.
The above reasoning can be regarded as an argument for quasi-optimality of the observables such as
thrust and jet-number discriminators for precision measurements of α_S. We will have to say more on
this in Sec. 4.68.
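
The shape observable 1 - T of Eq. 4.27 can be evaluated exactly for small events with the sketch below. It is not the procedure of the text: it assumes massless particles (momentum = energy times unit direction) and replaces the search over axis directions by the standard identity that the maximum over axes of the sum of |p_a . n| equals the largest length of a signed sum of momenta, which costs 2^N operations and is therefore only suitable for events with few particles. The function name `one_minus_thrust` and the event format are hypothetical choices for this illustration.

```python
import itertools
import math


def one_minus_thrust(event):
    """Return 1 - T for an event given as a list of (energy, unit_direction) pairs.

    Particles are treated as massless; energies are rescaled so that they sum to 1.
    """
    total_energy = sum(energy for energy, _ in event)
    momenta = [
        tuple(energy / total_energy * component for component in direction)
        for energy, direction in event
    ]
    thrust = 0.0
    # For a fixed axis n, sum_a |p_a . n| = (sum_a s_a p_a) . n with s_a = +1 or -1,
    # so the maximum over axes is the largest |sum_a s_a p_a| over all sign choices.
    for signs in itertools.product((1.0, -1.0), repeat=len(momenta)):
        combined = [
            sum(sign * momentum[i] for sign, momentum in zip(signs, momenta))
            for i in range(3)
        ]
        thrust = max(thrust, math.sqrt(sum(c * c for c in combined)))
    return 1.0 - thrust


# A 2-particle back-to-back event and a symmetric planar 3-particle event.
back_to_back = [(1.0, (0.0, 0.0, 1.0)), (1.0, (0.0, 0.0, -1.0))]
s = math.sqrt(3.0) / 2.0
mercedes = [(1.0, (1.0, 0.0, 0.0)), (1.0, (-0.5, s, 0.0)), (1.0, (-0.5, -s, 0.0))]
```

The observable vanishes on the back-to-back event, in line with the minimal requirement that observables for α_S vanish on 2-particle configurations, and is non-zero on the 3-particle event.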

(See [3], [4] for exact expressions.)
Remember, however, that there is an arbitrariness in the construction of J_m[P] in [3], [4]: the
factors Δ_ij involved in the construction are only required to behave as O(θ_ij^c) with a positive
c as θ_ij → 0 (θ_ij is the angle between i-th and j-th particles of the event). The simplest
analytical behavior corresponds to c = 1 whereas the simplest covariant expressions correspond to
c = 2.
It might be possible to fix this arbitrariness, as follows. The perturbative expression for π_3 is
singular and not strictly non-negative exactly in situations corresponding to θ_ij → 0. To rectify
this one could perform a resummation of perturbation series thus introducing a non-trivial
dependence on α_S. Then the differentiation in 2.17 would replace the 1, 2, ... in 4.24 with
something more interesting (i.e. dependent on α_S) in the region θ_ij → 0. Then by examining how
such a dependence affects the result of differentiation in α_S in the definition of f_opt, one
might be able to modify the observable 4.25 accordingly. This interesting theoretical problem seems
to require a kind of pQCD expertise similar to that behind the k_T-algorithm [32].
The increasing integer weight in each sector in 4.24 corresponds to the simple fact that higher
powers of α_S are increasingly more sensitive to its variations. So the expression 4.24 suggests to
replace 4.25 with a sum similar to the following one (footnote s):

    f_{quasi}(P) = \tilde{J}_3[P] + 2 \tilde{J}_4[P] + 3 \tilde{J}_5[P] + \ldots        4.26

Of course, the series cannot contain more terms than the number of theoretically known corrections
to pQCD.
Actually, any conventional shape observable that vanishes (only) on 2-particle configurations meets
the above requirement. For instance, one such shape observable is the combination 1 - T, where T is
the so-called thrust (eq.(46) in [1] and

Footnote s: Recall that jet-number discriminators are normalized so that their maximal value,
reached on configurations with no less than m widely separated particles, is equal to 1.

Constructing observables via jet algorithms. The conventional approach 4.28

Let us explore how one could construct quasi-optimal observables that would approximate 4.20 using
the fact that the majority of hadronic events P resemble their partonic parents p, as formally
expressed by 4.16 (4.15). Although it is impossible to exactly restore the parton parent p for each
observed event P (see after 4.32, Sec. 4.47 and Sec. 5.10), the idea is a useful heuristic to start
from.
The conventional approach to construction of observables involves three elements: a jet algorithm,
an event selection procedure which we call the jet-number cut, and a function on jet
configurations.
We will focus only on the general structure and properties of the conventional data processing
scheme based on jet algorithms, and the specific form of the jet algorithm will play no role in the
following discussion.

General structure of jet algorithms 4.29

Assume there is a so-called jet algorithm that somehow accomplishes an approximate inversion of
hadronization. Formally, such an algorithm is a mapping of arbitrary events P into similar (pseudo)
events Q:

    P \xrightarrow{\text{jet algorithm}} Q = Q[P] .        4.30

Q usually has many fewer (pseudo) particles than P.
Recall that partonic events p have the same formal nature as the hadronic events P. This implies,
first, that Q is an object of the same nature as P and p; second, that the mapping 4.30 is defined
on both hadronic and partonic events.
We will call Q jet configurations and their pseudoparticles, jets. For clarity's sake, we
distinguish jets the mathematical objects (the pseudoparticles of Q) from jets the collections of
particles (hadrons or partons) in which case we will use the terms spray or cluster, usually in
informal reasoning.
Jets in Q will be labeled by the index j, and the j-th jet is characterized similarly to particles
of the event P (cf. 3.6), i.e. by its energy and direction denoted as E_j and q̂_j:





    Q = \{ E_j, \hat{q}_j \}_{j = 1, \ldots, N(Q)} .        4.31

In practice both particles and jets are endowed with additional attributes, e.g. Lorentz 4-momenta,
and the jet algorithm evaluates them along the way, but at this point we ignore such complications.
That the mapping 4.30 is supposed to be an approximate inversion of the hadronization described by
the kernels H^(n) in Eq. 4.12, means that Q should be close to p. This can be represented as

    Q \approx p .        4.32

The exact meaning of the approximate equality is yet to be specified, and it may be impossible to
identify a single partonic configuration which hadronized into P. Still, there is a class of events
for which the relation 4.32 is unambiguous (at least in the asymptotic limit of high energies), and
this provides a minimal requirement which any jet algorithm must satisfy and which serves as a sort
of boundary condition for jet definition which we present for explicitness' sake:

    For events which consist of a few energetic well isolated narrow sprays of particles, each
    spray is associated with a jet whose energy and direction coincide with those of the spray.        4.33

The ambiguity of jet definition concerns how jet algorithms handle fuzzy events that do not fall
into the above category.
Another important condition usually imposed on jet algorithms is that the mapping 4.30 should be
fragmentation invariant. In the context of our theory this is essentially superfluous since the
interpretation of events and functions on them modulo C-continuity (which incorporates
fragmentation invariance; see 3.34) is built into our formalism at a linguistic level: If all the
arguments are expressed in the language of 3.12 rather than the particle representation 3.6 then
the resulting jet definition will be automatically fragmentation invariant.
Note that any reasonable jet algorithm sets, explicitly or implicitly, a lower limit on the angular
distances between jets in Q. The limit may depend on jets' energies.
A related observation is that the mapping 4.30 cannot (unless it is trivial, i.e. Q[P] = P) be
continuous in any non-pathological sense for some P. The points of discontinuity usually correspond
to the events whose different small deformations result in jet configurations with different
numbers of jets.

The jet-number cut 4.34

Another element of the conventional data processing scheme is the so-called jet-number cut, which
is a selection procedure (similar to any other event selection procedure; see Sec. 2.35) based on
the number of jets the chosen jet algorithm finds.
It is convenient to introduce a notation for the collection of events with a given number of jets
(the K-jet sector):

    \mathcal{P}_K = \{ P : Q[P] \text{ has } K \text{ jets} \} .        4.35

    \theta( Q[P] \text{ has } K \text{ jets} ) \equiv \theta( P \in \mathcal{P}_K ) .        4.36

(The θ-function is defined in 2.39.)
The value of K is chosen to enhance sensitivity to M and to suppress backgrounds. It is usually
determined using the approximate relation jets ≈ partons (Eq. 4.32). Then K is the number of
partons in the final states in the lowest order of the QCD perturbation theory in which the
dependence on the parameters one is interested in is manifest.

Observables 4.37

The last element of the conventional approach is a function defined on jet configurations Q which
passed the jet-number cut (Sec. 4.34); denote it φ_ad hoc(Q).
In practice φ(Q) is chosen in ad hoc fashion although once the jet algorithm is chosen then it is
possible in principle to construct optimal observables for the probability distribution mapped to Q
(Sec. 4.52).
The observable on events P is then defined as follows:

    P \xrightarrow{\text{j.a.}} Q \xrightarrow{\text{jet-number cut}} Q \xrightarrow{\varphi} f(P) ,        4.38

where f(P) = θ(Q[P] has K jets) φ(Q[P]).
The data processing scheme 2.5 becomes

    \left.
    \begin{array}{l}
      \pi(P) \xrightarrow{\text{j.a. + cut}} \pi(Q) \to \langle f \rangle_{th} \\
      \{ P_i \}_i \xrightarrow{\text{j.a. + cut}} \{ Q_i \}_i \to \langle f \rangle_{exp}
    \end{array}
    \right\} \xrightarrow{\text{fit}} \alpha_S, M, \ldots        4.39

It is quite obvious that the optimal observable 4.20 cannot be represented in the form 4.38 with
any non-trivial jet algorithm in realistic situations. This means that with such observables it is
impossible to achieve the theoretical Rao-Cramer limit on the precision of determination of
fundamental parameters. We will come back to this in Sec. 4.43.

Examples 4.40

Two typical examples are as follows.
The first example is the so-called 3-jet fraction in the process e+e- → hadrons which used to be
one of the observables employed for measurements of α_S at LEP1. Here one simply has:

    f_{3 jets}(P) = \theta( Q \text{ has 3 jets} ) .        4.41

The second example is a simplified (but sufficient for the purposes of illustration) version of
what might be used at LEP2 to measure the mass of W in the process e+e- → W+W- → hadrons above the
W+W- threshold where each W decays into two jets. Here one would select events with 4 jets and
choose φ(Q) to yield an array of numbers, each being the number of jet pairs from Q whose invariant
mass falls into the corresponding interval of the mass axis (bin):

    f_{dijets}(P) = \theta( Q \text{ has 4 jets} ) \times
        \{ \text{no. of dijets from } Q \text{ in the } m\text{-th bin} \}_{m = 1, \ldots, N_{bins}} .        4.42
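
The structure of the conventional scheme (a jet algorithm, followed by a jet-number cut, followed by a function on jet configurations; Eqs. 4.35, 4.38, 4.41) can be made concrete with a short sketch. Since the specific form of the jet algorithm plays no role in the discussion, `toy_jet_algorithm` below is a purely hypothetical stand-in that groups particles in a single angle and ignores the wrap-around at 2π; the event format of (energy, azimuth) pairs and all function names are likewise assumptions of this illustration.

```python
def toy_jet_algorithm(event, radius=0.5):
    """Hypothetical stand-in for the mapping P -> Q[P] on (energy, azimuth) particles.

    Particles sorted in azimuth join the current jet while each one lies within
    `radius` of the previous particle. Returns (energy, energy-weighted azimuth) jets.
    """
    groups = []
    previous_azimuth = None
    for energy, azimuth in sorted(event, key=lambda particle: particle[1]):
        if previous_azimuth is not None and azimuth - previous_azimuth < radius:
            groups[-1].append((energy, azimuth))
        else:
            groups.append([(energy, azimuth)])
        previous_azimuth = azimuth
    jets = []
    for group in groups:
        jet_energy = sum(energy for energy, _ in group)
        jet_azimuth = sum(energy * azimuth for energy, azimuth in group) / jet_energy
        jets.append((jet_energy, jet_azimuth))
    return jets


def conventional_observable(jet_algorithm, n_jets, jet_function):
    """Build f(P) = theta(Q[P] has n_jets jets) * jet_function(Q[P]), cf. Eq. 4.38."""
    def f(event):
        jets = jet_algorithm(event)
        return jet_function(jets) if len(jets) == n_jets else 0.0
    return f


# Eq. 4.41: with jet_function = 1 the observable is the indicator of the 3-jet
# sector, and its average over a sample of events is the 3-jet fraction.
three_jet_indicator = conventional_observable(toy_jet_algorithm, 3, lambda jets: 1.0)

sample = [
    [(0.3, 0.0), (0.1, 0.2), (0.3, 2.0), (0.3, 4.0)],  # two close particles merge
    [(0.5, 0.0), (0.5, 3.1)],                          # only two jets: fails the cut
]
three_jet_fraction = sum(three_jet_indicator(event) for event in sample) / len(sample)
```

Moving the second particle of the first event from azimuth 0.2 to just beyond 0.5 changes the number of jets from 3 to 4 and makes the observable jump from 1 to 0, which is the kind of discontinuity of the mapping P → Q[P] mentioned in the text.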
Then the space of events 𝒫 is sliced into a sum of 𝒫_K for different K. The exact shapes of 𝒫_K
depend on the chosen jet algorithm.
The jet-number cut is equivalent to inclusion into observables of a dichotomic factor of the form
given in Eq. 4.36.

Understanding the observables 4.38 4.43

Substitute f(P) defined by 4.38 into the l.h.s. of 4.2 and use 4.12. Simple formal changes of the
order of integrations yield:

    \int dP \, \pi(P) \, f(P) \approx \int_{q \text{ has } K \text{ jets}} dq \, \pi^*(q) \, \varphi(q) ,        4.44





where

    \pi^*(q) = \int dp \, \pi^{(n)}_{pQCD}(p) \, h^{(n)}(p, q) ,        4.45

with the kernel h^(n) given by

    h^{(n)}(p, q) = \int dP \, H^{(n)}(p, P) \, \delta(q, Q[P]) .        4.46

The δ-function on the r.h.s. is similar to the one in 4.16.
If one drops the jet-number cut from the definition 4.38 then the only change to be made is to drop
the restriction on q in the integral on the r.h.s. of 4.44.
Note that Eq. 4.45 differs from 4.12 by the replacements H^(n) → h^(n), P → q.
If the mapping P → Q corresponds to a typical jet algorithm then the domain of q is, generally
speaking, the same as for P, i.e. 𝒫 (Sec. 3.15). However, most of the probability density π*(q) is
now concentrated on pseudoevents q with fewer particles than was the case with π(P). Second, π*(q)
is zero on jet configurations with some pairs of jets sufficiently close (the corresponding events
P are then mapped to jet configurations with a single jet instead of such a pair; recall the
comments after 4.33).
The kernel h^(n)(p, q) is interpreted as the probability for the partonic event p to generate any
hadronic event that would yield the jet configuration q after application of the jet algorithm.

On inversion of hadronization 4.47

Eq. 4.45 means that the kernel h^(n)(p, q) effects a smearing of the perturbative expression. If
the complete π(P) given by 4.12 is strictly non-negative then such must also be π*(q).
The latter fact has the following consequence:

    Since the pQCD probability density π^(n)_pQCD(p) is not strictly non-negative near some p, the
    non-negativity of its smeared analog π*(q) implies that an exact inversion of hadronization is
    impossible with any jet algorithm in the form of the mapping 4.30.        4.48

This impossibility can be quantified; see Sec. 5.10.

an almost collinear gluon, etc. Then for some events one must rely on a convention about whether
such an event is a hadronized LO quark, or a hadronized NLO configuration of the same quark and a
gluon. We will come back to this point in Sec. 4.70.
Lastly, from a computational viewpoint, inversion of a convolution like 4.12 is in general an
ill-posed problem. This means that even if a solution formally exists, numerical instabilities may
be encountered in practice. In the present case, such instabilities occur near the discontinuity of
the mapping P → Q, as already discussed.

Understanding h^(n)(p, q) 4.49

The importance of the kernel h^(n)(p, q) is due to the fact that it characterizes the combined
effect of the chosen jet algorithm and the hadronization mechanism represented by H^(n).
h^(n)(p, q) may be non-zero even if the numbers of particles in p and q do not coincide (two close
partons from p may hadronize into overlapping sprays of hadrons which the jet algorithm maps into a
single jet). This motivates introduction of the following quantities. Define

    h^{(n)}(p, K) = \int_{q \text{ has } K \text{ jets}} dq \, h^{(n)}(p, q) .        4.50

This is interpreted as the probability for the partonic event p to hadronize into hadronic events
recognized by the jet algorithm as having K jets. Then the fraction of L-parton events which
hadronized into K-jet events is formally given by

    h(L, K) = \frac{\int_{p \text{ has } L \text{ partons}} dp \, \pi^{(n)}_{pQCD}(p) \, h^{(n)}(p, K)}
                   {\int_{p \text{ has } L \text{ partons}} dp \, \pi^{(n)}_{pQCD}(p) \int dq \, h^{(n)}(p, q)} .        4.51

(The integral over q in the denominator yields 1 as is seen from the definition 4.46 and the
normalization 4.13.)
The quantity h(K, K) is the fraction of events P generated from partonic events with K partons and
recognized by the algorithm as having K jets. The quantities h^(n)(p, q), h^(n)(p, K) and h(L, K)
give a more differential information. They can in principle be studied numerically using Monte
Carlo event generators. In particular, it is interesting to compare the spread of q around p for a
few typical p and for different jet algorithms.
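
A numerical study of h(L, K) of the kind suggested above could be organized as in the following sketch. It is only a toy under loudly stated assumptions: `toy_hadronize` (each parton is replaced by a few hadrons with Gaussian angular smearing) is a hypothetical substitute for a Monte Carlo event generator, `count_toy_jets` (a new jet starts at every gap of at least `radius` between neighbouring hadrons in one angle) is a hypothetical substitute for a jet algorithm, and the supplied parton events are treated as an unweighted sample of the perturbative density.

```python
import random
from collections import Counter


def toy_hadronize(partons, rng, spread=0.05, n_hadrons=3):
    """Hypothetical hadronization: split each (energy, azimuth) parton into hadrons."""
    hadrons = []
    for energy, azimuth in partons:
        for _ in range(n_hadrons):
            hadrons.append((energy / n_hadrons, azimuth + rng.gauss(0.0, spread)))
    return hadrons


def count_toy_jets(hadrons, radius=0.4):
    """Hypothetical jet counting: a new jet starts at every azimuthal gap >= radius."""
    azimuths = sorted(azimuth for _, azimuth in hadrons)
    return 1 + sum(1 for a, b in zip(azimuths, azimuths[1:]) if b - a >= radius)


def estimate_h(parton_events, n_trials=200, seed=1):
    """Estimate h(L, K): the fraction of L-parton events recognized as K-jet events."""
    rng = random.Random(seed)
    counts = {}
    for partons in parton_events:
        row = counts.setdefault(len(partons), Counter())
        for _ in range(n_trials):
            row[count_toy_jets(toy_hadronize(partons, rng))] += 1
    return {
        n_partons: {k: n / sum(row.values()) for k, n in row.items()}
        for n_partons, row in counts.items()
    }


well_separated = estimate_h([[(0.5, 0.0), (0.5, 3.0)]])  # two distant partons
nearly_collinear = estimate_h([[(0.5, 0.0), (0.5, 0.1)]])  # two close partons
```

In this toy the two nearly collinear partons are almost always recognized as a single jet, so h(2, 1) is close to 1 for that configuration, which illustrates why h^(n)(p, q) may be non-zero when the numbers of particles in p and q differ.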
Furthermore, the hadronization kernel H^(n) depends on n, the order of pQCD corrections included
into the perturbative probabilities π^(n)_pQCD(p) in 4.12. It is not clear which n the inversion of
hadronization should be geared to.
For instance, consider radiation of a gluon by a quark. If n corresponds to the leading order (LO)
approximation then the mechanism of gluon radiation is described by the hadronization kernel H^(n).
If n corresponds to the next-to-leading order (NLO) then π^(n)_pQCD(p) is a sum of LO and NLO
terms, and then H^(n) should contain contributions which dress the LO and NLO terms. This in fact
is a different aspect of the same problem 4.48: Jets have to be defined at the level of
perturbative quarks and gluons before a connection with observed data can be established.
Still another aspect of the same problem is in terms of non-uniqueness of inversion of
hadronization. In general, different configurations of partons may result in the same hadronic
event. This is seen e.g. from the collective nature of hadronization (a single colored parton
cannot develop into a jet of colorless hadrons). It is even more true if partonic cross sections
are evaluated in NLO approximation where a quark can radiate

It might be useful (certainly interesting) to have a reasonably detailed empirical information of
the kernels h^(n)(p, q), h^(n)(p, K), etc.

Optimal observables in the class 4.38 4.52

From a general mathematical viewpoint the smearing 4.45 can be regarded as an example of a
regularization of a singular approximation (Sec. 2.51; i.e. the pQCD approximation π^(n)_pQCD(p)
of the exact probability density π(P)), transforming it into a physically meaningful form. This
implies that whereas the perturbative expression 4.21 is formal, it is entirely meaningful to
construct an optimal observable defined on q from π*(q) according to the standard recipe 2.17:

    \varphi_{opt}(q) = \partial_M \ln \pi^*(q) .        4.53

In terms of events P, the observable 4.53 is






    \hat{f}_{opt}(P) = \varphi_{opt}(Q[P])
                     = \frac{\int dp \, \partial_M \pi^{(n)}_{pQCD}(p) \times h^{(n)}(p, Q[P])}
                            {\int dp' \, \pi^{(n)}_{pQCD}(p') \times h^{(n)}(p', Q[P])} .        4.54

The kernel h^(n) is given by 4.46.
If the dimensionality of q for which 4.54 is non-negligible is not too high then a brute force
construction of a numerical interpolation formula to represent 4.54 might be feasible.
The formula 4.54 is valid for any jet algorithm (cone, k_T, etc.), and it describes a way to
achieve the theoretically best precision for the parameter M with a given jet algorithm within the
conventional scheme 4.38. Of course, the functions defined by 4.54 differ for different jet
algorithms.
It is easy to take into account the jet-number cut. Then all one has to do is restrict
considerations to the K-jet sector:

    \pi^*(q) \to \pi^*_K(q) = Z_K^{-1} \, \pi^*(q) \, \theta( q \text{ has } K \text{ jets} ) ,        4.55

where Z_K is an appropriate normalization factor (it may depend on M). Then Eq. 4.53 is modified as
follows:

    \varphi_{opt,K}(q) = \partial_M \ln \pi^*_K(q) \times \theta( q \text{ has } K \text{ jets} )
                       = [ \varphi_{opt}(q) - \partial_M \ln Z_K ] \times \theta( q \text{ has } K \text{ jets} ) .        4.56

Note that in practical constructions the subtracted term in square brackets may be dropped (the
comment after 2.18).
The corresponding observable defined on events P is

    \hat{f}_{opt,K}(P) = \varphi_{opt,K}(Q[P])
                       = [ \hat{f}_{opt}(P) - \partial_M \ln Z_K ] \times \theta( Q[P] \text{ has } K \text{ jets} ) .        4.57

This construction remains valid for any K, i.e. one can construct an optimal observable in any
jet-number sector.
Of course, usually one sector (which corresponds to the "canonical" value of K; see the remarks
after 4.36) would yield a more informative observable than others.

Inclusion of adjacent jet-number sectors 4.58

There is nothing to prevent inclusion into consideration in 4.55-4.57 of additional jet-number
sectors. Quite obviously, this would increase informativeness of the resulting aggregate
observable. If the K-jet sector was the most informative one then it is natural first to include
one or both adjacent sectors which correspond to (K ± 1) jets.

Comparison of different classes of observables 4.59

In the following discussion we assume that a jet algorithm is fixed (unless indicated otherwise).
We can compare different kinds of observables for measurements of a fundamental parameter M:
(1) A conventional observable of the form

    \hat{f}_{ad hoc,K}(P) = \theta( Q[P] \text{ has } K \text{ jets} ) \times \varphi_{ad hoc,K}(Q[P]) ,        4.60

which involves a jet-number cut and an ad hoc function φ_ad hoc,K(q) usually defined on jet
configurations with K jets only, as described by the θ-function in 4.60.
(2) The observable f̂_opt,K given by 4.57, which yields the best precision among observables of the
form 4.60, i.e. defined via intermediacy of the chosen jet algorithm in the K-jet sector.
(3) The observable f̂_opt,K,K±1 defined by inclusion of the adjacent jet-number sectors (Sec. 4.58;
one could include only one of the two adjacent sectors).
(4) The ideal observable f̂_opt (4.54) which yields the best precision among observables defined via
intermediacy of a jet algorithm in all jet-number sectors.
(5) The ideal observable f_opt (4.20) defined without jet algorithms. It yields the absolutely best
(Rao-Cramer) precision for the parameter M.
The observables are listed in increasing informativeness: Quite obviously, each additional
restriction on the form of observables is an extra obstacle for achieving the Rao-Cramer limit of
precision.
Furthermore, it is clear that one can, at least in principle, construct quasi-optimal observables
(Sec. 2.25) for any of the observables f̂_opt,K and f̂_opt,K,K±1.
The following figure illustrates the relation between the various observables which we discuss:

[Figure 4.61: diagram with arrows pointing toward higher informativeness. Nodes: f̂_ad hoc,K,
f̂_quasi,K, f̂_opt,K, the (K ± 1)-sector extensions f̂_ad hoc,K,K±1, f̂_quasi,K,K±1, f̂_opt,K,K±1,
regularized ("reg") variants, f̂_opt, and f_opt (the Rao-Cramer limit).]

Hats denote observables defined via intermediacy of a jet algorithm (remember that the reasoning in
this section is valid for any fixed jet algorithm). Arrows indicate an increase in informativeness
(neither absolute nor relative magnitudes of the increase can be predicted a priori). The absence
of arrows between two observables means their informativeness cannot be compared a priori (except
for the case of f_opt which has the absolutely highest informativeness).
The "reg" arrows correspond to the option of regularization of cuts which will be discussed
separately (Sec. 4.68 and Section 9).

Ways to increase informativeness of ad hoc observables 4.62

Consider a conventional ad hoc observable f̂_ad hoc,K (Eq. 4.60). There are at least the following
ways to improve it (cf. Fig. 4.61):
(i) Replacing the ad hoc observable with a quasi-optimal one (the prescriptions of Sec. 2.25).
(ii) Inclusion of the adjacent (K ± 1)-jet sectors (Sec. 4.63).





(iii) Regularization of discontinuities (Sections 4.68 and 9).
(iv) Adjustment of the underlying jet algorithm (to be discussed in Sec. 5.3).

These options may be combined. In subsequent subsections we will discuss them in more detail together with some related issues.

Inclusion of additional jet-number sectors 4.63

If, say, the K-jet sector was the most informative one (usually because the lowest PT order where the dependence on the desired parameter first manifests itself corresponds to final states with K partons) then it is natural to first include one or more of the adjacent, $(K \pm 1)$-jet sectors.
The best precision is obtained if one uses a quasi-optimal observable in each sector, otherwise an increase of precision is not guaranteed.

The simplest way to include information from additional jet-number sectors is to map events from each additional sector into one point (the scheme 4.44–4.46, 4.53 is valid irrespective of the physical or mathematical nature of the mapping $P \to Q$). Then it is sufficient to determine the value of the corresponding optimal observable at that point (this means that all events from this sector receive the same weight). The magnitude of the resulting increase of informativeness could be regarded as a signal of whether or not a more detailed treatment might be warranted. Such a procedure might be a useful way to control the loss of information due to the restriction of the jet-number cut.
Inclusion of additional jet-number sectors seems to become useful whenever the quantity $1 - h(K, K)$ (Eq. 4.51 and the comments thereafter) is appreciably non-zero. The difficulty here would be if adding even one jet increases the dimensionality of phase space too much to allow a meaningful construction of observables following 4.53. Then one may be satisfied with defining reasonable ad hoc observables in the adjacent jet-number sectors. In such a situation one may find inspiration e.g. in the constructions of [4] such as spectral discriminators. Computation of spectral discriminators may be prohibitively expensive for raw hadronic events but some similar observables defined on jet configurations with, say, no more than 10 jets should not be difficult to compute.
For instance, in the context of example 4.42, one could include into consideration the 5-jet sector and define a similar observable by allowing both di- and tri-jets (an additional jet may have been radiated from one of the partons originally forming a dijet). And/or one could include the 3-jet sector and define an observable in it based on the fact that some pairs of partons may generate overlapping jets which may be seen by the jet algorithm as a single jet (e.g. the invariant mass distribution of single jets).

Sources of non-optimality of the observable 4.54 4.64

With a fixed jet algorithm, a conventional ad hoc observable 4.60 can always be improved, in principle, by a transition to the optimal observable 4.57, and by inclusion of all jet-number sectors (the combined effect of both tricks is represented by 4.54). So the truly fundamental limitations of the conventional scheme are those associated with the sources of non-optimality (i.e. loss of precision of the extracted values for M) of the observable 4.54 compared with the ideal expression 4.20.

There are two sources of such non-optimality in the jets-mediated optimal observable 4.54 compared with 4.20:

1) The variation of the expression 4.20 over the collections of events P which correspond to the same jet configuration q. Each such collection is described by the equation $q = Q[P]$.

2) The discontinuities of 4.38 at the boundary of the regions $\mathcal{P}_K$ (defined in 4.35).

Concerning the first source, the guidance here is provided by the general criteria 2.26, 2.27 with $f = \hat f_{\mathrm{quasi\,opt}}$. One can make the following simple observation:

The faster the variation of $f_{\mathrm{opt}}$ near some P, the more fine-grained should be the mapping $P \to Q$ there.
4.65

Non-optimality due to discontinuities 4.66

The second source of non-optimality is due to discontinuities at the boundaries of $\mathcal{P}_K$. Fig. 4.67 gives an illustration of what happens near such a boundary.

[Figure 4.67: two panels drawn over the adjacent regions $\mathcal{P}_K$ and $\mathcal{P}_{K+1}$. Left panel: $\hat f_{\mathrm{opt}}$ against $f_{\mathrm{opt}}$. Right panel: $\hat f_{\mathrm{ad\,hoc},K}$ against $f_{\mathrm{opt}}$.]

The left figure shows $\hat f_{\mathrm{opt}}$ against $f_{\mathrm{opt}}$. It is assumed that the latter is small outside the K-jet sector. The shaded areas correspond to the non-optimality of $\hat f_{\mathrm{opt}}$ (recall the criterion 2.27 and the rule of thumb 2.29). $\hat f_{\mathrm{opt},K}$ differs from $\hat f_{\mathrm{opt}}$ by being equal to zero outside $\mathcal{P}_K$ (apart from an inessential additive constant). The right figure shows an ad hoc variable against $f_{\mathrm{opt}}$. If the variable were the K-jet fraction, it would be constant in $\mathcal{P}_K$.
The problem is exacerbated if the boundary of $\mathcal{P}_K$ passes through the region of a fast variation of $f_{\mathrm{opt}}$. Note e.g. that the probability density (from which $f_{\mathrm{opt}}$ is constructed) in QCD varies by an order of magnitude between the regions corresponding to K and K + 1 jets because radiation of an additional jet is accompanied by the factor $\alpha_S \sim 0.1$.
It is clear from Fig. 4.67 that forcing the observable to continuously interpolate between its different branches (represented by fat lines) would eat away at the non-optimality (the shaded area) and thus increase precision of determination of the parameter M.

The relevant notion of continuity (among the many possible ones in an infinite-dimensional space of events P) is the C-continuity discussed at length in Sec. 3.18. (However, remember 2.50.)
Next we consider an example which shows that elimination of the discontinuities that are typical of the conventional observables may result in a noticeable improvement of precision of measurements.
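The order-of-magnitude jump in probability density across a jet-number boundary can be turned into a quick estimate of how strongly a hard jet-number cut amplifies misclassified events. The following is a minimal sketch and not part of the original derivation: the function name, the normalization of the K-jet rate to 1, and the sample leak fractions are assumptions made for illustration; only the ratio of order $\alpha_S \sim 0.1$ is taken from the text.

```python
def induced_relative_error(leak_fraction, rate_ratio):
    """Relative change of the (K+1)-jet count caused by misclassified K-jet events.

    leak_fraction: fraction of K-jet events wrongly counted as (K+1)-jet
                   (hypothetical input, e.g. from detector smearing).
    rate_ratio:    (K+1)-jet rate divided by the K-jet rate, of order alpha_S.
    """
    k_jet_rate = 1.0                         # arbitrary normalization (assumption)
    k_plus_1_rate = rate_ratio * k_jet_rate  # the sparsely populated sector
    leaked = leak_fraction * k_jet_rate      # events crossing the boundary
    return leaked / k_plus_1_rate


alpha_s = 0.1  # order-of-magnitude value quoted in the text
for leak in (0.001, 0.01, 0.05):
    rel = induced_relative_error(leak, alpha_s)
    print(f"leak {leak:.1%} -> relative error in the (K+1)-jet count {rel:.1%}")
```

With a rate ratio of 0.1, a 1% leak from the densely populated sector shifts the sparsely populated one by about 10%, which is why a discontinuous observable sitting on such a boundary is fragile.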





The role of continuity (shape observables vs. 3-jet fraction in measurements of $\alpha_S$) 4.68

The effect of non-optimality due to the jet-number cut is seen in the precision measurements of $\alpha_S$ at LEP. Here one usually employs observables f such that

    $f = O(\alpha_S)$ .    4.69

One example is the shape observable thrust defined by 4.27. Another example of an observable which satisfies 4.69 is the 3-jet fraction 4.41. Since the latter is a discontinuous observable of the conventional kind (4.60) whereas the former is continuous (even C-continuous; Sec. 3.18) and smoothly interpolates between different jet-number sectors, it is interesting to compare them in regard of the quality of results they yield.
We have seen (Sec. 4.22) that shape observables such as the thrust are nearly optimal for measurements of $\alpha_S$ (which is quite obvious already from 4.69). On the other hand, the boundary between the 3-jet and 2-jet regions is located where the probability density and $f_{\mathrm{opt}}$ vary fast: as was pointed out in [3], 1% of 2-jet events incorrectly interpreted due to detector errors and statistical fluctuations in hadronization as having 3 jets induces a 10% error in the 3-jet fraction because the corresponding probabilities differ by a factor of $O(\alpha_S)$.
Note that $1 - T(P)$ smoothly interpolates between the points in phase space where it takes its minimal and maximal values (0 and 1). This should be contrasted with the discontinuities of the 3-jet fraction 4.41.

Another kinematic property of the shape observables such as the thrust is that they are rather simple energy correlators and thus fit into the structure of quantum field theory. This property ensures their superb amenability to theoretical investigations such as the sophisticated higher order calculations for the thrust reviewed e.g. in [20].
By now it has been accepted that the $\alpha_S$ measurements (at least of the LEP type) are done best via shape observables rather than the 3-jet fraction. [t]

[t] A large table presented in the lecture [26] did not contain a line for the 3-jet fraction which used to be a standard feature of such tables. In response to a query, the speaker mentioned unsatisfactory experimental errors.

The boundaries of $\mathcal{P}_K$ and non-uniqueness of inversion of hadronization 4.70

We conclude that the discontinuities at the boundaries of different jet-number sectors in the space of observed events may be a major source of non-optimality of conventional observables. The events near the discontinuities have two or more jets that are hard to resolve reliably.
This has a simple physical interpretation in terms of non-uniqueness of inversion of hadronization: There is no way to tell whether an event with overlapping jets was generated by K hard partons dressed by a hadronizing QCD radiation, or by K + 1 hard partons with two of them close enough to make the resulting jets overlap.
So, in general, there may be more than one candidate partonic event that can be regarded as a parent for a given hadronic event. The best one can do is provide weights for each such candidate; the weight reflects the expected probability for the hadronic event to have been generated from a particular partonic candidate.
In Section 9 we will discuss ways to assign to the same event several different jet configurations with suitably chosen weights, together with prescriptions for regularization of the jet-number discontinuities.

Definitions of jets 5

The use of jet algorithms in data processing following the scheme 4.39 is motivated by the specifics of QCD dynamics. The arguments of Sections 2–4 provide a framework to discuss jet definitions. A jet algorithm is a tool for construction of observables for specific precision measurement applications (Sec. 4.28), and the resulting observables can be compared using the notion of informativeness of observables (2.24). The usefulness of jet algorithms is due to the dynamics of pQCD (Sec. 4.1) which allows one to regard hadronic events as similar to their hard partonic parents (Eq. 4.16). This justifies the point of view that jet algorithms effect an (approximate) inversion of hadronization (Eq. 4.32).
First in Sec. 5.1 we discuss the conventional criterion used to compare jet algorithms. Then in Sec. 5.6 we introduce a jet definition based on the informational abstractions of Section 3 (the identification 3.43). The explicit purpose of such a jet definition is to serve as a tool for a systematic construction of quasi-optimal observables defined on hadronic states. In Sec. 5.10 we examine how the dynamical considerations complement the picture.

The conventional approach to jet algorithms 5.1

A common way to judge suitability of a jet algorithm to a particular application (a precision measurement of a fundamental parameter M; Sec. 2.1) is as follows. One chooses K as described after Eq. 4.36 and evaluates the fraction of events generated from partonic events with K partons and recognized by the jet algorithm as having K jets. This fraction is formally given by $h(K, K)$ defined by Eq. 4.51. (Note that $0 < h(K,K) < 1$.) The larger this fraction, the better the jet algorithm is deemed to be.

This criterion amounts to an implicit definition of an ideal jet algorithm as the one which maximizes $h(K, K)$. The various jet algorithms are then regarded as candidate approximations constructed empirically.
5.2

This definition can be related to the notion of optimal observables as follows. Consider the example 4.42. Then at the level of partons, the optimal observable is entirely localized on 4-parton events. At the level of hadrons, the optimal observable $f_{\mathrm{opt}}$ is mostly localized in the 4-jet sector $\mathcal{P}_4$. If it were a constant there then it would be entirely specified by $\mathcal{P}_4$, and the conventional criterion simply attempts to find the shape of $\mathcal{P}_4$.
Note that there can be many parameters one may wish to measure and so many different $f_{\mathrm{opt}}$. Since their shapes are all different, focusing only on the shape of $\mathcal{P}_4$ is a convenient compromise.

The advantage of the conventional criterion 5.2 is its simplicity and naturalness.
The disadvantages are as follows:
(i) Beyond the leading PT order, the signal is non-zero in other jet-number sectors.
(ii) $f_{\mathrm{opt}}$ is not piecewise constant.
(iii) The criterion is based on a convention which although plausible is not based on a precise argument (see however the





reasoning in Sec. 5.3).
(iv) There is no clear way to improve upon the conventional scheme 4.38 should one find the leading order PT arguments insufficient.

The concept of optimal observables allows one to be a little more precise:

Improving upon the conventional jet definition 5.3

Within the limitations of the conventional scheme 4.38, the best precision for M is achieved with the observable $\hat f_{\mathrm{opt},K}$ (Eq. 4.57) which is entirely determined once the jet algorithm is fixed. So it is legitimate to ask which jet algorithm maximizes the informativeness of $\hat f_{\mathrm{opt},K}$. The definition is meaningful because the informativeness of $\hat f_{\mathrm{opt},K}$ is given by the following integral:

    $\int dq \; \pi_K(q) \; \theta\big(q \text{ has } K \text{ jets}\big) \; \varphi^2_{\mathrm{opt},K}(q)$ .    5.4

The best jet algorithm would then maximize this.
It is interesting to find a way to connect this with the conventional criterion 5.2. Suppose one aims at a universal jet definition, then it is natural to replace the last factor by a constant. Then recall Eqs. 4.55 and 4.45. In the latter, restrict the integration to K-parton events. The resulting integral coincides, up to a normalization, with the numerator of $h(K, K)$; cf. Eq. 4.51.
Unfortunately, it is not clear how to derive from this a specific jet algorithm.

Furthermore, the conventional framework 4.38 per se imposes a restriction on attainable precision for fundamental parameters. If one seeks to alleviate it by, say, an inclusion of the adjacent jet-number sectors then the best jet algorithm should maximize the informativeness of $\hat f_{\mathrm{opt},K,K\pm1}$ rather than $\hat f_{\mathrm{opt},K}$.
Furthermore:

It is not clear whether or not imperfections of the conventional jet algorithms are more important than the intrinsic limitations of the conventional scheme 4.38 as a whole.
5.5

The answer probably depends on the problem. A priori one cannot exclude that for some applications, an improvement of the scheme 4.38 as a whole via relaxation of the jet-number cut in the spirit of regularizations of Sec. 2.52 could be more important than improvements of the jet algorithm only.
A relaxation of the jet-number cut implies an inclusion into consideration of events with at least the adjacent numbers of jets, $K \pm 1$.
The example of Sec. 4.68 lends credibility to this point of view: the transition from 3-jet fractions to shape observables in the measurements of $\alpha_S$ can be regarded as a trick to take into account events from all jet-number sectors. This example is special in that jet algorithms can be avoided altogether in the improved observables. In general such luck may not occur, so jet algorithms are bound to remain a part of the answer.
But if one includes into consideration events with "wrong" numbers of jets and/or finds a way to regularize the discontinuities at the boundaries between different jet-number sectors in order to make the resulting observable continuous at those boundaries, then the details of how the space of events is sliced into jet-number sectors may become less important.

We conclude that a desirable property of a good jet algorithm is to provide options for a systematic improvement upon the conventional scheme 4.38. The jet algorithm we derive below offers such options.

The optimal jet definition. Qualitative aspects 5.6

The jet definition we are going to introduce deserves to be called optimal for two reasons:
THE KINEMATICAL REASON: It involves an optimization that has a well-defined meaning in terms of information content of events and the corresponding jet configurations (Sec. 5.7).
THE DYNAMICAL REASON: It possesses a property naturally interpreted as an optimal inversion of hadronization (Sec. 5.10).
The two properties are logically independent (at least I don't see a formal connection), and both lead to exactly the same definition 5.9. The only common element is the formal language in which both are phrased (the language of generalized [C-continuous] shape observables, Sec. 3).
The equivalence of the two approaches came about as a complete surprise. The order of presentation is determined by historical reasons.

Informational definition 5.7

From the most general point of view, the jet algorithm, operationally, is a data processing tool whose purpose is to facilitate extraction of physical information. The resulting simplifications come at a price -- a loss of information in the transition from events to jet configurations. The most basic and general requirement for any data processing tool -- jet algorithms not excluded -- is that the distortions it induces in the physical information should be minimized.
So it is natural to require that the best algorithm should minimize such an information loss:

The jet configuration Q[P] must inherit maximum information from the original event P.
5.8

This in fact is similar to the conventional criterion 5.2 but now we would like to be more systematic in regard of interpretation of the information loss. To this end we will rely on the kinematical analysis of Sec. 3.
Note that the criterion 5.8 is applicable both to experimentally observed hadronic events and to theoretical multiparton events in situations where radiative QCD corrections need to be taken into account.
The analysis performed in Section 3 led us to the identification 3.43. This immediately allows us to translate the criterion 5.8 into the following form:

    $f(P) \approx f(Q[P])$ for any basic shape observable f.    5.9

The less the discrepancy between the left and right hand sides, the more information from P is inherited by the jet configuration Q[P].
The definition requires comments.

(i) The exact equality can always be achieved in 5.9 for Q[P] = P, so for the replacement $P \to Q$ to make sense, one





requires, heuristically speaking, that Q should have fewer jets than P has particles. Thanks to the C-continuity of the participating observables f, this can be achieved via two mechanisms: a replacement of sufficiently narrow sprays of particles by single jets (pseudoparticles from Q), and dropping particles which carry sufficiently small fractions of the event's total energy (the so-called soft particles).
(ii) The replacement of a narrow spray of particles by one pseudoparticle implies that the detailed structure of the event at small correlation angles is less important than its structure grosso modo (what is called "topology of jets"). This makes natural the eventual occurrence in the jet definition of a parameter interpreted as the maximal jet radius (R). On the other hand, different f in 5.9 are differently sensitive to replacements of sprays of particles with a single jet. The induced error will be greater for the observables whose angular functions (see 3.36) vary faster. The angular resolution parameter R will then control the subclass of f for which the error is minimized (Sec. 6.25).

(iii) The criterion 5.9 is formulated for individual events, and the error may also depend on P, so that the approximate equality must hold in some integral sense (Sec. 5.21). This is where dynamical considerations may enter into the picture (Sec. 5.31).

(iv) With 5.9, the jet algorithm can be interpreted as a trick for approximate evaluation of (or for construction of approximations for) complicated C-continuous observables such as the optimal observables 4.20. The trick is unusual in that here one simplifies the arguments, whereas normally one would simplify the expression of the function to be computed.

(v) It is clear that the optimal jet configuration for a given event need not be defined uniquely (more than one jet configuration may ensure 5.9 with a comparable error). Physically, this corresponds to the fact that different hard parton events may hadronize into the same hadronic event. This is an important option completely missing from conventional discussions. We will come back to this in Sec. 9.1.

(vi) A definition such as Eq. 5.9 would be genuinely useful only if one could control the approximation error via an estimate which would be both simple and precise. The general form of such estimates is discussed in Sec. 5.17.

Inversion of hadronization 5.10

A remarkable fact is that the same criterion 5.9 also ensures what can be described as an optimal inversion of hadronization.
How well a given jet algorithm inverts hadronization is measured by how well the kernel 4.46 is approximated by the $\delta$-function:

    $h^{(n)}(p, q) \approx \delta(p, q)$ .    5.11

The only way to interpret this is via integrals with C-continuous functions (cf. 3.43).
So, integrate both sides with an arbitrary C-continuous function f(q). For the r.h.s. we obtain f(p). For the l.h.s., use the definition 4.46 and obtain

    $\int dq \, h^{(n)}(p, q) \, f(q) = \int dP \, H^{(n)}(p, P) \, f(Q[P])$ .    5.12

Then consider the resulting difference:

    $f(p) - \int dP \, H^{(n)}(p, P) \, f(Q[P])$
    $\quad = \Big[ f(p) - \int dP \, H^{(n)}(p, P) \, f(P) \Big]$
    $\quad + \int dP \, H^{(n)}(p, P) \, \big[ f(P) - f(Q[P]) \big]$ ,    5.13

where f(P) was subtracted from and added to f(Q[P]).
The first line on the r.h.s.,

    $f(p) - \int dP \, H^{(n)}(p, P) \, f(P)$ ,    5.14

is independent of the jet algorithm. Its smallness is described by 4.15. The subtracted term is the average value of f on hadronic events generated by the partonic event p. We can draw the first conclusion:

The contribution 5.14 sets a jet definition-independent limit on how well hadronization can be inverted.
5.15

A consequence is that measuring quality of jet algorithms by percentage of restored parton events is meaningless beyond a certain limit. To go beyond that limit, it is necessary to go beyond the restrictions of the scheme 4.38.

The dependence on the jet algorithm only appears in the second line on the r.h.s. of 5.13. Therefore, to minimize 5.13 (and so the error in 5.11) it is sufficient to minimize the following expression:

    $\int dP \, H^{(n)}(p, P) \, \big[ f(P) - f(Q[P]) \big]$ .    5.16

Such a minimization has to be accomplished for any p but since the jet algorithm cannot depend on the unknown p, the only meaningful general option is to minimize the expression in square brackets for each P, and we come back to the criterion 5.9.
An interesting property is that for a fixed P, the obtained criterion is independent of the hadronization kernel $H^{(n)}$, i.e. of any dynamical information. This conclusion holds independently of n, the order of pQCD corrections included into the parton-level probabilities.
Dynamical information, however, may affect one's decisions about the allowed error for different P. We will turn to this in Sec. 5.31.

A quantitative definition 5.17

Our analysis of the qualitative definition 5.9 is based on inequalities of the following factorized form to be obtained in Section 6:

    $|f(P) - f(Q)| < C_f \, \Omega[P, Q]$ ,    5.18

where the constant $C_f$ is independent of P and Q = Q[P], whereas the expression $\Omega[P, Q]$ is independent of f.
Existence of a factorized estimate 5.18 could not have been postulated a priori. Another surprise is that $\Omega$ turns out to be an infrared-safe shape observable of a conventional kind and, moreover, closely related to the thrust (see Sec. 8.11).
An estimate of the form 5.18 would be sufficient for defining jet configurations in such a way as to control the errors induced in the observables via 5.24. The simplest option is to specify a small positive $\omega_{\mathrm{cut}}$ (which in general may be chosen





differently for different events P) and then define Q by demanding that it ensures that

    $\Omega[P, Q] \le \omega_{\mathrm{cut}}(P)$ .    5.19

Since the purpose of replacing P by Q is to simplify calculations, one would seek to satisfy the restriction 5.19 with a minimal number of jets in Q. On the other hand, in order to minimize the actual error induced in the transition to jets, one would seek to minimize $\Omega[P, Q]$. To summarize:

An optimal configuration of jets $Q_{\mathrm{opt}}$ for a given event P minimizes $\Omega[P, Q]$ while satisfying the restriction 5.19 with a minimal number of jets.
5.20

See Sec. 9.1 for a discussion of important implications of the fact that the optimal jet configuration on which the minimum of $\Omega[P, Q]$ is reached is, in general, not unique.

Errors induced in observed numbers 5.21

In the final analysis, the observed physical value is $\langle f \rangle$ and not f(P). So, we must study how the errors induced in f(P) for each P propagate to the level of $\langle f \rangle$.
To this end, recall Eqs. 2.3–2.4. The replacement of events P by the corresponding jet configurations Q results in the following expressions:

    $\langle f \rangle_{\mathrm{th,jets}} = \int dP \, \pi(P) \, f(Q)$ ,    5.22

    $\langle f \rangle_{\mathrm{exp,jets}} = \frac{1}{N} \sum_i f(Q_i)$ ,    5.23

where $Q = Q_{\mathrm{opt}}$ is a function of P as defined by 5.20, so that $Q_i$ is its value on $P_i$. Using the bound 5.18, one obtains

    $\big| \langle f \rangle_{\mathrm{th,jets}} - \langle f \rangle_{\mathrm{th}} \big| \le C_f \, \langle \Omega \rangle$ ,    5.24

where

    $\langle \Omega \rangle = \int dP \, \pi(P) \, \Omega[P, Q] \le \int dP \, \pi(P) \, \omega_{\mathrm{cut}}(P)$ .    5.25

This expression controls the errors inherited by all interpreted physical information ($M_W$, etc.) extracted via jets according to 5.22–5.23.
The quantity $\langle \Omega \rangle$ together with its fluctuations can be estimated like any other observable as the mean value and variance of $\Omega[P, Q]$ which is computed for each event P in the process of minimization according to 5.20.

(Non) optimality of jet definitions 5.26

The above reasoning shows that a jet algorithm can be regarded as a tool for approximate evaluation of at least basic shape observables. However, recall that general C-continuous observables -- including the optimal observables 4.20 -- can be approximated by algebraic combinations of basic shape observables (Sec. 3.35). This means that the error estimate 5.18 will be inherited by a class of general C-continuous observables which have appropriate regularity properties. (From the derivation in Section 6 this should be a C-continuous analog of continuous second order derivatives; cf. the discussion in Sec. 4.3.) The optimal observables 4.20 cannot be reasonably expected to have discontinuities in any derivatives, so they fall into this class.

With the bound 5.18 valid for the optimal observables, the jet algorithm based on it can be regarded as a trick for approximate computation of -- or, equivalently, constructing approximations for -- such observables. This allows one to compare different jet definitions on the basis of the magnitude of the errors they induce in the relation 5.9 (more precisely, one looks at $\Omega$ in 5.18).
We will use the term optimal and its derivatives in connection with various jet definitions in the following sense:

A jet finding prescription A is less optimal than another prescription B if with a given number of jets (which is a measure of computational economy) the jet configurations produced by A inherit less information from the original event than is the case with B. In other words, the use of the scheme A makes it computationally harder compared with B to approximate optimal observables 4.20 and thus to achieve the best possible precision for fundamental parameters such as $\alpha_S$, $M_W$, etc.
(It is possible to make this more precise via inequalities for different $\Omega$ by analogy with the standard techniques for comparison of norms in vector spaces. We skip this exercise because the conventional algorithms cannot be easily represented in the spirit of 5.18.)
We will use this notion in Section 10 to compare the jet definition we will derive with the conventional algorithms.
An obvious conclusion from the above reasoning is that the estimate 5.18 should be as precise as possible, i.e. its construction should not involve tricks which would overestimate the error. This would ensure optimality of the resulting jet definition. We will pay heed to this in Section 6.
To avoid confusion, note that the optimality of jet algorithms is a different (although metaphysically related) thing from the optimality of observables (Sec. 2.25), in particular from the optimality of observables within the restrictions of the scheme 4.38 with a fixed jet algorithm.

The universal jet definition 5.27

The simplest universal option is to choose $\omega_{\mathrm{cut}}(P)$ to be independent of the event P:

    $\omega_{\mathrm{cut}}(P) = \omega_{\mathrm{cut}} = \mathrm{const}$ .    5.28

Then because the probability distribution is normalized to 1,

    $\int dP \, \pi(P) = 1$ ,    5.29

Eq. 5.24 would be ensured with some $\langle \Omega \rangle \le \omega_{\mathrm{cut}}$. So:

The parameter $\omega_{\mathrm{cut}}$ of the universal jet definition directly controls the errors induced in the physical information by the replacement of events with the corresponding configurations of jets.
5.30
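The chain from the per-event bound 5.18 to the averaged bound 5.24–5.25 can be illustrated with a one-dimensional toy model. This is a sketch under strong assumptions, and it does not use the estimate of Section 6: particles are (energy, angle) pairs on a line, every event is replaced by a single jet at the energy-weighted mean angle, the observable is f(P) = sum_a E_a cos(theta_a), and the toy stand-in for Omega is sum_a E_a (theta_a - theta_jet)^2. A second-order Taylor expansion around the jet angle (the linear term cancels by the choice of jet angle) gives |f(P) - f(Q)| <= (1/2) Omega, i.e. a factorized bound with C_f = 1/2. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def make_event(rng, n_particles=8, spread=0.2):
    """Toy event: list of (energy, angle); energies are normalized to sum to 1."""
    raw = [rng.random() + 0.01 for _ in range(n_particles)]
    total = sum(raw)
    return [(e / total, rng.gauss(0.0, spread)) for e in raw]


def one_jet(event):
    """Replace the whole event by one jet at the energy-weighted mean angle."""
    energy = sum(e for e, _ in event)
    angle = sum(e * th for e, th in event) / energy
    return energy, angle


def shape(event):
    """Toy shape observable f(P) = sum_a E_a cos(theta_a)."""
    return sum(e * math.cos(th) for e, th in event)


def toy_omega(event, jet):
    """Toy stand-in for Omega[P, Q]: energy-weighted angular spread around the jet."""
    _, jet_angle = jet
    return sum(e * (th - jet_angle) ** 2 for e, th in event)


C_F = 0.5  # from |cos''| <= 1 in the second-order Taylor remainder

rng = random.Random(1)
events = [make_event(rng) for _ in range(1000)]
jets = [one_jet(ev) for ev in events]

f_events = [shape(ev) for ev in events]
f_jets = [energy * math.cos(angle) for energy, angle in jets]
omegas = [toy_omega(ev, jet) for ev, jet in zip(events, jets)]
errors = [abs(fp - fq) for fp, fq in zip(f_events, f_jets)]

mean_omega = statistics.mean(omegas)     # analogue of <Omega> in 5.25
var_omega = statistics.variance(omegas)  # its event-to-event fluctuation
mean_f = statistics.mean(f_events)
mean_f_jets = statistics.mean(f_jets)

print("per-event bound holds:", all(err <= C_F * om for err, om in zip(errors, omegas)))
print("averaged error:", abs(mean_f_jets - mean_f), " bound:", C_F * mean_omega)
print("variance of toy Omega:", var_omega)
```

Reducing `spread` (narrower sprays) lowers the toy Omega and the induced error together, which mirrors the way a cut on Omega controls the error in the averaged observable.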





Inclusion of dynamical information 5.31 Derivation of the factorized estimate 6

As is clear from Eq. 5.25, one can include dynamics into In this section we are going to obtain a factorized estimate
consideration by simply making cut depend on P. of the form 5.18 which would satisfy the criterion of optimality
All the dynamics is expressed by the probability density of Sec. 5.26.
(P). Suppose it has enhancements for certain types of events Surprisingly, all one needs to obtain such an estimate is es-
(as indeed it does in QCD owing to collinear singularities). sentially an angular Taylor expansion through second order.
Then it would be sufficient to choose cut(P) to anticorrelate Recombination matrix z
with (P). For instance, a j 6.1

( )
P =  P Recalling 3.36, the quantity to be estimated becomes
cut cut ( ) , 5.32

where the factor (P) 1 (which should, strictly speaking, be f ( )
P - f ( )
Q = E f ( $p )
- E f ( $
q )
. 6.2
a a a j j j
a shape observable) contains all the dependence on events. To construct a bound for the r.h.s., one can only compare the
Then Eq. 5.25 becomes values of f at some $pa with its values at some $q . But which
j
 dP( )
P (P $
cut z ) . 5.33 pa to compare with which $
q may not be decided a priori.
j

Choosing to anticorrelate with would reduce the integral Introduce the recombination matrix z a j which is heuristi-
thus suppressing the overall error. cally interpreted as the fraction of a -th particle's energy that
From the point of view of minimization of induced errors, goes into the j-th jet (this interpretation will be justified below;
the following observation makes such a modification less at- cf. 6.16). Impose the following restrictions on z a j:u
tractive. Indeed, the computational savings due to larger cut
for the events which are produced less often may turn out to be z 0 for any a, j;
aj 6.3
simply not worth the trouble: one could simply take a smaller
def
event-independent z = 1- z

0 for an a
y . 6.4
cut from the very beginning. a j aj

However, a non-constant cut(P) would affect the shape of
the k-jet subregions in the space of events similarly to the dif- One can see from the derivation that removing the restric-
ference between how, say, cone and k T algorithms see jets. So tions on z a j does not expand the eventual range of options. All
one may wish to keep this option open for ultimate flexibility. one has to do is replace z , z z , z in 6.8 and 6.20, so
aj j aj j
Note that if one modifies the conventional scheme 4.38 as a that configurations not satisfying 6.3, 6.4 are automatically dis-
whole -- and the most important modifications seem to corre- favored compared with the corresponding boundary points.
spond to relaxation of the jet-number cut (some such options Non-zero values of the quantity z
are discussed in Section 9) -- then the details of how K -jet a correspond to some en-
sectors are defined in the space of events may become less im- ergy being left out of the formation of jets (the so-called soft
portant. energy ). We will see that this corresponds to exclusion of some
soft stray particles (the soft component of the event's energy
Determining a specific form for (Q) is left to experts in flow) from the formation of jets.
the dynamics of QCD. In practice high precision is not needed
here, and one could choose (P) to depend on Q found e.g. Allowing fractional values for z a j:
using the universal jet definition with = a) fully agrees with the physical picture of production of
cut cut or some
other value. Then (Q) could be chosen so that -1(Q) roughly colorless hadrons as a result of collective interaction of
the underlying hard colored partons;
imitates the structure of dominant terms in (P). Note that the
quantities such as the invariant masses of the jets, and trans- b) is extremely convenient algorithmically because the
verse momenta of particles in each jet -- along with new inter- space of all possible jet configurations for a given event
esting characteristics such as the fuzziness of each jet; cf.8.19 is then path-connected, so that any jet configuration can
-- are easily computed from the output of the optimal jet defi- be reached from any other via a continuous path, allow-
nition which we will derive. ing efficient shortest-path search algorithms [7].

Lastly, an effect essentially equivalent to a modification of We will say that the a -th particle belongs to the j -th jet if
cut according to 5.32 can be achieved via keeping cut P- z = =
aj 1 . If za 1 , the particle is said to belong to soft energy.
independent but replacing [P, Q] in 5.19 by another function With the recombination matrix, rewrite 6.2 as follows (the
such that first line is an identical transformation of 6.2, which explains
[ ,
P Q] [ ,
P Q] . 5.34 the restriction 6.4):

Then the control of information loss in the transition from z E f ( $p ) + z E f ( $p ) - E f
e j ( $
q )
a a a a a j aj a a j j j
events to jets would still be ensured but one could choose
[ ,
P Q] to meet some additional requirements. The difficulty z E f ( $p ) + e z E f ( $p ) -E f
( $
q )j
a a a a j a aj a a j j
here is to keep [ ,
P Q] simple and suitable for numerical im- z E f ( $p ) + z E f ( $p ) - E f
( $
q ) . 6.5
plementation. a a a a j a aj a a j j

A detailed investigation of these options is beyond the scope One sees why we split particles into fragments correspond-
of the present paper. ing to jets rather than vice versa: we target situations with


u Formulas in solid boxes are part of the final result; they represent all the
information needed for algorithmic implementations.





fewer jets than particles, so it is desirable to arrange cancellations between as many terms as possible (the inner sum), and to minimize the number of positive terms in the outer sum.

Estimating the effects of soft energy 6.6

The first sum on the r.h.s. of 6.5 can be estimated as follows:

$$ \Big| \sum_a \bar z_a E_a f(\hat p_a) \Big| \le C_{f,1}\, E_{\rm soft}[P,Q] , \tag{6.7} $$

where $C_{f,1}$ is the maximal value of $|f|$ over all directions and

$$ E_{\rm soft}[P,Q] = \sum_a \bar z_a E_a . \tag{6.8} $$

This quantity will play a central role in the optimal jet definition. It is interpreted as the event's energy fraction left out of the formation of jets (the soft energy, as we agreed to call it). It can be visualized as a background from which jets stick out.

Understanding the form of $E_{\rm soft}$ 6.9

Mathematically possible are many other ways to obtain a factorized estimate of the form 6.7. The variant with 6.8 is singled out by the following properties:

(i) Analytical simplicity which leads to fast algorithms.
(ii) Linearity in energies of all particles which ensures infrared safety of the resulting jet definition.
(iii) The property that can be called maximal inclusiveness.

For instance, also possible is a bound in terms of $\max_a (\bar z_a E_a)$ but that would require comparison of particles' energies, which is physically meaningless if their directions are close.

A somewhat more meaningful option would be to perform a smearing of the soft energy over some angular radius thus transforming the soft energy flow into a continuous function, and then using the maximal value of that function as an alternative to 6.8 (the constant $C_{f,1}$ would change accordingly). This would be similar to the so-called `f'-cut [2], i.e. a lower cut on the energy of the jets retained in the final jet configuration (footnote v). However, there are three reasons why such alternatives seem to be undesirable:

1) On the measurement side, they introduce non-optimality into the bound implying a further loss of information in the transition from the event to jets.
2) Computationally, they introduce a complexity into our jet definition unwarranted by physical considerations (footnote w).
3) On the QCD side, they are less inclusive than the expression 6.8, i.e. they introduce into consideration subregions of phase space. It is a well-known fact that exclusiveness of observables anti-correlates with the predictive power of QCD. For instance, a totally inclusive treatment of soft energy was built already into the jet definition of [8].

v The discussion in the first posting of this paper interpreted conventional procedures incorrectly. The present version owes much in this respect to ref. [2].

w In contrast, the conventional algorithms seem to favor the `f'-cuts because, apparently, there is no simple recipe to identify the particles to be relegated to soft energy prior to recombinations.

Taylor expansion in angles 6.10

To Taylor-expand $f(\hat p_a)$ near $\hat q_j$ is just a little tricky, and we proceed as follows. Consider the plane which is normal to the unit 3-vector corresponding to $\hat q_j$. Then map the directions to this plane, $\hat p \to \hat{\mathbf p}$, so that the angular distances between directions are preserved near $\hat q_j$:

$$ |\hat{\mathbf p} - \hat{\mathbf q}_j| = O\big( |\hat p - \hat q_j| \big), \qquad \hat p \to \hat q_j , \tag{6.11} $$

where the l.h.s. is a euclidean distance in the plane. (An example of such a mapping is given in Sec. 7.2.)

Then $f(\hat p)$ becomes a function on the plane which we denote as $f(\hat{\mathbf p}) \equiv f(\hat p)$. We will use the Taylor expansion in the form of the following inequality:

$$ \big| f(\hat{\mathbf p}_a) - f(\hat{\mathbf q}_j) - [\hat{\mathbf p}_a - \hat{\mathbf q}_j]\, f'(\hat{\mathbf q}_j) \big| \le C_{f,3}\, \Delta_{aj} , \tag{6.12} $$

where $\hat{\mathbf p}_a - \hat{\mathbf q}_j$ is a vector in the plane and $f'$ is the gradient of $f$. The constant $C_{f,3}$ hides maximal values of some combinations of $f$ and its derivatives through second order. The maximum is taken over all directions $\hat q_j$ because we will deal with a sum over unspecified $\hat q_j$.

The only properties which we require the factor $\Delta_{aj}$ to have are that it is a monotonic function of the angular distance $|\hat p - \hat q_j|$, and it is such that

$$ \Delta_{aj} \approx |\hat p - \hat q_j|^2 , \qquad \hat p \to \hat q_j . \tag{6.13} $$

It may otherwise be arbitrary. A modification of $\Delta_{aj}$ within these restrictions is compensated for by an appropriate change of $C_{f,3}$. This observation effectively decouples the form of the r.h.s. of 6.12 from the concrete choice of the mapping $\hat p \to \hat{\mathbf p}$.

The result 6.12 can be used to estimate the second term on the r.h.s. of 6.5 (add and subtract terms as needed to apply 6.12). Take into account the fact that the values of $f$ at different points are in general independent, so the corresponding expressions have to be bounded independently. Obtain the following upper bound for the second sum on the r.h.s. of 6.5:

$$ C_{f,4} \sum_j \Big| \sum_a z_{aj} E_a - E_j \Big| + C_{f,5} \sum_j \Big| \sum_a z_{aj} E_a [\hat{\mathbf p}_a - \hat{\mathbf q}_j] \Big| + C_{f,3} \sum_{j,a} z_{aj} E_a \Delta_{aj} . \tag{6.14} $$

Minimizing 6.14 6.15

The task is to minimize 6.14 using the freedom to choose $E_j$, $\hat q_j$, $z_{aj}$. The arbitrariness associated with the mapping $\hat p \to \hat{\mathbf p}$ and with $\Delta_{aj}$ will require additional consideration to be eliminated (Section 7).

The first term is suppressed if

$$ E_j = \sum_a z_{aj} E_a \quad \text{for each } j . \tag{6.16} $$

This fixes $E_j$ in terms of $z_{aj}$ and is immediately interpreted as energy conservation in the formation of jets.

The second term is suppressed if

$$ E_j\, \hat{\mathbf q}_j = \sum_a z_{aj} E_a\, \hat{\mathbf p}_a \quad \text{for each } j , \tag{6.17} $$

where we used 6.16. This determines $\hat q_j$ (via $\hat{\mathbf q}_j$) in terms of $z_{aj}$. Note the arbitrariness due to the arbitrary mapping which will be fixed in Section 7. Anyhow:





Eqs. 6.16 and 6.17 fix the parameters of jets in terms of the recombination matrix $z_{aj}$ which, therefore, is the fundamental unknown in this scheme. 6.18

The described trick actually differs from conventional schemes only by explicit presence of $z_{aj}$ which fully describes the distribution of particles between jets in any jet finding scheme with energy-momentum conservation.

With 6.16 and 6.17, only the last sum survives in 6.14. So, redefining the inessential $f$-dependent constants and recalling Eq. 6.7, we arrive at the following estimate:

$$ \big| \{f, P\} - \{f, Q\} \big| \le C_{f,1}\, \Upsilon[P,Q] + C_{f,2}\, E_{\rm soft}[P,Q] , \tag{6.19} $$

where

$$ \Upsilon[P,Q] = \sum_{j,a} z_{aj} E_a \Delta_{aj} . \tag{6.20} $$

Note that $\Upsilon$ is linear in all particles' energies as is $E_{\rm soft}$; cf. the comments after 6.8. Recall also that $\Delta_{aj} = O(\theta_{aj}^2)$ for small $\theta_{aj}$, which is the angle between the $a$-th particle and the $j$-th jet.

Although the bound 6.19 falls short of the desired factorized form 5.18, its derivation did not involve any arbitrariness that would deserve any further discussion. The following two points do deserve a detailed discussion:

(i) The arbitrariness in the choice of the angular factors $\Delta_{aj}$ in 6.20. This will be fixed in Section 7 from simple kinematical considerations, resulting in considerable algorithmic simplifications.
(ii) Transition to the factorized form 5.18.

Obtaining a factorized estimate (footnote x) 6.21

x The first posting of this paper described a somewhat simplistic way to take into account $E_{\rm soft}$ (then called $E_{\rm miss}$) in which one would minimize $\Upsilon$ while keeping $E_{\rm soft}$ fixed to a constant. It was justified by a somewhat vague reference to "the physical meaning of jet counting" -- and, although not incorrect, was the only step of the derivation not clarified by a precise argument. The systematic approach outlined below attains an ultimate analytical simplicity for the criterion, exhibits a deep connection with the conventional cone algorithms, and results in a much faster algorithmic implementation thanks to elimination of the algorithmically cumbersome restriction $E_{\rm soft} = {\rm const}$.

General options 6.22

Mathematically speaking, the basic bound 6.19 can be reduced to the required factorized form 5.18 in a variety of ways. Consider $\Upsilon$ and $E_{\rm soft}$ as components of a two-dimensional vector $\boldsymbol\Upsilon = (\Upsilon_1, \Upsilon_2) \equiv (\Upsilon, E_{\rm soft})$. Then for a wide class of non-negative functions $W(\boldsymbol\Upsilon)$ one can obtain inequality of the form

$$ C_{f,1} \Upsilon_1 + C_{f,2} \Upsilon_2 \le C_{f,W}\, W(\boldsymbol\Upsilon) \quad \text{for all } \boldsymbol\Upsilon . \tag{6.23} $$

For instance, one can take

$$ W(\boldsymbol\Upsilon) = \big( \alpha \Upsilon_1^p + \beta \Upsilon_2^p \big)^{1/p} \tag{6.24} $$

with $\alpha, \beta, p > 0$. In any event it is reasonable to restrict $W$ to satisfy the condition $W(k\boldsymbol\Upsilon) = k\,W(\boldsymbol\Upsilon)$ for all positive $k$ (or even to be a norm in the mathematical sense, i.e. also satisfy $W(\boldsymbol\Upsilon_1 + \boldsymbol\Upsilon_2) \le W(\boldsymbol\Upsilon_1) + W(\boldsymbol\Upsilon_2)$).

From 6.23 one obtains Eq. 5.18 with $\Omega = W(\Upsilon, E_{\rm soft})$. (Note that although $\Omega$ thus defined is not a linear function of all particles' energies, all the dependence on the event is via two such functions, $\Upsilon$ and $E_{\rm soft}$. So infrared safety is not an issue here.)

The linear choice 6.25

However, there is one choice of $W(\boldsymbol\Upsilon)$ which is singled out by its nice properties, namely,

$$ W(\boldsymbol\Upsilon) = R^{-2}\, \Upsilon_1 + \Upsilon_2 . \tag{6.26} $$

The coefficients of the linear combination must be positive, and the overall normalization is inessential. The specific form of the coefficient, $R^{-2}$, is chosen for convenience of interpretation. Its introduction makes explicit the arbitrariness in the choice of measurement unit for angles (the role of $R$ is discussed in Sec. 8.14).

With $W$ given by 6.26, one obtains 5.18 with $C_f = \max\big( R^2 C_{f,1},\, C_{f,2} \big)$ and with $\Omega$ replaced by

$$ \Omega_R = R^{-2}\, \Upsilon + E_{\rm soft} . \tag{6.27} $$

This choice is singled out by the following properties:

(i) analytical simplicity resulting in transparency of the corresponding jet definition and simplicity of implementation;
(ii) the inequality 6.23 becomes an identical equality for

$$ C_{f,2} = R^2\, C_{f,1} . \tag{6.28} $$

This last fact means that for observables satisfying 6.28, the transition from the basic estimate 6.19 to the factorized one, Eq. 5.18, via 6.26 does not entail any further loss of information about the event -- for any event. Only the linear form 6.27 has this property.

We will consider linear form 6.27 as a standard reference point for comparison of alternatives. This issue will be further discussed in the context of the so-called $\Upsilon$-$E_{\rm soft}$ distribution in Sec. 8.19.

Existence of the optimal jet configuration 6.29

We have obtained the factorized estimate 5.25 with $\Omega$ given by 6.27. This allows us to define optimal jet configurations according to the prescription 5.20.

Such a configuration $Q_{\rm opt}$ always exists. Indeed, the quantity $\Omega_R$ is a non-negative continuous function of $z_{aj}$, and the domain of $z_{aj}$ is compact for each fixed $N(Q)$ (cf. 6.4, 6.3). So the l.h.s. always has a global minimum in this domain. Furthermore, the minimum value is a monotonically decreasing function of $N(Q)$ because each extra jet in $Q$ adds new degrees of freedom for minimization, driving down the minimal value which reaches zero for all $N(Q) \ge N(P)$. So $Q_{\rm opt}$ exists for any $P$ and $\omega_{\rm cut} > 0$.

The global minimum need not be unique even modulo renumberings of jets.
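The quantities 6.8, 6.16, 6.20 and 6.27 are simple sums over the recombination matrix, which a minimal Python sketch can make concrete. Everything named here (`check_z`, `jet_energies`, `soft_energy`, `upsilon`, `omega_R`, and the toy numbers) is a hypothetical illustration, not code from [7]; the angular factors $\Delta_{aj}$ are taken as given inputs because their form is fixed only in Section 7.

```python
def check_z(z, tol=1e-12):
    """Restrictions 6.3 and 6.4: z[a][j] >= 0 and zbar_a = 1 - sum_j z[a][j] >= 0."""
    return all(
        all(x >= -tol for x in row) and 1.0 - sum(row) >= -tol
        for row in z
    )


def jet_energies(z, E):
    """Eq. 6.16: E_j = sum_a z[a][j] * E[a]."""
    n_jets = len(z[0])
    return [sum(z[a][j] * E[a] for a in range(len(E))) for j in range(n_jets)]


def soft_energy(z, E):
    """Eq. 6.8: E_soft = sum_a zbar_a * E_a."""
    return sum((1.0 - sum(row)) * e for row, e in zip(z, E))


def upsilon(z, E, delta):
    """Eq. 6.20: Upsilon = sum_{j,a} z[a][j] * E[a] * delta[a][j]."""
    return sum(
        z[a][j] * E[a] * delta[a][j]
        for a in range(len(E))
        for j in range(len(z[0]))
    )


def omega_R(z, E, delta, R):
    """Eq. 6.27: Omega_R = Upsilon / R**2 + E_soft."""
    return upsilon(z, E, delta) / R**2 + soft_energy(z, E)


# Toy event (assumed numbers): three particles, energies as fractions of the total,
# one candidate jet, and made-up angular factors delta[a][0].
E = [0.5, 0.4, 0.1]
delta = [[0.0], [0.04], [1.5]]

z_all_in = [[1.0], [1.0], [1.0]]    # every particle assigned to the jet
z_one_soft = [[1.0], [1.0], [0.0]]  # the wide-angle particle left in the soft energy

for z in (z_all_in, z_one_soft):
    assert check_z(z)
    print(jet_energies(z, E), soft_energy(z, E), omega_R(z, E, delta, R=1.0))
```

With these assumed inputs the second matrix gives the smaller value of $\Omega_R$: leaving the wide-angle particle in the soft energy costs its energy once, whereas including it in the jet costs its energy multiplied by $\Delta_{aj} R^{-2} > 1$.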





Fine-tuning the angular factors in $\Upsilon[P,Q]$ 7

The form of the angular factors $\Delta_{aj}$ in 6.20 is fixed within the arbitrariness of the scheme by simple additional considerations: (a) conformance to relativistic kinematics; (b) momentum conservation. The true elegance -- and final justification -- of the resulting construction is in the considerable computational simplifications resulting from a representation of the jet finding criterion in terms of 4-vectors and Lorentz scalar products (see after 7.10).

First, with each pair $E_a$, $\hat p_a$ one associates a massless 4-vector $p_a$, $p_a^2 = 0$ (specific expressions depend on the representation of $\hat p_a$; see below). Then define:

$$ q_j = \sum_a z_{aj}\, p_a . \tag{7.1} $$

This object occurs in a natural way in our construction, and we will call it the jet's physical 4-momentum.

Spherical geometry ($e^+e^- \to$ hadrons in c.m.s.) 7.2

Here one emphasizes spherical symmetry. The directions $\hat p$ are interpreted as points of the unit sphere, i.e. unit 3-vectors: $\hat p^2 = 1$. Then the 4-momentum $p_a$ associated with the pair $E_a$, $\hat p_a$ has the energy component $p_a^0 = E_a$ and the 3-momentum component $\mathbf p_a = E_a \hat p_a$.

3-momentum conservation 7.3

We must choose a mapping of the unit sphere to the plane which is normal to $\hat q_j$. A simple choice is the stereographic projection from the point $-\hat q_j$:

$$ \hat{\mathbf p}_a = \hat p_a + t_{aj} (\hat p_a + \hat q_j) = \hat p_a + O(\theta_{aj}^2) , \tag{7.4} $$

where $t_{aj} = (1 - c)/(1 + c)$ with $c = \hat p_a \hat q_j$. Then Eq. 6.17 is rewritten as

$$ \hat q_j E_j = \mathbf q_j + \sum_a z_{aj} E_a t_{aj} (\hat p_a + \hat q_j) = \mathbf q_j + O(E_a \theta_{aj}^2) , \tag{7.5} $$

where $\mathbf q_j$ is the space-like component of 7.1.

The arbitrariness in the choice of the mapping manifests itself through the terms $O(E_a \theta_{aj}^2)$ in 7.5. The simplest choice is to drop those terms altogether but then one would have to impose a correct normalization on $\hat q_j$:

$$ \hat q_j = \mathbf q_j / |\mathbf q_j| . \tag{7.6} $$

The direct normalization here cannot take one outside the $O(E_a \theta_{aj}^2)$ arbitrariness in 7.5 (because ensuring a correct normalization of $\hat q_j$ was part of the job of the $O(E_a \theta_{aj}^2)$ terms). This can also be verified directly.

Fixing $\Delta_{aj}$ 7.7

$\Delta_{aj}$ can be chosen in such a way as to eliminate one cumbersome summation over all particles in the event (which has to be performed with 6.20 after evaluation of $\hat q_j$) and reduce all the complexity in the computation of the criterion to evaluation of the 4-vectors $q_j$ (footnote y). The choice is this:

$$ \tfrac12 \Delta_{aj} = 1 - \hat p_a \hat q_j \equiv 1 - \cos\theta_{aj} = \tfrac12 [\hat p_a - \hat q_j]^2 = E_a^{-1}\, p_a \tilde q_j , \tag{7.8} $$

where

$$ \tilde q_j = (1, \hat q_j) , \qquad \tilde q_j^2 = 0 . \tag{7.9} $$

This is a light-like Lorentz vector with unit energy uniquely associated with the jet's spatial direction $\hat q_j$. Then one uses 7.1 to perform the summation over $a$ and obtains:

$$ \Upsilon[P,Q] = \sum_j \Upsilon_j[P,Q] , \qquad \Upsilon_j = 2\, q_j \tilde q_j , \tag{7.10} $$

where the r.h.s. contains only Lorentz scalar products. Note that in this kinematics $q_j \tilde q_j = E_j - |\mathbf q_j|$ but the covariant form 7.10 is more general as we are going to see shortly.

y The described choice allows a simple incremental update of $q_j$ after a modification of a particle's splitting between jets, which results in a major speedup (by two orders of magnitude) of the minimum search algorithm; for more see [7].

Cylindrical geometry (hadron collisions) 7.11

According to the standard Snowmass conventions (cf. [2]), here one direction (the beam axis) is singled out, and one emphasizes invariance with respect to Lorentz boosts along the beam axis. Therefore one should use the representation 3.4 for particles' 4-momenta. In particular, one has to interpret energies according to 3.7 in all the formulas related to jet definition. Then a reasoning similar to the spherically symmetric case leads one to the following results:

The $j$-th jet's transverse direction $\hat q_j$ is determined similarly to 7.6 from conservation of transverse momentum:

$$ \hat q_j = \mathbf q_j^{\perp} / |\mathbf q_j^{\perp}| , \tag{7.12} $$

with $q_j$ taken from 7.1 ($\mathbf q_j^{\perp}$ being its transverse component). (At this point we choose to differ from the Snowmass definition which postulates conservation of energy-weighted azimuthal angle in jet formation. For narrow jets the two definitions are equivalent. On the other hand, our definition leads to a simpler code; cf. the remark after 7.10.)

For the jet's pseudorapidity (footnote z) one has the Snowmass definition which is invariant with respect to boosts along the beam axis:

$$ E_j \eta_j = \sum_a z_{aj} E_a \eta_a . \tag{7.13} $$

z Note that one can compute the jet's physical rapidity directly from $q_j$.

For $\Delta_{aj}$ there is the following simple choice (this structure is borrowed from [27] where it appeared in the context of conventional jet algorithms):

$$ \tfrac12 \Delta_{aj} = \cosh(\eta_a - \eta_j) - \cos(\varphi_a - \varphi_j) . \tag{7.14} $$

Then -- surprise! -- one recovers 7.10 with

$$ \tilde q_j = (\cosh\eta_j, \sinh\eta_j, \hat q_j) , \qquad \tilde q_j^2 = 0 , \tag{7.15} $$

where $\tilde q_j$ is also a light-like Lorentz vector uniquely associated with the jet's spatial direction specified in this case by the rapidity $\eta_j$ and the transverse direction $\hat q_j$ -- and also with unit energy -- but now it is the unit transverse energy!
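The equivalence between the double sum 6.20 and the 4-vector form 7.10 can be checked numerically in the spherical kinematics. The sketch below is only an illustration under stated assumptions: the function names, the toy event and the recombination matrix are invented here, and particles are taken to be massless with $p_a = E_a (1, \hat p_a)$ as in 7.2.

```python
import math


def unit(theta, phi):
    """Unit 3-vector for polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def jet_four_momenta(z, E, p_hat):
    """Eq. 7.1: q_j = sum_a z[a][j] * p_a with p_a = E_a * (1, p_hat_a)."""
    n_jets = len(z[0])
    q = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_jets)]
    for a, (e, n) in enumerate(zip(E, p_hat)):
        p = (e, e * n[0], e * n[1], e * n[2])
        for j in range(n_jets):
            for mu in range(4):
                q[j][mu] += z[a][j] * p[mu]
    return q


def jet_direction(qj):
    """Eq. 7.6: q_hat_j = q_j (3-vector part) / |q_j|."""
    norm = math.sqrt(qj[1] ** 2 + qj[2] ** 2 + qj[3] ** 2)
    return (qj[1] / norm, qj[2] / norm, qj[3] / norm)


def upsilon_lorentz(q):
    """Eqs. 7.9, 7.10: Upsilon = sum_j 2 * (q_j . q_tilde_j), q_tilde_j = (1, q_hat_j)."""
    total = 0.0
    for qj in q:
        n = jet_direction(qj)
        total += 2.0 * (qj[0] - qj[1] * n[0] - qj[2] * n[1] - qj[3] * n[2])
    return total


def upsilon_direct(z, E, p_hat, q):
    """Eq. 6.20 with Delta_aj = 2 * (1 - cos(theta_aj)) from Eq. 7.8."""
    total = 0.0
    for j, qj in enumerate(q):
        n = jet_direction(qj)
        for a, (e, p) in enumerate(zip(E, p_hat)):
            cos_aj = p[0] * n[0] + p[1] * n[1] + p[2] * n[2]
            total += z[a][j] * e * 2.0 * (1.0 - cos_aj)
    return total


# Toy event (assumed): two sprays near +z and -z, plus one particle shared by both jets.
E = [0.30, 0.25, 0.20, 0.15, 0.10]
p_hat = [unit(0.10, 0.0), unit(0.20, 2.0), unit(3.00, 1.0), unit(2.90, 4.0), unit(1.50, 0.5)]
z = [[1, 0], [1, 0], [0, 1], [0, 1], [0.3, 0.5]]  # last particle: 0.2 of it stays soft

q = jet_four_momenta(z, E, p_hat)
print(upsilon_lorentz(q), upsilon_direct(z, E, p_hat, q))
```

The two printed numbers agree up to rounding, because $\sum_a z_{aj} E_a (1 - \hat p_a \hat q_j) = E_j - |\mathbf q_j|$ once $\hat q_j$ is fixed by 7.6. The practical point made in footnote y is visible here: `upsilon_lorentz` needs only the accumulated vectors `q`, not a second pass over the particles.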






Summary. The optimal jet definition (OJD) 7.16

Finding an optimal configuration of jets $Q$ (Eq. 4.31) for a given event $P$ (Eq. 3.6) is equivalent to finding the recombination matrix $z_{aj}$ (Sec. 6.1) that determines jets' parameters via 7.1 and 7.6 (for spherical [c.m.s.] kinematics) or via 7.12 and 7.13 (for cylindrical [hadron collisions] kinematics).
The matrix elements $z_{aj}$ are found according to the prescription 5.20 with $\Omega_R[P,Q]$ specified by 6.27 where $E_{\rm soft}[P,Q]$ and $\Upsilon[P,Q]$ are defined, respectively, by 6.8 and 7.10.
The light-like Lorentz vectors $\tilde q_j$ are given by 7.9 (spherical kinematics) or 7.15 (cylindrical kinematics).

This is the simplest universal dynamics-agnostic jet definition. Dynamical considerations can be accommodated as described in Sec. 5.31.

Understanding the mechanism of OJD 8

To understand how the optimal jet definition (OJD; Sec. 7.16) "finds" jets, it is sufficient to understand what jet configurations yield minima for the criterion $\Omega_R$ depending on the structure of the original event $P$.

"Fuzziness" of the event 8.1

For each integer $m \ge 1$, compute the quantity

$$ J_m^R(P) = \min_{N(Q') = m} \Omega_R[P, Q'] \ge 0 . \tag{8.2} $$

For each fixed $P$ and $R$, this sequence monotonically decreases with increasing $m$.

As will become clearer from what follows, the observable $J_m^R(P)$ is best described as the event's cumulative fuzziness relative to $m$ axes at the angular resolution $R$. It receives contributions of two kinds as seen from 6.27:

- a contribution from each of the $m$ jets, $\Upsilon_j = 2\, \tilde q_j q_j$; this can be conveniently called the fuzziness of the $j$-th jet;
- a contribution from soft stray particles which is simply the soft energy $E_{\rm soft}$.

One can describe the mechanism as follows:

OJD minimizes the cumulative fuzziness of the event by balancing contributions from each of the jets and from the soft energy. 8.3

The functions $J_m^R(P)$ are shape observables similar to thrust (8.11). Observables similar to 8.2 were first introduced on the basis of conventional algorithms [28] but in our case they are specified by explicit analytical expressions. Even simpler analytical expressions (not involving optimization of any kind) were introduced in [3], [4] (the so-called jet-number discriminators) but they avoid identification of individual jets altogether.

In order to understand the mechanism of minimization, one notes that the analytical structure of OJD is very simple and regular, so it is sufficient to consider a few simple examples.

A simplified jet definition (the $\Upsilon$-criterion) 8.4

First it is convenient to ignore $E_{\rm soft}$ in 6.19. This is valid for events without soft particles outside a few energetic jets and is equivalent to restricting the jet configurations $Q$ used to minimize the error 6.2 by requiring that all particles are included into the formation of jets, with none relegated to the soft energy. Formally, this is described as follows:

$$ \bar z_a \equiv 0 \quad \Leftrightarrow \quad E_{\rm soft}[P,Q] \equiv 0 . \tag{8.5} $$

Such a restriction makes the error estimate less precise entailing a non-optimal loss of information in the transition from $P$ to $Q$, but it is otherwise admissible.

The corresponding simplified definition is as follows:

A sub-optimal configuration of jets $Q_{\rm sub}$ for a given event $P$ minimizes $\Upsilon[P,Q]$ and meets the following criterion with a minimal number of jets:

$$ \Upsilon[P, Q_{\rm sub}] \le y_{\rm cut} . \tag{8.6} $$

It will be convenient to refer to this as the $\Upsilon$-criterion. Note that this type of the criterion corresponds to $R \to \infty$ in 6.27. (For very large $R$, any contributions to soft energy would be disfavored. See also the discussion in Sec. 8.14.)

Minimizing $\Upsilon$ 8.7

Let us verify that the $\Upsilon$-criterion satisfies the boundary condition 4.33.

The quantity $\Upsilon[P,Q]$, Eq. 6.20, is sensitive to presence of sprays of particles in the event $P$ due to the angular factors $\Delta_{aj}$. Consider the simplest event $P$ with two particles carrying equal energy, separated by the angle $\theta$ (figure 8.8). Then the criterion will see either one or two jets depending on whether or not

$$ \tfrac14 \theta^2 \lesssim y_{\rm cut} \tag{8.9} $$

(remember that we are always dealing with fractions of the total energy of the event). For configurations with energy distributed between particles in a less symmetric fashion, a wider jet will be allowed for the same $y_{\rm cut}$.

Next suppose one has two pairs of particles, with a narrow angular separation between particles of each pair, and the angular separation between the pairs denoted as $\theta$ (figure 8.10). Assume $\theta^2 \gg y_{\rm cut}$. Then if one minimizes $\Upsilon[P,Q]$ on the configurations $Q$ with two jets, there is a global minimum corresponding to the configuration with each pair combined into a jet, and the minimum is unique up to a renumbering of the jets.

In other words, the angular factors in the expression for $\Upsilon$ ensure a maximal suppression of a contribution from a spray of particles if the particles of the spray are made to constitute a jet (i.e. the corresponding $z_{aj} = 1$ with $z_{aj'} = 0$ for all $j' \ne j$), so that the jet's axis is automatically inside the spray.

This conclusion extends to more than two jets: if $N(Q)$ (the number of jets in $Q$) matches the number of sprays of particles in the event, then the global minimum of $\Upsilon$ is reached on the configuration with jets and sprays in one-to-one correspondence so that each jet comprises exactly all the particles from the corresponding spray. If sprays are not narrow enough then





the allocation of particles between jets is effected in a more dynamic fashion.

$\Upsilon$-criterion and thrust 8.11

Recall the definition of the shape observable thrust, 4.27. Suppose all particles of the event are localized within a sufficiently narrow solid angle. Then the maximum is achieved for some axis inside the angle so that for all particles $\theta_a < \pi/2$. Recall 3.16 and obtain:

$$ 1 - T = \min \sum_a E_a (1 - \cos\theta_a) \approx \tfrac12 \min \sum_a E_a \theta_a^2 . \tag{8.12} $$

Comparing this with 6.20 and 7.10, we see that finding the thrust axis in this case is equivalent to finding the single jet direction according to the $\Upsilon$-criterion. Then $1 - T$ is equivalent to $J_2(P)$ with the two jet directions restricted to be exactly opposite (each forming one half of the thrust axis). We conclude:

The $\Upsilon$-criterion generalizes $1 - T$, where $T$ is the thrust, to the case of any number of thrust semi-axes which in the case of the $\Upsilon$-criterion become jet directions. 8.13

The same can be said about OJD because it is a modification of the $\Upsilon$-criterion.

From the $\Upsilon$-criterion to OJD. Connection with cone algorithms 8.14

OJD differs from the $\Upsilon$-criterion by inclusion of $E_{\rm soft}$ into the function to be minimized.

Let us discuss how OJD determines the optimal jet configuration compared with the $\Upsilon$-criterion 8.6. Thanks to the analytical simplicity and regularity of 6.27 (just two degrees of freedom, $\Upsilon$ and $E_{\rm soft}$, both with a simple structure and a clear meaning) it is sufficient to consider a few simple examples.

Fix a configuration $P^{(0)}$ that consists of only one hard parton with the 3-momentum represented by the left figure 8.15. Then both the simplified criterion 8.6 and the optimal one 6.27 would find one jet exactly equal to the parton ($Q = P^{(0)}$), with $\Upsilon = E_{\rm soft} = 0$.

Now deform $P^{(0)}$ by: (1) splitting the parton into almost collinear fragments; (2) radiating a soft fragment; (3) both. The three configurations are as follows:

[Figure 8.15: the configurations (0), (1), (2), (3).]

What OJD and the $\Upsilon$-criterion would see depends on $\omega_{\rm cut}$ and $R$ and the magnitude of deformations (the acollinearity angle and the energies of fragments).

In the case of (1), OJD will be yielding exactly the same configuration as the $\Upsilon$-criterion (at least for not too large acollinearity), i.e. all $\bar z_a = 0$. This is because it is more advantageous to have the particles' energy contribute to $\Upsilon$ where it will be suppressed by the acollinearity angle squared (cf. 6.13, 7.8, 7.14), rather than relegate any fraction of it to $E_{\rm soft}$ (i.e. have the corresponding $\bar z_a > 0$) where no angular suppression is present. It is also clear that $R$ directly controls the threshold angle beyond which the configuration with both particles included into the jet yields a larger value of $\Omega_R$ than the configuration with the less energetic particle relegated to the soft energy. The exact relation between the threshold angle and $R$ depends on how energy is distributed between particles (see below).

In the case of (2) the $\Upsilon$-criterion must include the soft fragment into the jet. However, OJD would relegate the fragment to the soft energy (the corresponding $\bar z_a = 1$) to avoid enhancement by the angular factors (unless $R$ is very large). As a result the jet will consist of the hard parton only.

A similar conclusion is reached for the case (3) where the jet direction as found by OJD would include only the two hard fragments.

Furthermore, inclusion of an infinitesimally soft particle $(\varepsilon, \hat p)$ into an event changes $\Omega_R$ (apart from the overall renormalization by $1 + \varepsilon$) by $\sim \varepsilon\, \theta_j^2 R^{-2}$ if the particle is included into the $j$-th jet (with $\theta_j$ the angle between the jet and the particle), and by $\varepsilon$ if the particle is relegated to the soft energy. So if the particle's angular distance from the nearest jet is $\lesssim R$ then OJD includes it into that jet. Otherwise the particle is relegated to the soft energy.

For non-infinitesimally soft particles the threshold angle is $< R$. For instance, if an isolated hard parton is split into two equal-energy fragments separated by $2\theta$ then OJD would include them both into one jet or relegate one to soft energy depending on whether or not $\theta \lesssim R/\sqrt2$. Note that either one of the two fragments can be relegated to soft energy, which simply means that the global minimum is not unique. However, the probability of occurrence of events for which the criterion has a degenerate global minimum is theoretically zero. These issues will be discussed in more detail in Sec. 9.1.

Given the generality of the described mechanism, we arrive at the following conclusion:

OJD forms jets on the basis of local structure of energy flow within the correlation angle $R$.
Quantitatively, $R$ is the maximal angular jet radius as probed by infinitesimally soft particles. 8.16

Furthermore, the above examples allow us to relate the parameter $R$ to the jet radius of the conventional cone algorithms $R_{\rm cone}$:

$$ R_{\rm cone} \sim R/\sqrt2 , \qquad R_{\rm cone} = 0.7 \ \Leftrightarrow\ R = 1 . \tag{8.17} $$

The value 0.7 is preferred in the practice of cone algorithms on empirical grounds (e.g. [29]).

To conclude:

Sensitivity of OJD to the presence of soft stray particles is controlled by the two parameters $R$ and $\omega_{\rm cut}$:
- $R$ controls which particles are expelled into the soft energy because they are too far from jets' axes (the decision also depends on the particle's energy), and
- $\omega_{\rm cut}$ effectively imposes an inclusive upper bound on the soft energy. 8.18

Remember that the primary role of $\omega_{\rm cut}$ is to control the loss of information in the transition from events to the corresponding jet configuration (Sec. 5.21).





The $\Upsilon$-$E_{\rm soft}$ distribution 8.19

From the discussion in 8.14 it follows that it would be interesting to consider contributions to OJD of the two components, $\Upsilon$ and $E_{\rm soft}$, separately. For different events 8.15 this is shown on the left figure below:

[Figure 8.20: the $\Upsilon$-$E_{\rm soft}$ plane. Left panel: positions of the configurations (0), (1), (2), (3) of 8.15. Right panel: the same plane labelled with the perturbative orders $O(\alpha_S)$, $O(\alpha_S)$ and $O(\alpha_S^2)$.]

Acollinear fragmentations shift the point along the $\Upsilon$-axis, and soft stray radiation, along the $E_{\rm soft}$ axis.

Then consider the figure 8.20 in the context of QCD. One sees that a fragmentation (1) or emission of a stray soft parton (2) are, from perturbative viewpoint, effects of relative order $\alpha_S$ whereas their combination (3) is an effect of relative order $\alpha_S^2$. Similarly, if one attributes these effects to non-perturbative "power-suppressed" corrections (i.e. suppression by an extra power of the unnormalized total [transverse] energy of the event) then one arrives at a similar conclusion with $\alpha_S$ replaced by $E^{-1}$.

On the theoretical side, one can make the following observations. For definiteness consider the process $e^+e^- \to$ jets and the distribution constructed for $N(Q) = 2$. Then the lowest order quark-antiquark events are concentrated at $\Upsilon = E_{\rm soft} = 0$ (the distribution is $\delta(\Upsilon)\,\delta(E_{\rm soft})$). Emission of one gluon ($q \bar q g$ events in the perturbative order $\alpha_S$) creates two $\delta$-functional terms localized along the two axes:

$$ p_0\, \delta(\Upsilon)\, \delta(E_{\rm soft}) + \alpha_S \big[ p_{1,\Upsilon}(\Upsilon)\, \delta(E_{\rm soft}) + p_{1,{\rm soft}}(E_{\rm soft})\, \delta(\Upsilon) \big] , \tag{8.21} $$

where $p_{1,\Upsilon}(\Upsilon)$ and $p_{1,{\rm soft}}(E_{\rm soft})$ are continuous functions. This is because the third parton is either included as a whole into one of the two jets or relegated to soft energy (configurations with the third parton exactly at the boundary of the two corresponding phase space regions occur with probability zero).

Emission of further gluons gives rise (apart from modifications of the coefficients of 8.21) to configurations which populate the internal region $\Upsilon > 0$, $E_{\rm soft} > 0$, corresponding to a continuous distribution:

$$ \alpha_S^2\, p_2(\Upsilon, E_{\rm soft}) . \tag{8.22} $$

Such a picture, with the $\delta$-functions appropriately smeared and deformed into the internal region $\Upsilon > 0$, $E_{\rm soft} > 0$, is expected to be seen in the data (assuming correctness of pQCD).

The distribution of events in the $\Upsilon$-$E_{\rm soft}$ plane (footnote aa) provides a direct model-independent way to quantify the two different effects in the mechanism of hadronization, namely, collinear fragmentation and soft radiation. So the $\Upsilon$-$E_{\rm soft}$ distribution is a window on non-perturbative QCD effects. 8.23

Note that an even more detailed information is provided by the values of fuzziness $\Upsilon_j$ of individual jets. One can e.g. study the fluctuations of $\Upsilon_j$ within the same event, correlations, etc.

Also, the values of $\Upsilon_j$ together with $E_{\rm soft}$ can be used as additional parameters on top of jets' 4-momenta. This is a natural extension of the jet-related degrees of freedom in terms of which to parameterize the events, e.g. in the construction of event selection procedures of the conventional type or quasi-optimal observables (Sec. 2.25) for specific precision measurement applications.

A word of caution: the values $\Upsilon$ and $E_{\rm soft}$ may not be always stable with respect to data errors (unlike the minimum value of $\Omega_R$). This is similar to how positions of global minima may be unstable under deformations of the function's shape. It results in a smearing of the event distribution along the lines $\Omega_R = {\rm const}$, and may impose limitations on the precision of such tests of QCD. However, the precision requirements here are not as high as in the Standard Model studies.

On alternative forms of the criterion 8.24

At this point it is convenient to discuss the ambiguity involved in how $\Upsilon$ and $E_{\rm soft}$ are combined to obtain a factorized estimate of the form 5.18. After that we will also discuss a similar ambiguity with combining contributions from different jets into one expression (Sec. 8.30).

Combining $\Upsilon$ and $E_{\rm soft}$ 8.25

As was already pointed out (see before Eq. 6.23), the form of the criterion which is linear in $\Upsilon$ and $E_{\rm soft}$, Eq. 6.27, is, mathematically, not unique. On the other hand, the qualitative conclusions about how the criterion organizes particles into jets and soft energy (as discussed above in connection with 8.15), remain valid for any $\Omega$ based on any valid choice of $W(\boldsymbol\Upsilon)$ in 6.26. In particular, the arguments around Eqs. 8.21-8.22 remain valid. This makes it worthwhile to examine whatever further arguments one may find in favor of, or against the simple linear form 6.27.

First of all note that the physically most important degree of freedom in $W(\boldsymbol\Upsilon)$ is adequately represented by the free parameter $R$. To discuss the remaining ambiguities it is convenient to limit the discussion to the degree of freedom represented by the parameter $B \ge 1$ in the following alternative expression for $\Omega_R$:
It may be possible to theoretically describe the smearing of - -2 1/
B B B
= d + i
R B R
, [ ] . 8.26
functions by taking into account power-suppressed corrections soft
as well as resummation of large collinear logarithms. Given This expression is infrared safe and leads to only marginally
that the -soft distributions can be constructed for any N(Q) slower code (the formulas for derivatives used in the algorithm
and for any process involving jet production, whereas the [7] become more complex though, but this affects only a small
mechanisms behind, say, power corrections seem to be rather part of the entire code which is executed not often provided the
universal, studying such distributions may prove to be a valu-
able test of our understanding of the dynamics of QCD.
To summarize: aa One fixes the number of jets, and for each event finds the corresponding

optimal jet configuration by minimizing R . and soft are obtained as a
by-product.





covariant form of Υ is used). In the limit B → ∞ the function becomes non-smooth:

    Ω_{R,∞} = max( R^{-2} Υ, E_soft ),    8.27

which results in considerable algorithmic complications due to nonexistence of derivatives at some points in the space of recombination matrices. The same problem will be manifest for large B in the form of numerical instabilities.

So large values of B are excluded by the requirement of algorithmic simplicity. The same requirement favors the linear choice B = 1. However, most algorithmic efforts in a computer implementation of the corresponding minimum search algorithm [7] are spent on a proper handling of the recombination matrix z_aj and the computation of q_j etc., and only a fraction of the total code deals with the formulas such as 7.10 and 6.27, so that the values B > 1 are not, strictly speaking, excluded.

The linear choice B = 1 is also singled out by a similar requirement of analytical simplicity (needed to facilitate theoretical studies of e.g. power-suppressed effects using the Υ-E_soft plot).

Considering the alternative values for B, one might be tempted to add R^{-2} Υ and E_soft in quadrature (B = 2). The corresponding region Ω < ω_cut would be a quarter of an ellipse (cf. the dotted boundary in the right figure of 8.20). As a further example, the rectangular region corresponds to Ω < ω_cut with Ω defined using B = ∞ (Eq. 8.27). A typical shape of the region Ω_R < ω_cut for the linear choice 6.27 is shown with the dashed straight line; larger R corresponds to steeper slopes.

The position of each event on the plane is determined by minimization of Ω and therefore depends on its specific form, so that a straightforward comparison of shapes of the regions Ω < ω_cut is in general not meaningful. However:

    For sufficiently small deviations of the fragmented event from the parent partonic event (the neighborhood of the origin of the Υ-E_soft plot, which corresponds to very small α_S) the resulting values of Υ and E_soft will not depend on the specific form of Ω.    8.28

This is because the minima of Ω tend to correspond to configurations with z_aj = 0 or 1 (Sec. 9.1), which fact ensures some stability of resulting jet configurations with respect to small deformations (unless the event is such that Ω has a degenerate global minimum -- a situation which occurs with probability zero). This phenomenon of "snapping" seems to persist for all Ω for which the corresponding function W(·) (recall the reasoning in Sec. 6.22) is a convex function of its 2-dimensional vector argument (i.e. a norm in the mathematical sense). For instance, the already described (in Sec. 8.14) mechanism of balance between Υ and E_soft, which makes a particle as a whole either belong to a jet or be relegated to soft energy, remains operative irrespective of whether one compares Υ and E_soft or their positive powers as would be the case with the choice 6.24.

The proposition 8.28 means that in some neighborhood of the origin of the Υ-E_soft plot, the distribution of events is independent of the specific choice of Ω as long as the corresponding function W(·) is a norm, so that all one has to take into consideration is the shape of the region Ω < ω_cut.

Then from a purely geometrical point of view (justified by the tradition of using visual arguments in the construction of e.g. cone jet algorithms), one can reason as follows. There are two alternative ways to distort the parent parton event (the point (0) in Fig. 8.20): one is to add a non-negligible soft background to narrow jets (the arrow directed to the right from the origin in 8.20); the other is to make wider jets without much soft background (the arrow directed upwards). It is geometrically clear that the situations where one of these mechanisms dominates correspond better to the notion that the number of jets in the resulting event stayed the same as in the parent parton configuration, than a simultaneous effect of both mechanisms (the diagonal direction). In the latter case one would prefer to count the same number of jets only if both distortions are reduced. This seems to disfavor the shapes of the region Ω < ω_cut which are protruding along the diagonal (such as the rectangle in the right Fig. 8.20), and favor the more "flat" boundaries like the one corresponding to the linear choice 6.27.

In the final analysis, the best argument for fine-tuning the form of Ω may be based on dynamical considerations such as suppressing sensitivity to higher-order and hadronization effects. The pattern exhibited by the right figure 8.20 and the additivity of small perturbative corrections (at least for small α_S) seems to be rather universal and again favors the linear choice B = 1 (which leads to Eq. 6.27).

To conclude:

    There seems to be no obvious general argument to counter the appeal of simplicity of the linear form of the criterion, Eq. 6.27, which also retains the most important degree of freedom represented by the parameter R.
    The linear form is compatible with the additive nature of small perturbative corrections and seems to conform well to the intuitive notion of which deformations of the parton event preserve the "number of jets" best.    8.29

Combining contributions from different jets 8.30

A similar ambiguity may be seen in the way contributions from different jets, Υ_j, are combined into a single expression (the transformation of the sum over j in the transition from the second to third line in 6.5). Most arguments of Secs. 6.22-6.25 remain valid here too. However, in the case of combining Υ and E_soft in a factorized estimate the problem was due to two unknown coefficients C_{f,i} (see 6.19) which vary independently under arbitrary changes of the observable f. In the present case there are no such unknown coefficients, and the inequality |Σ_j Υ_j| ≤ Σ_j |Υ_j| becomes an exact equality for non-negative Υ_j. This seems to leave the linear form 7.10 as the only viable option here.
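
As a numerical illustration of the alternatives discussed in Sec. 8.25, the following Python sketch evaluates the one-parameter family of criteria for B = 1 (the linear form 6.27), B = 2 (quadrature) and B = ∞ (Eq. 8.27). It assumes that Eq. 8.26 reads Ω_{R,B} = [(R^{-2} Υ)^B + E_soft^B]^{1/B}; the function name omega_rb and the sample values of Υ and E_soft are hypothetical and are not taken from the code of [7].

```python
import math

def omega_rb(upsilon, e_soft, R=1.0, B=1.0):
    """Combine Upsilon and the soft energy into a single criterion.

    Assumed form (Eq. 8.26): [(Upsilon / R**2)**B + E_soft**B]**(1/B), B >= 1.
    B = math.inf gives the non-smooth limit max(Upsilon / R**2, E_soft) (Eq. 8.27).
    """
    if upsilon < 0 or e_soft < 0:
        raise ValueError("upsilon and e_soft are expected to be non-negative")
    if B < 1:
        raise ValueError("B is expected to satisfy B >= 1")
    x = upsilon / R**2
    if math.isinf(B):
        return max(x, e_soft)
    return (x**B + e_soft**B) ** (1.0 / B)

# Hypothetical values for one event (in units of the total energy).
upsilon, e_soft = 0.04, 0.03
for B in (1.0, 2.0, 8.0, math.inf):
    print(f"B = {B}: Omega = {omega_rb(upsilon, e_soft, R=1.0, B=B):.4f}")
```

For fixed Υ and E_soft the value decreases monotonically from the sum of the two terms (B = 1) to the larger of the two (B = ∞), which is why the region Ω < ω_cut grows from a triangle to a rectangle in the Υ-E_soft plane as B increases.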





Multiple jet configurations 9

If jet algorithms are supposed to invert hadronization then one should also take into account that there may be more than one (perhaps a continuum of) partonic configurations that could hadronize into a given hadronic state (Sec. 4.70). The problem is the more severe, the more pQCD corrections are taken into account. It is clear that at some level of precision, this effect must be taken into account in the theory of jet algorithms.

The multiplicity of parent partonic events for a given hadronic event P is reflected in a multiplicity of allowed jet configurations. In the context of OJD this is manifest in the fact that any jet configuration Q that satisfies 5.19 is a valid candidate, and the induced error can still be controlled via 5.24 and 5.25. The global minimum is the best choice from the viewpoint of minimizing the overall error but there are at least two cases when a unique choice may be hard to make. One case is the potential occurrence of multiple global minima (Sec. 9.1). Another is when equality is reached in 5.19 (Sec. 9.17). Both situations occur with theoretical probability zero but acquire significance in presence of detector errors.

The phrase "a unique choice may be hard to make" means that there is a discontinuity in the mapping P → Q. As any discontinuity, this is manifest as an instability that causes an enhancement of errors for events near the discontinuity (Sec. 2.47 and [4]). Algorithmically, the handling of the problem of multiple jet configurations is a special case of the general method of regularization (Secs. 2.51, 2.52).

Two remarks:

(i) The options discussed here go beyond the conventional data processing scheme 4.38.

(ii) These options emerge naturally in the theory of OJD but they can be used in conjunction with conventional algorithms although in somewhat cruder forms because then one would not have the fine control of the weights offered by OJD (Sec. 9.16).

Multiple minima 9.1

Numerical experiments (sufficiently extensive to accept these conclusions^bb) show the following:

(i) MULTIPLICITY OF LOCAL MINIMA. Quite often (enough so that the issue may not be ignored, depending on the problem), there is more than one local minimum for the expression 6.27 as a function of Q (or, more precisely, z_aj) for fixed P and R. The simplest example is an event consisting of exactly three particles with equal energies and arranged symmetrically. Then among all possible 2-jet configurations, there are three isolated global minima with the same value of Ω[P, Q]. If one deforms the event slightly^cc then the three minima remain local minima but in general only one will be the global minimum.

The number of local minima is not large (O(1) on the average). It seems to correlate positively with the number of hard partons in the underlying partonic event.

(ii) AT THE POINTS OF MINIMA z_aj ARE EQUAL TO EITHER 0 OR 1. In other words, particles tend to belong to a jet or the soft energy as a whole rather than being split between them. In this respect OJD is similar to the conventional algorithms. (However, first principles do allow solutions with fractional z_aj.)

(iii) MINIMA z_aj ARE LOCALIZED AT ISOLATED POINTS. This directly follows from (ii).

The connection of multiple local minima with the multiplicity of jet configurations as produced by conventional algorithms is discussed in Section 10.

The occurrence of local minima in addition to the global one poses the following problem. No minimum search algorithm can absolutely guarantee that it has found the global minimum -- especially for problems in O(100) dimensions (recall that the dimensionality in our case is N_particles × N_jets). The best one can hope to achieve is to reduce the probability of missing the true global minimum e.g. by repeating searches from random initial configurations. Numerical experiments show that, given the efficiency of the minimum search algorithm described in [7], an exhaustive search of all local minima with a high confidence level does not constitute a practical difficulty.

The possible occurrence of several global minima poses the following problems.

On the one hand, the probability of production of events P for which Ω[P, Q] as a function of Q has a degenerate global minimum is zero. Indeed, there is a finite probability to produce events with exactly N particles. In the subset of such events, the events for which Ω[P, Q] as a function of Q has a degenerate global minimum form a set of measure zero because minima are localized at isolated points. The probability density is a continuous function, whence follows the proposition.

Small deformations of such an event (denote it as P_disc) in general leave only one global minimum as shown in Fig. 9.2 where the curves describe trajectories of the minima, with solid parts corresponding to global minima and dashed parts, to local minima.

    [Fig. 9.2: trajectories Q_1[P] and Q_2[P] of the minima as functions of P near P_disc.]    9.2

However, different deformations cause different global minima to survive. This means that with a non-zero probability detector errors may distort some events close to P_disc so that a local minimum will be seen as a global one.

Consider an observable defined via intermediacy of the mapping P → Q:

    P →(j.a.) Q → φ(Q[P]) = f(P).    9.3

This differs from the conventional scheme 4.38 in that now we do not assume any cut to be applied to the events. Any such cut is assumed to be incorporated into φ as a θ-factor (cf. 2.40).

Such an observable will in general be discontinuous near P_disc because the values φ(Q_k), where Q_k are different global minima for P_disc, are in general all different. Then slight deformations of P would cause erratic jumps of f(P) between all φ(Q_k), causing a non-optimal sensitivity of the observable to detector errors. One can suppress these fluctuations using the trick described below.

bb We used several hundred events generated by Jetset/Pythia [30] for typical processes studied at CERN and FNAL. Note that the mechanism of how Ω_R organizes particles into jets is essentially insensitive to the underlying physics. We have also used some simple events constructed manually to test the findings (iii) and (iv) in a more controlled manner.

cc A deformation may involve: a deformation of any particle's parameters; splitting particles into slightly acollinear fragments; adding soft arbitrarily directed particles to the event.
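
The three-particle example of item (i) in Sec. 9.1 can be checked with a small brute-force sketch in Python. It is not the minimum search algorithm of [7]: it only enumerates recombination matrices with z_aj equal to 0 or 1 (each particle goes as a whole to one of the jets or to the soft energy). It assumes the linear criterion Ω_R = R^{-2} Υ + E_soft with Υ = 2 Σ_j (E_j - |q_j|), i.e. massless particles in spherical kinematics; this explicit form of Υ is an assumption made here for illustration (the definition is Eq. 7.10, outside this section). All function names are hypothetical.

```python
import itertools
import math

def planar_event(energies, angles_deg):
    """Massless particles in a plane, each given as (E, px, py, pz)."""
    return [(e, e * math.cos(math.radians(a)), e * math.sin(math.radians(a)), 0.0)
            for e, a in zip(energies, angles_deg)]

def omega(event, assignment, n_jets, R=1.0):
    """Omega_R for a 0/1 recombination matrix.

    assignment[a] is the jet index of particle a, or None if the particle
    is relegated to the soft energy.
    """
    jets = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_jets)]
    e_soft = 0.0
    for p, j in zip(event, assignment):
        if j is None:
            e_soft += p[0]
        else:
            for i in range(4):
                jets[j][i] += p[i]
    # Assumed form of Upsilon: 2 * sum_j (E_j - |q_j|).
    upsilon = 2.0 * sum(q[0] - math.hypot(q[1], q[2], q[3]) for q in jets)
    return upsilon / R**2 + e_soft

def global_minima(event, n_jets=2, R=1.0, tol=1e-9):
    """Lowest-Omega 0/1 configurations, counted up to relabelling of jets."""
    labels = [None] + list(range(n_jets))
    values = {}
    for assignment in itertools.product(labels, repeat=len(event)):
        # Canonical key: the set of non-empty jets, each a set of particle indices.
        key = frozenset(
            frozenset(a for a, j in enumerate(assignment) if j == k)
            for k in range(n_jets)
        ) - {frozenset()}
        values[key] = omega(event, assignment, n_jets, R)
    lowest = min(values.values())
    return [key for key, value in values.items() if value - lowest < tol], lowest

symmetric = planar_event([1.0, 1.0, 1.0], [0.0, 120.0, 240.0])
deformed = planar_event([1.0, 1.01, 1.02], [0.0, 120.0, 240.0])
print(len(global_minima(symmetric)[0]))  # number of degenerate global minima
print(len(global_minima(deformed)[0]))   # after a slight deformation
```

Under these assumptions and with R = 1, the symmetric event has three degenerate lowest configurations (any one of the three particles relegated to soft energy, the other two forming the jets), and a slight change of the energies singles out one of them, in line with item (i).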





The regularization trick 9.4

We are going to construct an observable f_reg which would coincide with f away from P_disc. But near P_disc, it would in general perform a continuous interpolation between different branches of f.

Let Q_k be the candidate local minima (they are actually functions of P; we will later discuss how Q_k can be selected). Suppose one can find weights^dd W_k normalized so that

    Σ_k W_k = 1.    9.5

Then one would define

    f_reg(P) = Σ_k W_k φ(Q_k).    9.6

If the weights W_k depend on P continuously then so does the expression 9.6, and the discontinuity of the mapping P → Q is effectively masked.

One should also ensure that the expression 9.6 coincides with 9.3 for P outside some neighborhood O of the point P_disc. For that, it is sufficient that the weights vary in such a way that only one of them remains non-zero outside O -- the one which corresponds to the true global minimum Q_k. Then outside O only one term in the sum 9.6 survives with a unit weight, and f_reg(P) coincides with f(P) defined by 9.3.

The weights W_k can be heuristically interpreted as probabilities that the event P resulted from hadronization of the partonic configuration Q_k. See, however, the warning preceding 2.58.

In terms of collections of events, the described mechanism amounts to a replacement of the initial collection of events P with a collection of weighted (pseudo) events Q:

    {P_i}_i → {Q_k, W_k}_k,    9.7

where the r.h.s. comprises all jet configurations for all P_i. Then

    (1/N) Σ_i f(P_i) → (1/N) Σ_i f_reg(P_i) = (1/N) Σ_k W_k φ(Q_k),    9.8

where

    N = Σ_k W_k    9.9

in virtue of 9.5.

A prescription for W_k based on linear splines 9.10

Let us now present a simple prescription for constructing such weights. It should be remembered that there is no a priori recipe here (apart from the general desire to obtain a quasi-optimal observable; see Sec. 2.25). Also, it is not always necessary to eliminate all discontinuities: one may decide to patch some discontinuities and leave alone the rest (e.g. the kinds of discontinuities that occur seldom), depending on the problem. So the prescriptions described in what follows should be considered as merely examples.

Fix an event P and let Q_0 be the point of global minimum of the function Ω(Q) = Ω[P, Q] with Ω_0 = Ω[P, Q_0]. Choose the regularization parameter r > 0 so that

    Ω_0 + r < ω_cut.    9.11

Recall that r should at least satisfy the restriction 2.59, where the measurement error should now be taken to be a typical error induced in Ω_0 by detector errors in P. Eq. 9.11 can be satisfied together with the mandatory restriction 2.59 provided

    Ω_0 << ω_cut    9.12

in the sense that the measurement error is small enough that the condition Ω_0 < ω_cut is determined with a high reliability. The cases when 9.11 cannot be satisfied will be considered separately (Sec. 9.17).

Let Q_k, k = 1,... be all local minima of Ω(Q) (with the corresponding Ω_k = Ω(Q_k)) which satisfy the restriction

    Ω_k < Ω_0 + r.    9.13

Compute the weights W_k, k = 0, 1,..., from the conditions:

    W_k ∝ r^{-1} (r + Ω_0 - Ω_k).    9.14

This together with the normalization 9.5 determines W_k.

The described trick eliminates C-discontinuities due to degenerate global minima at least for events which satisfy 9.12. This is because the values Ω_k vary C-continuously with the event P in general. However, this is not always the case, as discussed below.

Cheshire local minima 9.15

Indeed, any event can be C-continuously deformed into any other event, and the number of local minima of Ω in general differs for different events. This means that some local minima disappear under small deformations.^ee This may somewhat spoil the regularization effect of the prescription if one of the local minima happens to be such a Cheshire minimum and vanishes while the corresponding weight W_k is non-zero. This is more likely for larger values of the regularization parameter r because the regularization procedure would then see more local minima.

It is possible to detect the Cheshire minima in the context of the minimum search algorithm described in [7] by looking at the values of the gradient of Ω. If these are smaller than some threshold then a corresponding factor should be introduced into 9.14 to effect a suppression. Then the weight W_k would vanish in a continuous fashion as the corresponding Ω_k approaches the point where the local minimum disappears.

In any event, it seems that the effect of Cheshire minima could be dangerous only if detector errors are large enough that there is a sufficiently sizeable fraction of events whose local minima could be regarded as candidate global minima (and only a fraction of that fraction would exhibit the effect). In such a case it cannot be excluded that one might have to abandon the scheme 9.3 altogether in favor of the more complicated C-continuous observables, e.g. those constructed along the lines of [4].

Comments for conventional jet algorithms 9.16

It is possible to use the described scheme with conventional algorithms. In the context of, say, cone algorithms, Q_k may be candidate jet configurations obtained e.g. for different initial configurations of cones (or other variations of the algorithm). In this case one would have to take all weights W_k equal.

dd We adopt the convention that i labels events P and k labels jet configurations Q. So W_i and W_k denote different arrays of numbers.

ee Remember that there always is at least one global minimum for any event.
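
The linear-spline prescription of Sec. 9.10 can be sketched in a few lines of Python. The sketch assumes that Eq. 9.14 reads W_k ∝ r^{-1} (r + Ω_0 - Ω_k) for the local minima selected by 9.13, with the normalization 9.5, and that the regularized observable is the weighted sum 9.6. The function names and the sample numbers are hypothetical.

```python
def spline_weights(omegas, r):
    """Weights W_k for candidate local minima with values omegas[k].

    Minima with Omega_k >= Omega_0 + r get zero weight (restriction 9.13);
    the others get a weight proportional to (r + Omega_0 - Omega_k) / r
    (assumed form of Eq. 9.14), normalized to unit sum (Eq. 9.5).
    """
    if r <= 0:
        raise ValueError("the regularization parameter r must be positive")
    omega0 = min(omegas)  # value at the global minimum
    raw = [max(0.0, (r + omega0 - w) / r) for w in omegas]
    total = sum(raw)      # never zero: the global minimum contributes about 1
    return [x / total for x in raw]

def regularized_observable(phis, weights):
    """f_reg(P) = sum_k W_k * phi(Q_k)  (Eq. 9.6)."""
    return sum(w * phi for w, phi in zip(weights, phis))

# Hypothetical event with three local minima; phi could be, say, a dijet mass.
omegas = [0.100, 0.104, 0.150]
phis = [80.0, 91.0, 60.0]
weights = spline_weights(omegas, r=0.01)
print(weights, regularized_observable(phis, weights))
```

When two minima become degenerate their weights tend to 1/2 each, and when the gap between them exceeds r the weight of the higher one goes to zero continuously, so that f_reg reduces to the value of φ at the global minimum.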





Then as P varies, the weights will no longer vary continuously Regularization by variations of R 9.20
but since in general only one weight jumps at a time, the dis-
continuities would be mitigated. An interesting variation is to evaluate jet configurations for
A better option is to evaluate W each event for a sequence R
k for each Q k using n of values of the jet radius pa-
rameter R -- e.g. a few values around the standard value R = 1
k = R [P,Qk] with R borrowed from OJD (Eq. 6.27) even if
Q (recall 8.17). For instance, R
k are found using a conventional jet algorithm. This is possi- 1 = 1 - , R 2 = 1, R 3 = 1 + with
ble because all one needs is the corresponding recombination some .
matrices. These are easily restored from the output of any con- This is motivated by the formal meaning of the parameter R
ventional jet algorithm. (see Eq. 6.28 and the discussion around it) which may motivate
one to perform an averaging over R .
Regularizing the cut [P,Q] < cut 9.17
This option may be useful because events with clearly defined
In the situation we have just considered all candidate jet jets would tend to yield similar jet configurations for different values
configurations have the same number of jets. Next we suppose of R whereas more fuzzy events would yield different jet configura-
that this is no longer the case. tions for different n .
So if one performs, say, histogramming of events in order to de-
In the notations of 9.10, assume that r cannot be chosen tect a peak, then the events which yielded several similar jet configu-
small enough to satisfy 9.11 for whatever reason (e.g. because rations would contribute in a more "focused" fashion.
the condition 9.12 is not satisfied). It is assumed that On the other hand, the events which otherwise may have been
< + 1 entirely eliminated by selection procedures now have a chance to
r
0 cut . 9.18
2 contribute their share of signal albeit with a weight < 1.
K is the minimal number of jets for which this condition is 9.21
achieved.
Define to be a function of Let n be the value of , with
R [ ,
P Qopt ] Qopt found ac-
0 that interpolates between n
the values 1 and 0 at the ends of regularization interval, e.g.: cording to OJD with R = R = ( -
n . Let A )
n cut where
n
R 1 if r A (x) is any monotonically increasing function, e.g. A (x) = x.
0 < cut - 1
| ,
| 2 (A function such as A (x) = x2 would emphasize jet configura-
= S 0 if > + 1 r , 9.19 tions which are farther from the cut.) Renormalize
| 0 cut n so that
2
| their sum is equal to 1. The values thus found are larger for n
-1
r cr + h
cut -
T 0 otherwise . for which the optimal jet configuration effects a better ap-
proximation of the original event.
Also define = 1 - . Then for each n, find jet configurations together with the
Heuristically, is interpreted as the probability that the corresponding weights normalized in such a way that the sum
event P has K jets, then is the probability that the event has of weights for each event is equal to n . The jet configurations
at least K + 1 jets. for each n can be found in arbitrarily sophisticated fashion. In
For simplicity we assume that min the simplest case one takes one jet configuration found ac-
Q [P,Q] on configura-
tions with K + 1 jets does not exceed - 1 r cording to OJD without regularizations, and sets its weight to
cut . Otherwise the
2 be n . Alternatively, several jet configurations may be found
construction is to be iterated in the same spirit. (This is not using the regularization tricks described in Secs. 9.1 and 9.17.
likely to be needed often because min Q [P,Q] as a function In any case one ends up with a collection of jet configura-
of K decreases rather fast.) tions and weights whose total sum is equal to 1, and the regu-
Now, in the K -jet sector, define Qk and the corresponding larized observables are found using 9.6.
weights Wk as in Sec. 9.10 except that the sum of Wk is nor- This option could be used with conventional algorithms if
malized to rather than 1. Perform a similar procedure in the one takes the resulting jet configurations with equal weights or
(K + 1)-jet sector with the only modification that the sum of evaluates the weights as described in Sec. 9.16.
weights is normalized to . Consider the collection of all jet
configurations thus found together with their weights. The total Discussion 9.22
sum of weights is equal to 1 by construction. The regularized f
is obtained according to 9.6 where now the summation runs (i) The described three regularization tricks regulate any ob-
over jet configurations with different numbers of jets. servable, irrespective of its specific shape and meaning. For
If in 9.3 incorporates a jet-number cut then one may instance, (Q) could be the (integer) number of dijets from Q
choose to drop from the r.h.s. of 9.7 the jet configurations whose mass belongs to some interval (bin) on the real axis.
which do not satisfy the cut. The weights W The corresponding regularized observable f
k are evaluated reg takes non-
prior to application of the cut, and the weights of the jet con- integer values but its continuity is exactly what is needed to
figurations retained in 9.7 are not affected thereby. The rela- suppress fluctuations induced by detector errors.
tion 9.9 is no longer valid, and the value of the normalizing (ii) One may consider replacing the prescription 9.6 by the
factor, N , has to be remembered separately. (This is similar to following one. Let the recombination matrices z(i)
aj corre-
how luminosity may have to be measured via special independ-
ent procedures rather than counting events for which a collision sponding to the local minima Q i. Then define
with a high transverse momentum occurred.) z(reg) = W z(i)

aj . 9.23
i i aj
=0,1K

If Eq. 9.5 holds, then this is a correct recombination matrix cor-
responding to some jet configuration Qreg . One may be
tempted to accept it as the resulting jet configuration. Since the





corresponding recombination matrix 9.23 has fractional matrix elements, Q_reg can vary continuously in response to deformations of the original event. Then one would define the regularized observables φ_reg(Q) to be simply φ(Q_reg).

Unfortunately, such an interpretation is only valid if the condition 9.5 holds, and the important option of regularizing the cut involved in the definition of jets, Eq. 5.19, must still follow the scheme of Eq. 9.8.

Furthermore, the regularization effect for discontinuous φ(Q) (cf. the example 4.42) is weaker here compared with the prescription 9.8. This is because the values φ(Q_reg) still jump in response to variations in P, although less erratically thanks to the more stable Q_reg as a function of P.

(iii) The available experience seems to indicate that the values of Ω at different local minima (if there are any) for the same event may exhibit a significant spread. This means that local minima with values close to the value at the global minimum occur rarely.

(iv) The regularization tricks that yield a mixture of jet configurations with different numbers of jets may help to extract signal from events that would otherwise be dropped owing to the jet-number cut. For instance, suppose one looks at some process with 4 jets in the final state. Then events that would normally be counted as 3-jet events may, with regularization tricks, yield meaningful 4-jet configurations (with fractional weights < 1). And vice versa: events that would normally be counted as having 4 jets but with some pairs of jets close, would "spill" some of their content into the 3-jet sector. The net effect here is equivalent to a relaxation of the rigid conventional jet-number cuts.

All in all, the described regularization schemes are equivalent to a more sophisticated representation of the event's physical information -- a representation in terms of several weighted jet configurations whose number may fluctuate depending on the event's fuzziness, etc. Such a representation preserves more information about the original event than any one jet configuration. This is especially true for the events with jets that are hard to resolve. The conventional jet-finding schemes correspond to enforcing a choice of one jet configuration even in situations when the choice is not clear-cut, and the jet configuration chosen may be a wrong one. On the other hand, with a regularized choice, the "correct" jet configuration will have chances to survive the jet-number cut, perhaps, with a fractional weight.

Comparison with conventional algorithms 10

The proposition 8.16 establishes that OJD is essentially a cone algorithm with an inclusive treatment of soft energy (Sec. 6.9) rewritten in terms of thrust-like shape observables (Sec. 8.11). In this section we compare OJD with the conventional jet algorithms in a more systematic manner using the criterion of Sec. 5.26 for guidance.

There are two widely used classes of jet algorithms developed by trial and error: cone and recombination algorithms.^ff We will consider them in turn (Sec. 10.1 and Sec. 10.9, respectively).

Comparison with cone algorithms 10.1

Cone algorithms were introduced in [8] and define jets in a purely geometric fashion using cones of a fixed shape and angular radius R, so that the finding of jets reduces to finding the number and positions of the corresponding cones.

Cone positions are found via some kind of iterative procedure. We note that such an iterative search procedure can always be interpreted as a search of a minimum of some implicitly defined function on jet configurations; the function is parametrized by the event. It is clear that in general such a function may have many local minima, similarly to what was observed for OJD in Section 9.

The choice of initial configuration to start iterations from is not fixed by scientific considerations. Depending on how one makes this choice, one ends up with different jet configurations in the end. It is not difficult to realize that:

    The problem of choosing the initial configuration -- which has as a consequence non-uniqueness of the resulting jet configuration -- represents a vicious circle in the definition of cone algorithms. To break it one needs an extraneous principle, which for conventional algorithms is usually replaced by a convention.    10.2

In the case of OJD one simply opts for the global minimum of a well-defined shape observable, which corresponds to minimization of the information loss incurred in the transition from events to jets. It should be emphasized that the candidate jet configurations of the cone algorithms correspond to the local minima of OJD -- not the degenerate global minima which occur much less often and to handle which our theory provides simple options (Section 9).

The termination condition for the cone algorithm is usually ad hoc too. For instance, the algorithm may seek to make the cone axes coincide with the corresponding jets' 3-momenta [29].

The original proposition of [8] was to minimize the energy left outside all jet cones, which is similar to the mechanism of OJD (8.18; cf. Eq. 10.7). However, the algorithm of [8] is algorithmically inconvenient, so the currently used variations [2] abandon the theoretically preferred inclusive treatment of soft energy (Sec. 6.9) in favor of lower cuts on energies of candidate jets (the so-called `f'-cuts).

Note also how a scientific consideration is sacrificed here in favor of convenience of implementation of an ad hoc scheme, whereas with OJD, the theoretically preferred treatment of soft energy (Sec. 6.9) also leads to a simpler, faster and more robust computer code [7].

A murky problem specific to cone algorithms is how to treat cone overlaps. It remains essentially unsolved because of a lack of a guiding principle beyond the basic boundary condition 4.33. For this reason one usually resorts here to ad hoc conventions.^gg

The mechanism represented by the parameter R in OJD indicates its similarity to the conventional cone algorithms. The similarity is further exhibited by the algorithmic implementation of OJD described in [7].

We conclude:

ff See also sections 5.2.1-5.2.2 of [1] where they are called cluster and combination algorithms, respectively.

gg Perhaps, after intense discussions in a working group ;)





OJD is a cone algorithm in disguise with jet shapes determined (and jet overlaps handled) dynamically by means of a shape observable taking into account the distribution of energy in jets.    10.3

(See also 8.16 and 8.17.)

One can obtain a less optimal (in the sense of Sec. 5.26) jet definition via a cruder estimate for $\Upsilon$, but such as would be closer to the cone algorithms. It is easy to obtain the following simple upper bound for the fuzziness of the $j$-th jet:

    $\Upsilon_j[P,Q] \le E_j R_j^2 , \qquad R_j = \max_{a \in j} \theta_{aj} ,$    10.4

where the maximum is evaluated over all particles contributing to the jet, so that $R_j$ is interpreted as the jet's radius. The resulting less optimal variant of the criterion would be

    $\tilde\Omega_R = \sum_j E_j \left( R_j / R \right)^2 + E_{\rm soft} .$    10.5

The mechanism of minimization is more transparent here than in the non-simplified case: take a particle from one jet and move it to another or to soft energy. Then the criterion 10.5 would decrease or increase depending on the induced changes in the two jets' radii $R_j$.

An even cruder version is obtained via the following upper bound for Eq. 10.5:

    $\tilde\Omega_R \le \Big( \sum_j E_j \Big) \Big( \max_j R_j / R \Big)^2 + E_{\rm soft} .$    10.6

So one could define a jet finding scheme similarly to OJD but based on the following shape observable which is the same as the r.h.s. of 10.6:

    $\tilde\Omega_R^{\rm S\text{-}W} = \left( E_{\rm tot} - E_{\rm soft} \right) \Big( \max_j R_j / R \Big)^2 + E_{\rm soft} .$    10.7

In this variant jets' radii ignore the details of the energy distribution between particles -- as in the conventional cone algorithms.

Verbally: the criterion 10.7 would attempt to include as much energy as possible into as few jets as possible with the jets' radii as narrow as possible but not exceeding $R$ (as with OJD, a particle farther than $R$ from any jet's axis is relegated to soft energy). If the event consists of non-overlapping sprays of particles with angular radii (measured from the spray's 3-momentum) not exceeding $R$, the criterion 10.7 will find jets in one-to-one correspondence with the sprays.

In the absence of jet overlaps, the mechanism of the criterion 10.7 is essentially equivalent to the original cone algorithm of Sterman and Weinberg [8], and similarly to the latter, it handles the soft energy in a theoretically correct fully inclusive fashion. Unlike the algorithm of [8], the criterion 10.7 does not require additional prescriptions to handle jets' overlaps.    10.8

The criterion 10.7 may be implemented similarly to the simplex method [36]. Unfortunately, the analytical structure of 10.5 and 10.7 does not seem to allow the tricks which contributed to the efficiency of the implementation of OJD described in [7].

A general conclusion is that the conventional cone algorithms are non-optimal (in the sense of Sec. 5.26) whenever jet energies exhibit a significant variation.

Comparison with recombination algorithms 10.9

Recombination algorithms emerged in the context of Monte Carlo hadronization models (the Luclus algorithm [30]) with inversion of hadronization as a primary motivation, apparently. The recombination scheme was popularized by the JADE algorithm [31], and subsequently improved by the $k_T$/Durham [32] and Geneva [33] variations.

General discussion 10.10

A recombination algorithm iteratively replaces a pair of particles by one (pseudo) particle using some criterion to decide whether a given pair is to be recombined or left as is.

There are three problems here -- all similar to what one encounters with cone algorithms.

One problem is the treatment of soft energy, and everything said about it in the context of cone algorithms is applicable here (the theoretically preferred inclusive treatment is abandoned owing to a conflict with an ad hoc algorithmic scheme).

Another problem is the lack of any firm principle to determine the order of recombinations. Intuition suggests that closest neighbors should be recombined first but with O(100) particles in the event, there is still much choice. Similarly, one may start recombinations with the most energetic particles. This prescription is actually borne out by the analogy with OJD: as is seen from the expression 6.20, starting to collect particles into jets from the most energetic ones allows one to focus from the very beginning on jet configurations in which largest contributions -- those from the most energetic particles -- are suppressed.

However, selecting the most energetic particles is a non-IR safe procedure, so some preclustering is needed, which cannot be too coarse. This introduces an undesirable non-inclusivity which may result in an enhancement of power corrections.

These ambiguities take the place of the problem of choosing initial conditions for the cone algorithms, so that the jet definitions based on recombination algorithms also contain a vicious circle similar to the one pointed out for cone algorithms (10.2).

The third problem concerns the choice of the recombination criterion used to decide when two particles are to be recombined into one (this has a parallel in the problem of handling overlapping cones in the case of cone algorithms). There seems to be a consensus emerging about a preferential status of the $k_T$ criterion [32] as enabling better theoretical calculations.

2 → 1 recombinations 10.11

Let us now take a closer look at recombination criteria.

To begin with, we note that a series of recombinations

    $P \xrightarrow{2 \to 1} P' \xrightarrow{2 \to 1} P'' \xrightarrow{2 \to 1} \cdots \xrightarrow{2 \to 1} Q ,$    10.12

is naturally interpreted in the framework of the definition 5.9 as a series of approximations

    $f(P) \approx f(P') \approx f(P'') \approx \cdots \approx f(Q) ,$    10.13

so that each recombination can be analyzed within the framework of the developed theory.

It is perfectly obvious that even if one performs each $2 \to 1$ recombination in an optimal way, the scheme 10.12–10.13 in general causes an accumulation of additional errors (instabilities) compared with a global optimization such as done in OJD.    10.14
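
The iterative scheme 10.12 is easy to sketch in code. The following Python fragment is only an illustration under stated assumptions, not the implementation of any of the cited algorithms: a particle is represented as a pair (energy fraction, unit 3-vector), the pair measure is a small-angle JADE-type product $E_a E_b \theta_{ab}^2$ with normalization constants dropped, a merged pseudo-particle carries the summed energy and points along the summed 3-momentum, and the names `recombine`, `jade_like`, `merge`, `unit` and `y_cut` are hypothetical.

```python
import math
from itertools import combinations

def angle(n1, n2):
    """Angle between two unit 3-vectors."""
    dot = sum(x * y for x, y in zip(n1, n2))
    return math.acos(max(-1.0, min(1.0, dot)))

def jade_like(a, b):
    """Small-angle JADE-type pair measure E_a E_b theta_ab^2 (constants dropped)."""
    (ea, na), (eb, nb) = a, b
    return ea * eb * angle(na, nb) ** 2

def merge(a, b):
    """One 2 -> 1 step: add energies, point along the summed 3-momentum (assumed scheme)."""
    (ea, na), (eb, nb) = a, b
    p = [ea * x + eb * y for x, y in zip(na, nb)]
    norm = math.sqrt(sum(c * c for c in p))
    direction = tuple(c / norm for c in p) if norm > 0 else na
    return (ea + eb, direction)

def recombine(particles, measure, y_cut):
    """Iterate P -> P' -> ... -> Q, always merging the pair with the smallest measure."""
    current = list(particles)
    while len(current) > 1:
        i, j = min(combinations(range(len(current)), 2),
                   key=lambda ij: measure(current[ij[0]], current[ij[1]]))
        if measure(current[i], current[j]) >= y_cut:
            break  # no remaining pair is close enough to recombine
        merged = merge(current[i], current[j])
        current = [p for k, p in enumerate(current) if k not in (i, j)] + [merged]
    return current

def unit(phi):
    """Unit vector in the x-y plane at azimuth phi (helper for the toy event)."""
    return (math.cos(phi), math.sin(phi), 0.0)

# Toy event: two back-to-back narrow sprays, energies as fractions of the total.
event = [(0.3, unit(0.0)), (0.2, unit(0.05)),
         (0.3, unit(math.pi)), (0.2, unit(math.pi - 0.05))]
jets = recombine(event, jade_like, y_cut=0.01)
```

Passing a different `measure` changes only the pair criterion. The order in which pairs are merged is still fixed by the loop itself (here: smallest measure first), and each merge is decided locally, without reference to a global criterion.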





To appreciate this, recall how economically cancellations were arranged in our derivation of the corresponding error estimates; cf. e.g. Eq. 6.5.

Let us now obtain the $2 \to 1$ recombination criterion which corresponds to OJD. Consider how it treats a narrow spray consisting of two particles $a$ and $b$. In this case one sees from 6.20 that if one combines them into one jet $j$ then

    $\Upsilon_j^{\rm opt} \sim E_a E_b (E_a + E_b)^{-1} \theta_{ab}^2 .$    10.15

This is exactly the geometric mean of the JADE criterion [31],

    $\Upsilon_j^{\rm JADE} \sim E_a E_b \theta_{ab}^2 ,$    10.16

and the Geneva criterion [33],

    $\Upsilon_j^{\rm Geneva} \sim E_a E_b (E_a + E_b)^{-2} \theta_{ab}^2 .$    10.17

(Remember that in our case all energies are fractions of the total energy of the event.)

Furthermore, when recombining pairs of soft particles, the JADE criterion underestimates 10.15 and thus would tend to combine them into spurious jets -- exactly the problem which gave rise to the Geneva [33] and $k_T$/Durham [32] variations. The Geneva and $k_T$ criteria (as well as the earlier Luclus criterion [30]) overestimate 10.15 in such cases. From the viewpoint of the developed theory, this is indicative of their non-optimality but is otherwise safe (overestimating induced errors is not dangerous).

To further compare Eq. 10.15 with the $k_T$ criterion, rewrite the latter by normalizing to the total energy and taking square root to achieve first order homogeneity in energies:

    $\Upsilon_j^{k_T} \sim (E_a + E_b) \min(x, 1 - x) \, \theta_{ab} ,$    10.18

where $x = E_a (E_a + E_b)^{-1}$. Eq. 10.15 in similar notations becomes

    $\Upsilon_j^{\rm opt} \sim (E_a + E_b) \, x (1 - x) \, \theta_{ab}^2 .$    10.19

[Figure 10.20: the functions $\min(x, 1 - x)$ and $x(1 - x)$ plotted against $x$ for $0 \le x \le 1$.]

The difference in the $x$-dependence is inessential (cf. Fig. 10.20; one can bound one function by the other times a coefficient) unlike the angular dependence which is qualitatively different. A tentative conclusion from 10.18 and 10.19 would be that the $k_T$ criterion would tend to yield more jets at smaller angular separations than the variant 10.19. It is thus less optimal than 10.19 in the sense that in general it requires more jets to ensure that the same amount of information from the event is preserved in the resulting jet configuration.

Of course, the latter property need not necessarily be a drawback because it ensures that the shape of the $K$-jet sectors in the space of events (4.35) is qualitatively different here compared with OJD, and this may be useful in practice (see the discussion in Sec. 5.1).

On the other hand, the advantage of the $k_T$ criterion over other conventional schemes (better theoretical predictions) seems to be overshadowed by an ultimate amenability to theoretical analyses of the shape observables in terms of which OJD is formulated.

Recall also the remarks in Sec. 5.31 concerning how dynamical information could be incorporated into OJD.

The last remark concerning the $k_T$ algorithm is as follows. It is an attempt to make use of the theoretical pQCD results to improve upon the recombination scheme and is obviously motivated by the fact that the kinematics of $2 \to 1$ recombinations makes irresistible an inclusion into the picture of theoretical results such as the Sudakov formfactor. However, this per se can hardly be regarded as a justification for the recombination scheme as such and does not correct its fundamental deficiency -- the ambiguity of the order of recombinations (see Sec. 10.9).

The point here is not that QCD should be ignored in the construction of jet algorithms but that the recombination scheme may not be the best receptacle to pour dynamical QCD wisdom into.    10.21

Non-uniqueness of jet configurations and the meaning of $\omega_{\rm cut}$ 10.22

The above analysis indicates that the conventional algorithms behave as imperfect heuristics for the minimization problem in OJD. This observation reveals an interesting point, namely, existence of a source of errors entirely specific to conventional algorithms and uncorrelated for different algorithms.

Indeed, consider possible existence of several local minima of $\Omega_R$ (when the event does not appear to have well-defined jets at a given resolution cut). The optimal algorithm simply repeats the search from different initial configurations (e.g. randomly generated), and if it finds more than one local minimum then the global minimum is selected simply by comparison of the corresponding values of $\Omega_R$.

It is not difficult to realize that situations with several local minima seen by OJD have an immediate analog in the situations where the conventional algorithms find different configurations depending on minor algorithmic variations such as the choice of initial configuration or the order of recombinations.

The conventional algorithms, however, provide no criterion to select the best configuration from several such candidates: any ad hoc prescription amounts to a more or less random choice -- and from the viewpoint of OJD, a random choice of the local minimum results in a jet configuration which may inherit less information from the initial event than is actually possible. In other words, the use of conventional algorithms implies a systematically larger loss of information compared with OJD.

The instability of the found configuration of jets which results from random choice of a local minimum is due to the stochastic nature of hadronization and is manifest on a per event basis.

OJD would similarly fail in situations with several global minima but such situations occur with theoretical probability zero, and if they do become important due to detector errors, there are specific prescriptions to regularize the corresponding instabilities (Section 9).

To summarize:

(i) Ambiguities of conventional algorithms are an additional source of errors in physical results -- additional compared with the theoretically optimal behavior of OJD.

(ii) Then OJD is preferable over conventional schemes in proportion to how the number of events with more than one local minimum of $\Omega_R$ exceeds those with several global minima (taking into account various experimental and theoretical uncertainties). Events with several local minima seem to proliferate in proportion to importance of higher-order and power-suppressed corrections.

(iii) It should be possible to quantify these effects by determining the fraction of events with several local minima (checking along the way how occurrence of several local minima is reflected in ambiguities of results of conventional algorithms), and the fraction of events with several global minima (modulo various sources of uncertainties taken into account via [simple] models).

(iv) We are also compelled to conclude that working with not too small values of the parameters such as $\omega_{\rm cut}$ and $y_{\rm cut}$ imposes a fundamental limit on the potentially attainable precision of interpreted physical information such as parameters of the Standard Model, obtained via intermediacy of jet algorithms, although the potential numerical magnitude of the effect (or rather defect) remains unclear. This has a simple explanation:

The parameter $\omega_{\rm cut}$ and the similar parameters of the conventional algorithms actually describe the errors induced in the physical information by the approximate description of events in terms of jets.    10.23

Cf. the estimate 5.18 from which OJD 5.20 is derived.

The conclusion 10.23 has to be taken into account when comparing results of different jet algorithms. This may help to explain the finding that the prospective dominant error in the planned top mass measurements at the LHC is due to the ambiguities of jet definition [34]. It may be possible to reduce such an error using the methods described in Section 9.

The conclusion 10.23 is also to be kept in view when comparing results obtained using the same jet algorithm but different event samples (e.g. CDF and D0).

Conclusions 11

The discovered optimal jet definition (OJD; it is summarized in Sec. 7.16) is essentially a cone algorithm (Sec. 8.14) entirely reformulated in terms of shape observables (the fuzziness; Sec. 8.1) which generalize the well-known thrust to the case of any number of thrust semi-axes (Sec. 8.11). The cone shapes and positions are determined dynamically via minimization of the fuzziness. The soft energy is treated inclusively via a cumulative cut on the soft energy (Sec. 6.9), which is similar to the original prescription of Sterman and Weinberg [8] but differs from the currently preferred `f'-cuts [2].

The criterion is controlled by two parameters: $R$ and $\omega_{\rm cut}$. The parameter $R$ sets an upper limit on the maximal angular radius of jets (Sec. 8.14). The parameter $\omega_{\rm cut}$ effectively sets an upper bound on the soft energy allowed to be left out of jet formation, but its primary role is to control the loss of information entailed by the transition from the event to jets (Sec. 5.17).

The synthesis of OJD 11.1

It is rather remarkable that OJD turns out to be a smooth blend of many things and tricks tried in the practice of jet algorithms.

We have already noted that it is essentially a cone algorithm rewritten in terms of thrust-like shape observables. It even allowed us to obtain a shape-observables-based analog of the original cone algorithm of [8] (Eq. 10.7).

Nor is the idea of jet finding via a global optimization new -- such a version of recombination algorithms was earlier explored in [35].

Curiously, OJD yields for each jet what we called the physical 4-momentum $q_j$ with $q_j^2 > 0$ (Eq. 7.1), and simultaneously a light-like 4-vector $\tilde q_j$ (Eqs. 7.9 and 7.15), both closely related and playing an important role in the definition; cf. 7.10. Note in this respect that the variants of cone algorithm used by D0 and CDF yield, respectively, massive and massless jets [2], which can be associated with $q_j$ and $E_j \tilde q_j$.

The options for inclusion of dynamical QCD information are also available although in a different form than with the $k_T$ algorithm -- via dependence of $\omega_{\rm cut}$ on events (Sec. 5.31) and via theoretical analyses of the shape observables $\Omega_R$ (Sec. 8.1).

Even the recombination scheme -- although it did not find a visible place in OJD -- can still be regarded as a heuristic for minimum search (Sec. 10.11).

The only important element of the conventional schemes not incorporated in OJD is the lower (`f'-) cuts on jets' energies. Stray soft particles are now handled via an inclusive energy cut (8.18).

What OJD derives from 11.2

A widely held opinion (cf. [2]) is that the definition of jets is subjective in nature. The developed theory shows that it is not quite so.

The important ingredient which has been missing from the conventional discussions of jet definition (it would be misleading to use the word theory here) is the information analysis of the problem of jet definition. Our analysis is based on an earlier groundwork [4] which emphasized a purely kinematical viewpoint on jet algorithms as approximation tricks rooted in -- but not identical with -- the dynamics of QCD.

The most important clarification of the theory of [4] obtained in the present paper is the notion of optimal observables for measurements of fundamental parameters (Secs. 2.7 and 4.19). The notion (together with the resulting practical prescriptions, Sec. 2.25) provides a guidance for a systematic improvement upon the conventional scheme of measurements based on the notion of jets (Sec. 4.28; cf. the new options described in Section 9).

The notion of optimal observables allows one to interpret the event's information content (which is the basis of OJD, Section 5) in the light of the fundamental Rao-Cramer inequality of mathematical statistics (Sections 2, 4.19 and 5.26).

The general considerations which went into the derivation of OJD are as follows:

1. A systematic reliance on first principles of physical measurement, quantum field theory and QCD.
2. Avoidance of ad hoc choices not fortified by strictly analytical arguments.
3. The requirement that the jet configuration must inherit maximum information from the event.
4. Conformance to the Snowmass conventions in regard of kinematics of hadron collisions.
5. Maximal computational simplicity.

A remarkable fact is that other properties usually postulated for jet algorithms emerge as mere consequences of the requirement of computational simplicity, which fact provides their ultimate justification thus replacing the usual aesthetic arguments:

6. Energy-momentum conservation in the formation of jets from particles (Sec. 6.15, 7.3).
7. Conformance to relativistic kinematics (Sec. 7.7).
8. Maximal inclusiveness of the criterion (needed to reduce sensitivity to hadronization effects which correspond to higher order logarithmic and power corrections).

Lastly -- and most surprisingly -- the found criterion possesses a property which is naturally interpreted as

9. An optimal inversion of hadronization (Sec. 5.10).

Mutable and immutable elements of OJD 11.3

One has to distinguish between the handling of a single event and the construction of observables for collections of events.

At the level of an individual event, all the arbitrariness is in the form of $\Omega$. There is not much room left for modifications of the $\Omega$ as given by 6.27, 7.10, 6.8. The internal structure of $\Upsilon$ and $E_{\rm soft}$ does not seem to allow meaningful modifications. So the only reasonable option might have been in how $\Upsilon$ and $E_{\rm soft}$ are combined into a single quantity; it is represented by Eq. 8.26. It results in only marginal computational complications but seems to make theoretical analyses more difficult without offering clear advantages (see the discussion in Sec. 8.25).

At the level of collections of events things are more interesting. In particular, making $\omega_{\rm cut}$ depend on the event $P$ is the way to include dynamical QCD information into the picture (Sec. 5.17). However, lifting the restrictions of the standard scheme 4.38 (cf. Section 9) may be more important in the end (cf. the remarks after Eq. 5.33).

To summarize: the form of $\Omega_R$ (6.27, 7.10, 6.8) is the least mutable element of the described scheme, so that all variations would use as the main building block a minimization procedure for $\Omega$ (such a procedure is provided in the code developed in [7]).

The most interesting variations (i.e. those which allow improvements of the conventional scheme 4.38 in the direction of constructing better approximations of the ideal optimal observables 4.20) concern the definition of observables on collection of events (see Section 9 for tricks to start from).

The simplest universal (i.e. dynamics-agnostic) optimal jet definition is based on the linear choice 6.27 ($B = 1$ in 8.26) and an event-independent $\omega_{\rm cut}$. This is closely related to the way the conventional cone algorithms are defined and may be accepted, in the context of the developed theory, as a default definition for all comparisons.

In short, the theory of OJD only deals with the function to be minimized in order to find jets (the fuzziness $\Omega$, 6.27, 7.10, 6.8) -- but it only provides guiding principles (the method of quasi-optimal observables; Sec. 2.25) for how observables are to be constructed. It is up to the user to decide whether to stick to the conventional scheme 4.38 or go beyond its limitations using e.g. the tricks of Section 9.

Remarks on implementation 11.4

That OJD (summarized in Sec. 7.16) is fully constructive is in itself rather wonderful given that it was derived in a straightforward fashion from the seemingly innocent (to the point of appearing meaningless) criterion 5.9 which, however, only accurately expresses a fundamental idea implicit in the jet paradigm -- that the configuration of jets inherits the essential physical information of the corresponding event (5.8).

Computationally, the problem of finding jets in our formulation reduces to finding the recombination matrix $z_{aj}$ which minimizes $\Omega[P, Q]$ given by 6.27. For an event which lit up 150 detector cells and contains 4 jets, $z_{aj}$ has 600 independent components, so that one has an optimization problem in a domain of a very large dimensionality. Such problems are notoriously difficult. Fortunately, both the analytical simplicity of the function to be minimized and the regularity of the domain in which the minimum is to be found (a direct product of standard simplices, one simplex per particle; cf. 6.3, 6.4) can be effectively employed to design an efficient algorithm [7].

Although the minimization algorithm of [7] was obtained from purely analytical considerations (a variant of the gradient search which makes a heavy use of the analytical specifics of the problem) plus some experimentation, a posteriori it is naturally interpreted as follows:

-- the algorithm starts with some (perhaps randomly generated) distribution of particles between jets;
-- the jets perform iterative "negotiations" by considering particles one by one and deciding if and how their energy should be redistributed between the jets and the soft energy in order to improve upon the current configuration;
-- the algorithm stops when no particle can be further redistributed to decrease $\Omega$.

This is reminiscent of the iterative adjustment of jets' positions in the cone algorithms. However, the jets' axes and shapes are specified in the conventional algorithms directly, and in the optimal criterion, indirectly via the recombination matrix.

Feasibility of implementation of OJD is thus not an issue.

A liberating consequence of the jet definition via minimization of a simple function is that a specific implementation of the minimum-finding algorithm is of no consequence whatever (physical or other) provided it yields the optimal jet configuration with required precision. Thus different groups of physicists are free to explore their favorite algorithms -- from simplest low-overhead methods for theoretical computations with a few partons, to neural, genetic, Dantzig's [36], equidomoidal [37], ... algorithms for experimental data processing -- as long as they minimize the same criterion and control approximation errors sufficiently well in doing so. This would be a truly satisfactory way to resolve the difficulties encountered in comparison of physical results from groups which use different variants of jet algorithms [2].

The criterion 6.27 tends to prefer configurations with $z_{aj}$ equal to exactly 0 or 1 (remark (ii) at the beginning of Sec. 9.1). This makes the problem very similar to that of linear programming for which a vast theory exists (see e.g. [36]) where one can borrow ideas for more efficient or fancy implementations.

Note that allowing fractional values for $z_{aj}$ proves to be extremely convenient algorithmically: the domain in which the minimum is to be found is then a convex region, so one
chooses an internal point (which corresponds to some fractional values) as a starting point for minimum search and then descends into a minimum via a most direct route.

New options for jet-based data processing and ancillary results 11.5

The theory of OJD offers new options for improvements upon the conventional scheme 4.38. Some such options are described in Section 9. Also the additional information about events contained in the parameters $\Upsilon_j$ and $E_{\rm soft}$ can be used to expand the phase space of jet configurations in order to enhance informativeness of the resulting observables and thus approach the theoretical Rao-Cramer limit of the optimal observables 4.20.

Also the following results deserve to be mentioned here:

(i) The usefulness of the method of quasi-optimal observables (Sec. 2.25) goes beyond jet-related measurements.

(ii) The soft distribution (Sec. 8.19) offers a new model-independent window on the dynamics of hadronization thus allowing a new class of tests of pQCD as well as theoretical descriptions of hadronization models.

Advantages of OJD 11.6

OJD -- even interpreted narrowly in the context of the conventional scheme 4.38 -- has the following advantages over the conventional algorithms:

(i) OJD solves the problem of non-uniqueness of jet configurations which is insurmountable in the context of conventional schemes. It thus eliminates a source of errors entirely due to the structure of jet algorithms (Sec. 10.22).

(ii) OJD extirpates the difficulties of conventional algorithms usually "solved" via ad hoc prescriptions (the handling of cone overlaps, the choice of order of recombinations, etc.).

(iii) The shape observables on which OJD is based generalize the well-known thrust and are therefore superbly amenable to theoretical studies -- evidently more so than any imaginable modification of the conventional schemes (cf. the pQCD calculations for the thrust reviewed in [20]).

(iv) OJD allows independent implementations so that different experimental and theoretical groups only have to agree upon the function to be minimized (Sec. 11.4).

Furthermore, OJD offers new options for improving upon the conventional jet-based data-processing scheme 4.38 as described in Section 9 in the direction of approaching the theoretical Rao-Cramer limit on precision of extracted fundamental parameters (see Sec. 2.19).

The simplest dynamics-agnostic OJD allows modifications to incorporate dynamical QCD information (Sec. 5.31).

A fast and robust implementation of OJD is available as a Fortran code [7].

In conclusion, the most important result of this paper is a systematic analytical theory of jet definition based on first principles, with explicitly formulated assumptions, and with the logic of jet definition elucidated in a formulaic fashion.

If one were to construct as precise as possible an approximation to the optimal observable in a specific application then it is a theorem that OJD is a better tool for that than any conventional jet algorithm. However, it is not clear whether the cost of such an ideal solution would be justified by the resulting increase in precision of results.

In any event, the developed systematic framework reveals some new options (e.g. the regularization via multiple jet configurations) which may be useful even within the conventional approach.

ACKNOWLEDGMENTS. A NORDITA secretary who failed to submit an earlier text [6] to a journal thus gave me an opportunity to revisit the subject and discover the trick with the recombination matrix. Ya.I. Azimov, E.E. Boos, and I.F. Ginzburg offered their insights on the issue of quark-hadron duality. Walter Giele informed me of the jet definition activities at FNAL and the related Web sites. The collaboration with Dima Grigoriev on the minimum search algorithm [7] led to a revision of an imprecise treatment of soft energy in the first posting of this paper. Pablo Achard ran important large scale tests of the algorithm on real data and provided valuable comments. Monique Werlen offered useful criticisms of a draft of this paper. Denis Perret-Gallix provided encouragement and supplied information about the on-going discussions of jet definition. Dima Bardin helped to clarify the bibliographic status of the concept of optimal observables. I thank all the above people for their valuable inputs.

This work was supported in part by the Russian Foundation for Basic Research under grant 99-02-18365.







References

[1] R. Barlow, Rep. Prog. Phys. 56 (1993) 1067.
[2] R. Hirosky, talk at the Run II QCD Workshop, FNAL, 4 March 1999 (http://www-d0.fnal.gov/~hirosky/talks/qcd_workshop_jets.pdf).
[3] F.V. Tkachov, Phys. Rev. Lett. 73 (1994) 2405; Erratum, 74 (1995) 2618.
[4] F.V. Tkachov, Int. J. Mod. Phys. A12 (1997) 5411 [hep- .
[5] N.A. Sveshnikov and F.V. Tkachov, Phys. Lett. B382 (1996) 403.
[6] F.V. Tkachov, An optimal jet algorithm, preprint NORDITA-95/14P (1995).
[7] D.Yu. Grigoriev and F.V. Tkachov, An efficient implementation of the optimal jet definition, talk at the Int. Workshop QFTHEP'99, Moscow, 27 May - 2 June, 1999.
[8] G. Sterman and S. Weinberg, Phys. Rev. Lett. 39 (1977) 1436.
[9] W.T. Eadie et al., Statistical methods in experimental physics. North-Holland, 1971.
[10] A.A. Borovkov, Mathematical statistics. Parameter estimation and tests of hypotheses. NAUKA, Moscow, 1984 (in Russian).
[11] D.K. Kahaner and B. Wells, ACM Trans. Math. Software 5 (1979) 86; G.I. Manankova, A.F. Tatarchenko, and F.V. Tkachov, talk at New Computing Techniques in Physics Research IV (AIHENP'95), Pisa, Italy, April 3-8 1995; FERMILAB-Conf-95/213-T; S. Jadach .
[12] R.T. Ogden, Essential Wavelets for Statistical Applications and Data Analysis. Birkhäuser, 1997.
[13] D. Bardin and G. Passarino, CERN-TH/98-92 [hep- .
[14] A.N. Tikhonov and V.Y. Arsenin, Methods of Solving Ill-Posed Problems. NAUKA, Moscow, 1986.
[15] W.T. Giele and E.W.N. Glover, preprint FERMILAB-PUB-97/43-T.
[16] F.V. Tkachov, Phys. Lett. 125B (1983) 85 [hep- and the remarks appended thereto].
[17] G. Sterman, Phys. Rev. D19 (1979) 3135; G. Tiktopoulos, Nucl. Phys. B147 (1979) 371.
[18] W. Rudin, Functional analysis. McGraw-Hill, 1973.
[19] N.N. Bogoliubov and D.V. Shirkov, Introduction to the Theory of Quantized Fields. FIZMATGIZ, Moscow, 1957; N.N. Bogoliubov and N.N. Bogoliubov (jr.), Introduction to Quantum Statistical Mechanics. NAUKA, Moscow, 1984; S. Weinberg, The Quantum Theory of Fields, vol. 1 Foundations. Cambridge Univ. Press, 1995.
[20] M.H. Seymour, in: Physics at LEP2, vol. 1, Eds.: G. Altarelli, T. Sjöstrand, and F. Zwirner, CERN 96-01.
[21] M.A. Shifman, A.I. Vainshtein, and V.I. Zakharov, Nucl. Phys. B147 (1979) 385.
[22] F.V. Tkachov, Phys. Part. Nucl. 25 (1994) 649 [hep- .
[23] F.V. Tkachov, Phys. Lett. B412 (1997) 350 [hep- .
[24] R.K. Ellis, M.A. Furman, H.E. Haber, and I. Hinchliffe, Nucl. Phys. B173 (1980) 397.
[25] A.Y. Kamenschik and N.A. Sveshnikov, Phys. Lett. 123B (1983) 255.
[26] Yu.L. Dokshitser, talk at The 1999 Winter School on Particle Physics, St. Petersburg Nuclear Physics Institute, Gatchina, February 1999.
[27] S. Catani et al., Nucl. Phys. B406 (1993) 187.
[28] S. Bethke, in: QCD - 20 Years Later. World Scientific, 1993.
[29] D.E. Soper, Lectures at the SLAC Summer Institute, August 1996.
[30] T. Sjöstrand, Comp. Phys. Comm. 28 (1983) 229.
[31] JADE collaboration (W. Bartel et al.), Z. Phys. C33 (1986) 23.
[32] S. Catani et al., Phys. Lett. 269B (1991) 432.
[33] S. Bethke et al., Nucl. Phys. B370 (1992) 310.
[34] F. Dydak, talk at The IX Int. Workshop on High Energy Physics, Zvenigorod, Russia, 16-22 Sept. 1994.
[35] S. Youssef, Comp. Phys. Comm. 45 (1987) 423.
[36] C. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982.
[37] R. Queneau, Bords, Hermann, 1963.



