INTERNET-DRAFT A. Selcuk
draft-selcuk-probabilistic-lkh-01.txt C. McCubbin
Expires July 2000 D. Sidhu
UMBC MCTR
January 2000
Probabilistic Optimization of LKH-based Multicast Key Distribution Schemes
Status of this Memo
This document is an Internet-Draft and is NOT offered in accordance
with Section 10 of RFC2026, and the author does not provide the IETF
with any rights other than to publish as an Internet-Draft.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft
Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
Currently, the most efficient techniques for multicast key management
are based on the Logical Key Hierarchy (LKH) scheme. In current
practice, LKH trees are organized as balanced binary trees, which
give a uniform O(log n) complexity for compromise recovery in an n-
member group. In this draft, we propose organizing the LKH trees with
respect to the members' compromise probabilities instead of keeping a
balanced tree, in a spirit similar to data compression techniques
such as Huffman and Shannon-Fano coding. We describe two such
probabilistic algorithms which improve the performance of an LKH
scheme by probabilistic organization of the tree. Preliminary
simulation results show that significant gains can be obtained by
these techniques.
1. Introduction
Selcuk, McCubbin, Sidhu [Page 1]
INTERNET-DRAFT January 2000
One of the biggest challenges in multicast security is to maintain a
group key that is shared by all the group members and nobody else.
Such a group key can be used to provide secrecy and integrity
protection for the group communication. The problem of maintaining
the group key efficiently becomes a bigger challenge for larger
groups where members join and leave very frequently.
Currently, the most efficient methods for multicast key management
are based on the Logical Key Hierarchy (LKH) scheme of Wallner et al.
[WHA97]. In LKH, group members are organized as leaves of a tree with
logical internal nodes. The cost of a compromise recovery operation
in LKH is proportional to the depth of the compromised member in the
LKH tree. The original LKH scheme proposes maintaining a balanced
tree, which gives a uniform cost of O(log n) rekeys for compromise
recovery in an n-member group.
Performance of an LKH scheme can be improved by organizing the tree
with respect to compromise probabilities of members instead of
keeping a balanced tree, in a spirit similar to data compression
algorithms such as Huffman and Shannon-Fano coding. In this draft, we
describe two such probabilistic algorithms. In Section 2, we describe
the basic
LKH scheme. In Section 3, we discuss the problem of probabilistic LKH
optimization and its challenges. In Section 4, we describe the
design criteria for our algorithms. In Section 5, we describe the
algorithms. In Section 6, we discuss the problem of assigning
probabilities (or "weights") to members in the group, and propose a
heuristic method for weight assignment. In Section 7, we summarize
some computer simulation results regarding the performance of the
algorithms.
Regarding notation, log(...) denotes the logarithm in base 2, and
sum(...) denotes summation over i from 1 to n (i.e., all summations
in this draft are over the variable "i", for its values from 1 to
"n").
2. The LKH Scheme
In an LKH tree, there is a leaf node corresponding to each group
member. The internal nodes are "logical" entities which do not
correspond to any real-life entities in the multicast group, and are
only used for key distribution purposes. There is a key associated
with each node in the tree, and each member holds a copy of every key
on the path from its corresponding leaf node to the root of the tree.
Hence, the key corresponding to the root node is shared by all
members, and serves as the group key. An instance of an LKH tree is
shown in Figure 1.
Figure 1: An example LKH tree.
K_r
______________|________________
| |
| |
K_0 K_1
_______|_______ _______|_______
| | | |
| | | |
K_00 K_01 K_10 K_11
___|___ ___|___ ___|___ ___|___
| | | | | | | |
| | | | | | | |
K_000 K_001 K_010 K_011 K_100 K_101 K_110 K_111
M_1 M_2 M_3 M_4 M_5 M_6 M_7 M_8
In this figure, member M_1 holds a copy of the keys K_000, K_00, K_0,
and K_r; member M_2 holds a copy of K_001, K_00, K_0, and K_r; and so
on. In case of a compromise, the compromised keys are changed, and
the new keys are multicast to the group encrypted by their children
keys. For example, assume the keys of M_2 are compromised. First
K_001 is changed and sent to M_2 over a secure unicast channel. Then
K_00 is changed, two copies of the new key are encrypted by K_000 and
K_001, and sent to the group. Then K_0 is changed and sent to the
group, encrypted by K_00 and K_01; and finally K_r is changed and
sent to the group, encrypted by K_0 and K_1. From each encrypted
message, the new keys are extracted by the group members who have a
valid copy of one of the (child) encryption keys.
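The rekey message sequence described above can be sketched as
follows. This is an illustrative sketch of ours, not part of the
draft; the key names follow Figure 1, and the message format is
hypothetical:

```python
# Rekey messages sent when member M_2 (holding K_001) is compromised
# in the tree of Figure 1. Each key on M_2's path is replaced; every
# new key except the leaf's is multicast to the group, encrypted
# under each of its (possibly new) child keys.
path = ["K_001", "K_00", "K_0", "K_r"]          # leaf to root
children = {"K_00": ["K_000", "K_001"],
            "K_0":  ["K_00", "K_01"],
            "K_r":  ["K_0", "K_1"]}

# The leaf key is sent to M_2 over a secure unicast channel.
messages = [("unicast to M_2", "new " + path[0])]
for key in path[1:]:
    for child in children[key]:
        messages.append(("multicast",
                         "new %s encrypted under %s" % (key, child)))

for m in messages:
    print(m)
```

For a member at depth 3 this yields one unicast and six multicast
messages, illustrating the depth-proportional rekey cost.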
If the security policy requires backward and forward secrecy for
group communication (i.e. a new member should not be able to decrypt
the communication that took place before its joining, and a former
member should not be able to decrypt the communication that takes
place after its leaving) then the keys on the leaving/joining
member's path in the tree should be changed in a way similar to that
described above for compromise recovery.
Although the tree in Figure 1 is a binary tree, an LKH scheme can be
implemented with a tree of an arbitrary degree. Nevertheless, the
most efficient and practical implementations are obtained by a binary
tree. In a binary LKH tree implementation, the tree is always full
(i.e., every non-leaf node has exactly two children) and any node
with a single child (e.g., parent of a deleted leaf node) is always
removed and its parent and child are linked to each other. In this
draft, we follow the convention and focus our attention on binary
trees. We assume the deletion operation which removes a leaf node
also removes any node left with a single child, and therefore the
tree is always kept full.
3. Probabilistic LKH Optimization
A rekey operation in an LKH scheme can be caused by a join, leave, or
compromise event. A rekey operation in case of a join can be done
with a single multicast message [BMS99]. For a leave or compromise
event, the number of rekeys will be proportional to d_i, the depth of
the evicted/compromised member in the LKH tree. More specifically,
the number of rekeys is linear in d_i, as c1*d_i + c0. The values of
the coefficients c0 and c1 depend on the specifics of the LKH
implementation. For our purposes, the exact values of these
coefficients do not matter, and our results are applicable for any
values of them, given c1 > 0.
The original LKH scheme uses a balanced tree for key distribution,
where the depth of each member is log(n) in an n-member group. Hence,
the cost of each rekey operation is proportional to log(n). Instead
of keeping a balanced tree regardless of the membership dynamics,
performance of an LKH scheme can be improved by organizing the tree
with respect to the rekeying likelihood of members. That is, the
average rekey cost can be reduced by putting the more likely-to-rekey
members closer to the root, and moving less likely members further
down the tree.
The problem we address in this draft is how to reduce the average
rekey cost of an LKH scheme by organizing the members in the tree
with respect to their rekey probabilities. In general, finding the
optimal solution to this problem, which minimizes the average rekey
cost, is intractable. The optimal solution would be the sequence of
operations that minimizes the cost averaged over all possible future
join, leave, and compromise events, which depends on the probability
distribution of these events for all current and prospective members
of the group. Even if these probability distributions were known for
all members and non-members in the system, finding the optimal move
at each step to minimize the average cost of all future rekey events
would be practically impossible.
Instead, we restrict our attention to a more tractable problem,
namely minimizing the expected cost of the next rekey operation. The
expected cost of the next rekey operation (due to a leave or
compromise event) is equal to
sum(p_i * d_i)
where p_i is the probability that member M_i will be the next to be
evicted/compromised, and d_i is its depth in the tree. This problem
has many similarities to the data compression problem with code
trees, where the average code length per message is
sum(p_i * d_i),
which is known as the "average external path length" (AEPL) of the
tree, where p_i is the probability of message m_i to be the next to
appear, and d_i is its depth in the code tree. For the problem of
minimizing the AEPL, the optimal solution is given by Huffman trees
[BCW90]. Shannon-Fano trees are another alternative solution, which
give very good compression in practice, but are slightly sub-optimal
[BCW90].
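The AEPL and its relation to the entropy bound can be illustrated
with a small numeric example of ours (the probabilities and depths
below are hypothetical, chosen so that the optimal tree meets the
bound exactly):

```python
import math

# AEPL = sum(p_i * d_i) for a balanced tree versus the optimal
# (Huffman) tree, compared against the entropy sum(p_i * -log2(p_i)).
probs = [0.5, 0.25, 0.125, 0.125]   # assumed member rekey probabilities
balanced_depths = [2, 2, 2, 2]      # balanced binary tree of 4 leaves
huffman_depths = [1, 2, 3, 3]       # depths in the Huffman tree

def aepl(p, d):
    return sum(pi * di for pi, di in zip(p, d))

entropy = sum(-p * math.log2(p) for p in probs)

print(aepl(probs, balanced_depths))  # 2.0
print(aepl(probs, huffman_depths))   # 1.75
print(entropy)                       # 1.75: Huffman meets the bound here
```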
3.1. Differences from Data Compression Trees
In an LKH key distribution tree, a change in the tree structure, such
as changing the location of an existing member in the tree, causes
extra rekey operations, which adversely affects the objective
function (i.e. minimizing the average number of rekeys). On the other
hand, in a data compression tree, a structural change does not
directly induce an overhead on the objective function (i.e.
minimizing the average code length). So, changes in the tree
structure can be freely utilized in data compression algorithms, such
as dynamic Huffman algorithms [Knuth85], to maintain the optimal tree
structure; whereas they cannot be utilized so freely in dynamic LKH
algorithms. Therefore, an LKH scheme with sub-optimal sum(p_i * d_i)
can have a better overall performance than one that maintains the
minimal sum(p_i * d_i) all the time.
Another difference between LKH trees and data compression trees is
that, when member evictions are the main cause of rekey operations
(i.e., few compromise events happen other than member evictions),
each member causes exactly one rekey operation during its time in the
tree.
4. Design Criteria
As mentioned above, finding the optimal solution that minimizes the
average number of rekey messages over all future sequences of join,
leave, and compromise events is not possible in practice. Therefore, we
focus our attention on minimizing the expected cost of the next rekey
event, sum(p_i * d_i). The proven optimal solution for minimizing
sum(p_i * d_i) is given by a Huffman tree. However, maintaining a
Huffman tree requires changes in the locations of the existing
members in the tree, which means extra rekey operations. We choose to
avoid such extra rekey operations, and we restrict our
attention to algorithms which do not require changing the location of
the existing members.
Given the condition that the location of existing members will not be
changed in the tree, the main structural decision for the tree
organization is where to put a new member at insertion time. The
insertion operation should also preserve the current locations of the
existing members. That is, the keys each member holds after an
insertion operation should be the same as those it held before the
insertion (or the corresponding replacement keys, for keys that are
changed), plus possibly some keys newly added to the tree. Therefore,
we will focus our attention on insertion operations of the following
form, which is implied by the conditions above.
Figure 2: An insertion operation which keeps the relative
positions of the existing members the same.
Root Root
. .
. .
. Put(M,X) .
Y --------> Y
\ \
X N
. . / \
. . M X
. .
. .
That is, to insert a new member M into the group, a new internal node
N is inserted somewhere in the tree, and M is linked underneath.
To denote this insertion operation at a given location X for a given
new member M, we will write Put(M, X). Note that the traditional LKH
insertion, where every new member is inserted as a sibling to a leaf
node, is a specific case of Put(M, X) where X is a leaf node.
In our probabilistic LKH trees, each node X in the tree has a
probability field X.p which is the cumulative probability of the
members in the subtree rooted at X (as in Huffman
trees), i.e., X.p is equal to the probability of the corresponding
member if X is a leaf node, and it is equal to X.left.p + X.right.p
if X is an internal node. The "Put" procedure shown above should also
update the "p" field of all nodes affected by the insertion, as well
as setting up the appropriate links for M and N.
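The Put operation of Figure 2 can be sketched as follows. The "p",
"left", and "right" fields mirror the draft's X.p, X.left, X.right;
the parent pointer and the upward update loop are implementation
details of ours, not mandated by the draft:

```python
class Node:
    def __init__(self, p, left=None, right=None, parent=None):
        self.p = p              # cumulative probability of the subtree
        self.left = left
        self.right = right
        self.parent = parent

def put(m, x):
    """Put(M, X): splice a new internal node N into X's place, link M
    and X beneath it, and update the cumulative probability field of
    every node on the path up to the root."""
    n = Node(m.p + x.p, left=m, right=x, parent=x.parent)
    if x.parent is not None:
        if x.parent.left is x:
            x.parent.left = n
        else:
            x.parent.right = n
    m.parent = x.parent = n
    node = n.parent
    while node is not None:     # propagate the new probabilities
        node.p = node.left.p + node.right.p
        node = node.parent
    return n

# Usage: a two-member tree (0.3 and 0.2), then a new member with
# probability 0.4 inserted at the heavier leaf.
a, b = Node(0.3), Node(0.2)
root = Node(0.5, left=a, right=b)
a.parent = b.parent = root
put(Node(0.4), a)
print(root.p)       # ~0.9: the root's cumulative probability grew by 0.4
```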
5. Insertion Algorithms
In this section, we describe two LKH insertion algorithms which seek
to minimize the expected number of rekeys for the next member
eviction or compromise. The first algorithm does not induce any
additional computational cost over the basic balanced-tree LKH
insertion. The second algorithm provides further improvement over the
first algorithm in message complexity, but induces an O(n)
computational overhead for an n-member group.
Algorithm 1:
The first algorithm, which we refer to as "Insert1", organizes the
LKH tree in a way which imitates the Shannon-Fano data compression
trees. In Shannon-Fano coding [BCW90], a tree is constructed from a
given set of probabilities by dividing the set into two parts of
(roughly) equal probability repeatedly until every set includes a
single element. Shannon-Fano coding guarantees a maximum redundancy
of 1; i.e. sum(p_i * d_i) <= sum(p_i * -log(p_i)) + 1, for
sum(p_i) = 1. Even though finding the best partition is NP-hard,
there are many partition heuristics that maintain the redundancy
bound of 1.
The principle of Insert1 is to insert the new node in a way that
maintains a good partitioning at every level, so that the resulting
tree will have an AEPL close to the optimal bound of sum(p_i *
-log(p_i)). The algorithm works as follows:
Insert1(M, X):
--------------
if (X is a leaf node)
    Put(M, X);
else if ( (M.p >= X.left.p) and (M.p >= X.right.p) )
    Put(M, X);
else if (X.left.p >= X.right.p)
    Insert1(M, X.right);
else
    Insert1(M, X.left);
To insert M in a tree with root node "Root", the procedure is called
as "Insert1(M, Root)".
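A runnable version of the descent can be sketched as below. Splitting
the search for the insertion point from the Put operation itself is
our own restructuring for clarity; the returned node is where Put(M,
X) would then be applied:

```python
class Node:
    def __init__(self, p, left=None, right=None):
        self.p = p
        self.left = left
        self.right = right

def insert1_find(m_p, x):
    """Insert1's descent: go toward the lighter subtree until the new
    member's probability is at least that of both children, or a leaf
    is reached."""
    if x.left is None:                       # leaf: insert here
        return x
    if m_p >= x.left.p and m_p >= x.right.p:
        return x
    if x.left.p >= x.right.p:
        return insert1_find(m_p, x.right)    # right subtree is lighter
    return insert1_find(m_p, x.left)

# Usage on a two-leaf tree with probabilities 0.7 and 0.3:
heavy, light = Node(0.7), Node(0.3)
root = Node(1.0, left=heavy, right=light)
print(insert1_find(0.8, root) is root)    # True: outweighs both children
print(insert1_find(0.2, root) is light)   # True: descends to lighter side
```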
Algorithm 2:
The second algorithm, which we refer to as "Insert2", finds the best
insertion point for member M by searching all possible insertion
points in the tree. The increase in AEPL caused by Put(M, X) for a
node X at depth d is M.p * (d+1) + X.p; since the M.p term is the
same for every candidate X, it suffices to minimize
(M.p * d) + X.p .
Insert2 searches the whole tree to find the location which minimizes
this quantity.
Insert2(M, Root):
-----------------
For every node X in the subtree under node Root, compute
Cost[X] = (M.p * d) + X.p;
Put(M, MIN_X), where MIN_X is the X that minimizes Cost[X];
Computational performance of Insert2 can be improved by taking
shortcuts in finding MIN_X. For example, when X.p <= M.p the subtree
under X need not be searched: every node below X has cumulative
probability at most X.p, while the depth term of the cost only grows.
More sophisticated shortcuts which improve the performance further
are also possible. In the worst case, Cost[X] must be computed for
all nodes in the tree. Nevertheless, the formula for Cost[X] is
simple and can be evaluated efficiently. So, when computational power
is plentiful compared to bandwidth, Insert2 can be the method of
choice, obtaining a further reduction in the number of rekeys at the
expense of computational cost.
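Insert2's exhaustive search, with the pruning shortcut, can be
sketched as follows. As with Insert1, separating the search from the
Put operation is our own restructuring; the function returns the node
MIN_X at which Put(M, X) would then be applied:

```python
class Node:
    def __init__(self, p, left=None, right=None):
        self.p = p
        self.left = left
        self.right = right

def insert2_find(m_p, root):
    """Return the node X minimizing Cost[X] = M.p * d + X.p, pruning
    subtrees whose cumulative probability is at most M.p (below such
    a node every probability shrinks while the depth term grows, so
    the cost cannot improve)."""
    best, best_cost = None, float("inf")
    stack = [(root, 0)]
    while stack:
        x, d = stack.pop()
        cost = m_p * d + x.p
        if cost < best_cost:
            best, best_cost = x, cost
        if x.left is not None and x.p > m_p:
            stack.append((x.left, d + 1))
            stack.append((x.right, d + 1))
    return best

# Usage: a 3-member tree; costs for m_p = 0.5 are 1.0 at the root,
# 1.1 at the left child (p=0.6), and 0.9 at the right leaf (p=0.4).
ll, lr = Node(0.35), Node(0.25)
left = Node(0.6, left=ll, right=lr)
right = Node(0.4)
root = Node(1.0, left=left, right=right)
print(insert2_find(0.5, root) is right)   # True: minimum cost 0.9
```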
6. Assignment of Probabilities
An important issue that should be resolved before utilizing the
algorithms described above is how to assign probabilities to members.
First, it should be noted that the tree techniques we mentioned
(Insert1, Insert2, Huffman, Shannon-Fano) work just the same whether
sum(p_i) = 1 or not. So, we can use any non-negative weights w_i
(which do not necessarily add up to 1) instead of probabilities p_i.
For some other purposes where actual probabilities are needed, such
as entropy and redundancy calculations, the probabilities can be
computed as
p_i = w_i / W
where W = sum(w_i).
As we discussed in the problem definition in Section 3, the quantity
we seek to minimize is the average cost of the next rekey operation,
sum(p_i * d_i). In this summation, p_i is the probability that member
M_i will be the next to rekey (by leaving or by compromise).
Computing the exact value of p_i depends on the probability
distribution of all the members' inter-rekey times, and is extremely
difficult in general. As a heuristic, we propose using
w_i = 1 / t_i
for the weight assignments, where t_i is the expected inter-rekey
time for member M_i. There are two reasons for choosing 1/t_i as the
weight measure among the infinitely many candidates:
1. Its simplicity and convenience.
2. In the special case where the members' inter-rekey times are
exponentially distributed, p_i = w_i/W gives exactly the probability
that M_i will be the next to rekey.
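The weight heuristic can be illustrated with a small numeric example
of ours (the t_i values below are hypothetical):

```python
# Weight heuristic w_i = 1/t_i: members with shorter expected
# inter-rekey times get larger weights, hence shallower positions.
t = [10.0, 20.0, 40.0]          # assumed expected inter-rekey times
w = [1.0 / ti for ti in t]      # weights: 0.1, 0.05, 0.025
W = sum(w)
p = [wi / W for wi in w]        # normalized probabilities

print(p)   # p[0] ~ 4/7: the fastest-churning member is the most
           # likely to rekey next (exactly so for exponential times)
```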
Estimation of the members' expected inter-rekey times is outside the
scope of this draft. The average of a member's past inter-rekey times
would be a good estimator if the member has caused sufficiently many
rekey events in the group. If not, the members' history from other,
similar groups can be used, if such information is available.
7. Experimental Comparison of the Methods
We compared Insert1, Insert2, and the basic balanced-tree LKH method
with computer simulations. In these simulations, various
distributions (normal, exponential, uniform) were assumed for the
rekey time of a member, and also for the expected rekey time t_i of
different members. The inter-arrival time of new members was always
assumed to be exponentially distributed. For the group size, we
experimented with various values up to 100,000.
The simulations showed that significant gains can be obtained by
probabilistic organization of the LKH tree. In our experiments,
Insert1 performed 5-40% better than the balanced LKH in terms of the
average number of rekeys needed for compromise recovery. Insert2
provided a further 2-10% improvement over Insert1. More typical
values were 15-20% for Insert1, and 15-25% for Insert2 as the
improvement rate over the basic balanced scheme. A more complete
presentation of the experimental results will be given in future
versions of this draft.
Of course, it is possible to obtain arbitrarily high improvement
figures for the algorithms by increasing the variance in the
distribution of t_i values. The results we report here are for more
"realistic" choices of the distribution parameters. To test how these
probabilistic schemes will behave for real-life multicast groups,
empirical data on real-life member behaviors should be obtained. For
now, we conjecture that a 15-20% improvement rate over the basic
balanced LKH scheme would be realistic for fairly large groups (e.g.
for groups with more than 1000 members). The improvement rate should
be higher for larger and more diverse groups.
8. Conclusions
We described two methods which improve the rekey message complexity
of the LKH scheme for multicast key management by probabilistic
organization of the LKH tree. Besides the basic LKH scheme of Wallner
et al. [WHA97], these probabilistic methods are also applicable to
more sophisticated LKH-based techniques such as the Balenson-McGrew-
Sherman scheme [BMS99] and the scheme of Canetti et al. [CGIMNP99].
Acknowledgments
We would like to thank Eric Harder for many helpful suggestions and
informative discussions.
References
[BCW90] T. C. Bell, J. G. Cleary and I. H. Witten, Text Compression,
Prentice-Hall, 1990.
[BMS99] D. Balenson, D. McGrew, and A. Sherman, Key Management for
Large Dynamic Groups: One-Way Function Trees and Amortized
Initialization, Internet draft draft-balenson-groupkeymgmt-oft-
01.txt, July 1999.
[CGIMNP99] R. Canetti, J. Garay, G. Itkis, D. Micciancio, M. Naor,
and B. Pinkas, Multicast Security: A Taxonomy and Some Efficient
Constructions, in Proceedings of IEEE INFOCOM '99, 1999.
[Knuth85] D. E. Knuth, Dynamic Huffman Coding, Journal of
Algorithms, volume 6, pages 163-180, 1985.
[WHA97] D. Wallner, E. Harder, and R. Agee, Key Management for
Multicast: Issues and Architectures, Internet draft draft-wallner-
key-arch-00.txt, July 1997.
Authors' Addresses:
Ali A. Selcuk, Christopher B. McCubbin, Deepinder Sidhu
Maryland Center for Telecommunications Research
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250, USA
Email: {aselcu1, cmccub1, sidhu}@umbc.edu
Expires July 2000