Tightening the PGP Web of Trust with "same person" statements
OpenFortress is investigating PGP's web of trust. One idea that occurred
to us is that it might benefit the Web of Trust if keys of the
same owner could be considered each other's replacements.
Matching user identities and cross-signatures are insufficient grounds to
derive this, but alternative implementations of "can sign on my behalf" exist.
One implementation would be a globally acknowledged signing policy; another
would be for signature-validating software to interpret a trust signature
that goes one level deep in this way.
There could be several advantages to this "can sign on my behalf" knowledge.
Tightening the PGP Web of Trust.
Software that validates trust paths usually constrains the depth at which
it is willing to dive into the web of trust. If more than (say) three
intermediate signers are needed to find a path to someone's key, it is
considered less reliable than a path with only one intermediate signer.
It is probably fair not to count a self-declared "can sign on my behalf"
signature as a step on a path. This would lower the average path length
in the Web of Trust.
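To make this concrete, here is a minimal sketch of a path-length computation in which "can sign on my behalf" links cost nothing. The edge-list format (FROM TO TYPE, with TYPE either sig or same) and the helper name path_len are hypothetical; real path-finding software would work from keyrings. Because every step costs 0 or 1, a simple Bellman-Ford style relaxation in awk is enough.

```shell
# Hypothetical helper: path_len SRC DST < EDGES
# Hypothetical input on stdin: one edge per line, "FROM TO TYPE", where
# TYPE "sig" means FROM signed TO (one step on a trust path) and TYPE
# "same" means a "can sign on my behalf" link (counted as zero steps).
# Prints the shortest path length, or "unreachable".
path_len() {
    awk -v src="$1" -v dst="$2" '
    NF >= 3 {
        n++
        from[n] = $1; to[n] = $2
        cost[n] = ($3 == "same") ? 0 : 1
    }
    END {
        dist[src] = 0
        changed = 1
        # Relax all edges until no distance improves (Bellman-Ford).
        while (changed) {
            changed = 0
            for (i = 1; i <= n; i++) {
                if (!(from[i] in dist)) continue
                d = dist[from[i]] + cost[i]
                if (!(to[i] in dist) || d < dist[to[i]]) {
                    dist[to[i]] = d
                    changed = 1
                }
            }
        }
        if (dst in dist) print dist[dst]; else print "unreachable"
    }'
}
# Example: prints 2; counting the bob/bob2 link as a step would give 3.
#   printf '%s\n' 'alice bob sig' 'bob bob2 same' 'bob2 carol sig' | path_len alice carol
```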
Changing keys is not a drastic sacrifice.
If the old and new key cross-sign with a "can sign on my behalf"
property, they can be treated as equivalents by PGP validating software.
This is useful because it reduces the sacrifice involved in changing
signing keys.
If the old key signs the new key with the "can sign on my behalf"
property, it means that the new key can replace the old one,
so all trust in the old key passes on to the new key. If the property is
used the other way, it means that everything signed with the old key
also counts as signed with the new key; if there are security grounds to
stop using the old key, that may not be a desirable property.
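The direction of the statement matters. The minimal sketch below, assuming a hypothetical two-file format and a hypothetical helper named effective_sigs, expands signatures one level deep: a signature made by a delegate also counts as a signature by every key that declared that delegate able to sign on its behalf, but not the other way around.

```shell
# Hypothetical helper: effective_sigs DELEGATIONS SIGS
# DELEGATIONS (hypothetical format): lines "PRINCIPAL DELEGATE", meaning
#   PRINCIPAL has stated that DELEGATE can sign on its behalf.
# SIGS: lines "SIGNER SIGNED_KEY".
# Prints each signature as made by its signer, plus one copy per principal
# that delegated to that signer (one level deep, like a depth-1 trust
# signature), without duplicates.
effective_sigs() {
    awk '
    FILENAME == ARGV[1] {
        principals[$2] = principals[$2] " " $1
        next
    }
    {
        print $1, $2
        n = split(principals[$1], p, " ")
        for (i = 1; i <= n; i++) print p[i], $2
    }' "$1" "$2" | sort -u
}
# Example: the old key declares that the new key can sign on its behalf,
# and the new key signs Carol's key:
#   printf 'old new\n' > deleg.txt
#   printf 'new carol\n' > sigs.txt
#   effective_sigs deleg.txt sigs.txt    # prints "new carol" and "old carol"
```

Reversing the delegation line to "new old" leaves only "new carol", which is the asymmetry described above.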
Different grades of keys are easier to maintain.
Some people have keys that they use to sign at different levels of
confidence, for example one at work and one at home. The sloppier key could
indicate that the tighter key can sign on its behalf, but not vice
versa. This seems to meet user wishes, at least for PGP.
Interoperability between technologies.
If keys can be used as equivalents, there may also be an option for
doing so between technologies, such as an X.509 key that is
incorporated into a PGP system. This is far from trivial and will not
be worked out in this document.
This document investigates only the first point: how much the web of trust tightens under this approach. The analysis is rough, as the only aim at this time is to arrive at an estimate. Any additional input to the calculations below is much appreciated.
Download the wotsap database to a file:
lynx -source > wot.2005-01-27.wot \
http://www.lysator.liu.se/~jc/wotsap/wots/2005-01-27.wot
(10 sec, 907724 bytes)
Turn wotsap database into human-readable trust listing:
./wotsap --wot=wot.2005-01-27.wot -p >all.2005-01-27.txt
(10 min, 13826676 bytes in 251279 lines: 25082 keys + 226197 sigs)
Remove the signature/signer distinction and make key occurrences unique:
sed 's/^ . 0x/0x/' <all.2005-01-27.txt \
| sort \
| uniq >key.2005-01-27.txt
(30 sec, 1300475 bytes in 25082 lines)
Remove everything but the personal name; turn it into lowercase:
sed < key.2005-01-27.txt > nom.2005-01-27.txt \
-e 's/^0x[0-9A-Fa-f]* *//' \
-e 's/<[^<>]*>//' \
-e 's/([^()]*)//' \
-e 's/^ *//' \
-e 's/ *$//' \
-e 's/  */ /' \
-e 's/"//' \
-e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
(30 sec, 397581 bytes in 25082 lines: matches #keys)
Remove duplicates and count the results:
cat nom.2005-01-27.txt | sort | uniq > nuq.2005-01-27.txt
(1 sec, 341689 bytes in 21439 lines)
Apparently, there are 21439 unique name strings.
Conclusions (part 1)
The number of keys in the WoT is 25082
A rough estimate of the number of unique names is 21439 -
-----
The number of keys with a previously found name is 3643 (15%)
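The subtraction above is simply the number of lines minus the number of distinct lines. A small helper (the name dup_stats is made up for this sketch) repeats it for any of the files produced here, with the percentage rounded as in the text.

```shell
# Hypothetical helper: dup_stats FILE
# Counts the lines in FILE, the distinct lines, and the lines that repeat
# an earlier one, with that last count as a rounded percentage.
dup_stats() {
    total=$(wc -l < "$1")
    unique=$(sort -u "$1" | wc -l)
    awk -v t="$total" -v u="$unique" 'BEGIN {
        if (t == 0) { print "empty file"; exit }
        printf "%d total, %d unique, %d repeated (%.0f%%)\n",
               t, u, t - u, 100 * (t - u) / t
    }'
}
# Example: dup_stats nom.2005-01-27.txt
```

On the 2005-01-27 name file this should reproduce the 25082 keys, 21439 unique names and 3643 (15%) repeats listed above.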
This number is much lower than expected. Possible causes:
- Wotsap only lists primary UIDs.
  This means that a key is matched on its primary name only,
  even if another of its UIDs would match a key of the same person.
- No conclusion is drawn from sharing email addresses.
- Names can often be spelled in a variety of ways, beyond what sed can handle!
  For example, middle names and initials may or may not be included.
  Also, there may be long, official names as well as names in everyday use.
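One hypothetical way to absorb some of this variation is to reduce every normalised name to its first and last word before comparing, so that middle names and initials drop out. This heuristic was not used for any figure in this document, and it will also merge some distinct people; the helper name name_key is made up for the sketch.

```shell
# Hypothetical helper: name_key
# Reads lowercase names (one per line, as in nom.2005-01-27.txt) and prints
# a coarser matching key made of the first and last word only.
# Single-word names are passed through unchanged.
name_key() {
    awk 'NF >= 2 { print $1, $NF; next } { print }'
}
# Example:
#   printf 'john q. public\njohn public\n' | name_key    # prints "john public" twice
```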
Calculations for email addresses:
sed < key.2005-01-27.txt > eml.2005-01-27.txt \
-e 's/^[^<]*</</' \
-e 's/>[^>]*$/>/' \
-e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
(10 sec, 582412 bytes in 25082 lines)
cat eml.2005-01-27.txt | sort | uniq > euq.2005-01-27.txt
(2 sec, 554168 bytes in 23791 lines)
Apparently, there are 23791 unique email addresses.
Conclusions (part 2)
The number of keys in the WoT is 25082
A rough estimate of the number of unique emails is 23791 -
-----
The number of keys with a previously found email is 1291 (5%)
This is fewer than the name matches. That makes sense, as there are
probably many people with one name and multiple email addresses.
In the worst case, every key with a previously found email address also
has a previously found name, so the number of matches is just the
3643 (15%) name matches. In the best case, people use either a different
name or a different email address but not both, in which case there
would be an overlap of 3643 + 1291 = 4934 (20%) keys.
The latter figure (4934, 20%) should be corrected: keys that share both
name and email address with an earlier key were counted twice, so they
should be subtracted once.
Correcting for name/email overlap in estimates:
sed < key.2005-01-27.txt > nml.2005-01-27.txt \
-e 's/^0x[0-9A-Fa-f]* *//' \
-e 's/([^()]*)//' \
-e 's/^ *//' \
-e 's/ *$//' \
-e 's/  */ /' \
-e 's/"//' \
-e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
# excluded rewrite pattern:
# 's/<[^<>]*>//'
(10 sec, 962762 bytes in 25082 lines)
cat nml.2005-01-27.txt | sort | uniq > nmq.2005-01-27.txt
(2 sec, 928268 bytes in 24153 lines)
Apparently, there are 24153 unique combinations of email address
and name string.
The number of keys in the WoT is 25082
A rough estimate of the number of unique name&mail combinations is 24153 -
-----
The number of keys with a previously found name&mail is 929
This must be subtracted from the best-case figure above:
Keys with previously found name and/or email 4934
Keys with previously found name and email 929 -
-----
Keys with previously found name or email 4005 (16%)
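The correction is plain inclusion-exclusion on the three duplicate counts. A minimal sketch (the helper name overlap_estimate is hypothetical) computes it from the line counts reported above.

```shell
# Hypothetical helper:
# overlap_estimate TOTAL UNIQ_NAMES UNIQ_EMAILS UNIQ_NAME_AND_EMAIL
# Keys matching an earlier key on name or email address, estimated as
#   repeated names + repeated emails - repeated (name, email) pairs.
overlap_estimate() {
    awk -v t="$1" -v n="$2" -v e="$3" -v b="$4" 'BEGIN {
        k = (t - n) + (t - e) - (t - b)
        printf "%d keys (%.0f%%)\n", k, 100 * k / t
    }'
}
# Example, with the counts from 2005-01-27:
#   overlap_estimate 25082 21439 23791 24153    # prints "4005 keys (16%)"
```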
So, to summarise:
16% of the keys in the strong set of the PGP WoT
can be considered equivalent to another key.
This is under the following assumptions:
- that the same name and/or email address implies the same person
  - only good enough to continue with this estimate!
  - an actual process must include more formal checks
  - consider using a global policy or a trust signature to make equivalence formal
  - equivalence is in the key owner's interest, so it stands a chance
  - making this practical can lower the percentage found
- that nothing could be learnt from additional UIDs
  - bare nonsense, purely dictated by the incoming data set
  - making this practical can raise the percentage found
- that name equivalence is a literal thing, which it isn't
  - bare nonsense: Williams and Bills, middle names, Soundex matching, ...
  - making this practical can raise the percentage found
- that no keys outside the strong set are equivalent to keys within it
  - there may be keys on their way in (new keys, old owners)
  - new paths may come into existence through keys outside the strong set
  - making this practical can raise the percentage found
The effect of treating keys as equivalent is that a signature on
one key carries over to every key equivalent to it. So the
signatures can be seen as signatures on a set of equivalent keys, not
on a single key.
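One way to treat signatures as signatures on sets of equivalent keys is to replace every key by a representative of its set and then drop duplicates. The sketch below does this with a small union-find in awk; the input formats and the helper name collapse_sigs are hypothetical.

```shell
# Hypothetical helper: collapse_sigs EQUIV SIGS
# EQUIV (hypothetical format): lines "KEY_A KEY_B" stating that the keys
#   are equivalent (same owner). SIGS: lines "SIGNER SIGNED_KEY".
# Maps each key to a representative of its equivalence set (union-find)
# and prints the distinct set-level signatures, dropping signatures
# between keys of the same set.
collapse_sigs() {
    awk '
    function find(x) {
        while (x in parent) x = parent[x]
        return x
    }
    FILENAME == ARGV[1] {
        a = find($1); b = find($2)
        # Link the larger root under the smaller one, so roots are stable.
        if (a < b) parent[b] = a
        else if (b < a) parent[a] = b
        next
    }
    {
        s = find($1); t = find($2)
        if (s != t) print s, t
    }' "$1" "$2" | sort -u
}
```

Note that this sketch merges duplicate set-level signatures, whereas the estimate below keeps all 226197 signatures; the latter corresponds to skipping the sort -u step.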
This means that users who replace their old key with a new one, and who
collect signatures again, can still rely on the signatures on their old key,
provided that they did not expire it.
For the web of trust as a whole, it means that fewer keys can be
considered to carry the same number of signatures. Without a good idea of
the error introduced by these assumptions, we are going to assume that
the signatures are spread evenly over the existing key sets.
In the Web of Trust of 2005-01-27 these values applied:
Keys in the strong set: 25082
Signatures in the strong set: 226197
Average signatures per key: 9.018
Average MSD (mean shortest distance, read from a graph): 6.2
After correcting for the keys equivalent to other keys this
can be seen as the following 'corrected' situation:
Equivalent key sets in the strong set: 21077
Signatures in the strong set: 226197
Average signatures per equivalent key set: 10.731
Average MSD (estimated below): 5.7
One key can reach as many next keys as it signs (except the
obvious self-signature, which was already excluded from the initial
all.2005-01-27.txt file). For the WoT without equivalences this
would mean 9.018 keys in the first step, and 9.018^i keys after i steps.
The reach of an average key is then 9.018 ^ 6.2 = 834994. This is apparently
the number needed to cover the 25082 keys in the strong set. Note that
this number does indeed look like the size of the reachable set, which is
by nature a superset of the strong set. I am not sure what that means.
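The reach figure follows a crude branching model in which every key leads on to the same average number of keys at every step, ignoring overlap. As a sketch (the helper name reach is hypothetical), awk can evaluate it directly; reach 9.018 6.2 should print a value very close to the 834994 quoted above.

```shell
# Hypothetical helper: reach FANOUT STEPS
# Number of keys reached after STEPS steps if every key leads on to
# FANOUT others (a branching model that ignores overlap between paths).
reach() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b ^ s }'
}
# Example: reach 9.018 6.2
```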
For 21077 key sets, that number could be lower, probably linearly (?),
so the coverage would become 834994 * 21077 / 25082 = 701665.
The assumption of linearity isn't strongly founded in math, but
do note that this quantity is a number of nodes, just like the
size of the strong set.
Note:
The latter assumption is false if the total number is indeed
the reachable set, because that is the same in the new
situation. The calculation below would then result in an
equivalence-corrected MSD of 5.745, which is still below the 5.75
needed to change the rounded figure of 5.7 reported above.
The average MSD can then be estimated from 10.731 ^ avgMSD = 701665,
which leads to avgMSD = ln(701665) / ln(10.731) = 5.67.
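The whole correction can be repeated from the raw counts. This is a minimal sketch under the same linear-scaling assumption as the text; the helper name msd_estimate is hypothetical. It uses unrounded averages instead of 9.018 and 10.731, which does not change the result at two decimals.

```shell
# Hypothetical helper: msd_estimate KEYS SETS SIGS MSD
# KEYS: keys in the strong set, SETS: sets of equivalent keys,
# SIGS: signatures, MSD: measured mean shortest distance.
# Assumes reach = fanout ^ MSD and that reach scales linearly with SETS/KEYS,
# then solves (SIGS/SETS) ^ new_msd = scaled reach for new_msd.
msd_estimate() {
    awk -v k="$1" -v n="$2" -v s="$3" -v m="$4" 'BEGIN {
        reach = (s / k) ^ m        # reach of an average key before correction
        scaled = reach * n / k     # linear scaling to the equivalent key sets
        printf "%.2f\n", log(scaled) / log(s / n)
    }'
}
# Example, with the 2005-01-27 figures:
#   msd_estimate 25082 21077 226197 6.2    # prints 5.67
```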
Note that this improvement is achievable only under the foregoing
assumptions, and that actual benefits may vary. There is also a
human factor: people may not always want to state, by policy or
trust signature, that their keys are equivalent, although such
individuals can be expected to have put different names on their
keys as well.
I am far from sure about all of this; please forward any thoughts to me:
pub 1024D/89754606 2002-03-28 Rick van Rein <rick@openfortress.nl>
Key fingerprint = CD46 B5F2 E876 A5EE 9A85 1735 1411 A9C2 8975 4606
pub 1024D/F5BAD1CD 2001-06-25 Rick van Rein <rick@vanrein.org>
Key fingerprint = 52CA E7AD D0CF 9EED 0E89 07EB 5558 341A F5BA D1CD
This estimation was performed by Dr.ir. Rick van Rein,
an employee of OpenFortress Digital signatures in the Netherlands.
General information about OpenFortress:
http://openfortress.nl
info@openfortress.nl
Posted on Sun, 06 Feb 2005, 16:58.