Appendix C -- What Google's PageRanks Mean
1. Google's Official Description
Google has developed algorithms that automatically
calculate the "importance" of any given Web page using the
number of other pages on the Web that link to the page and the "importance"
of those other pages. While the operational details of these calculations
are a closely guarded trade secret, Google has provided a general description
of its methodology on its Website at this URL:
http://www.google.com/technology/
The reader is strongly advised to review
Google's official, albeit brief, description. The
remaining sections of this appendix illustrate how the general ideas
described in Google's note might be implemented in highly simplified
circumstances. Needless to say, it takes a mighty blast of creativity
to leap from the trivial example described in the following paragraphs
to the formulation and implementation of a set of algorithms powerful
enough and efficient enough to make these calculations automatically
for all of the billions of pages on the Web! ... :-)
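Google's production calculations are not public, so the following is only a minimal sketch of the general idea stated above: a page's importance depends on how many pages link to it and on how important those linking pages are. It uses the commonly published textbook form of the PageRank iteration. The four-page toy web, the damping value of 0.85, the iteration count, and the function name simple_pagerank are all assumptions made for illustration.

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """Estimate importance scores for a tiny web.

    links maps each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    score = {p: 1.0 / n for p in pages}          # start with equal importance
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_score = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new_score[p] += damping * score[page] / n
            else:
                # A page splits its score equally among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new_score[t] += share
        score = new_score
    return score


# Hypothetical toy web: C is linked to by A, B, and D; nobody links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = simple_pagerank(toy_web)
ordering = sorted(scores, key=scores.get, reverse=True)
```

In this toy web, page C collects the most links and ends up with the highest score, while page D, which receives no links, ends up with the lowest.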
2. An Illustrative Example
Imagine a well-attended conference whose
main session features ten prominent panelists who are committed to providing
cogent answers to questions posed about important issues by the conference
attendees.
Furthermore, imagine that one of the panelists
is the President of the United States, that eight other panelists are
CEOs of Fortune 500 companies, and that the tenth panelist, Mr. Hector
X, is apparently known only to the President.
In order to assure that conference attendees
receive the most authoritative answers, the panel's efficient moderator,
Ms. Mini Bot, asks each of the panelists to point out the two most important
members of the group. She also informs them that they should feel free
to point to themselves.
- As Ms. Bot looks around the table, she
is not surprised to see that all of the panelists are pointing to
themselves with one hand, and that seven of them are pointing to the
President with their other hand. She is somewhat surprised to see
that two of the panelists are pointing to Mr. Steve J. However, she
is astounded to see that the President is pointing to the mysterious
Mr. Hector X.
- The votes received by each panelist
appear in the second column of Table X (below). The table shows that
the President received 8 votes, Mr. J. received 3 votes, Mr. X received
2 votes, and everyone else only received one vote.
- Ms. Bot then asks each panelist to identify
two topics they would be prepared to address. These topics appear
in the last column of Table X. The Table shows that the top three
vote-getters are prepared to discuss world trade, disasters, and iPods.
- Although she has no idea who Mr. X is,
Ms. Bot is astute enough to realize that the President's vote should
be counted far more heavily than the single vote of any other panelist.
The President cast two votes, so Ms. Bot decides to divide the President's
vote count in half and award 4 votes to Mr. X, thereby giving him
an adjusted total of 5 votes -- as shown in the third column of the
table.
As a result of this adjustment, the panelists
are now sorted into four ranks: the President holds the highest rank
(4) with 8 votes; Mr. X holds the next rank (3) with 5 votes; and Mr.
J. holds the next rank (2) with 3 votes. Everyone else received only one vote,
so they are all relegated to the lowest rank (1).
Table X. Votes for Panelists

Name                  | Votes Received | Adjusted Votes | Rank | Preferred Topics
----------------------|----------------|----------------|------|-----------------------
(President) George W. | 8              | 8              | 4    | World trade, Disasters
Hector X.             | 2              | 1 + 4 = 5      | 3    | Disasters, iPods
Steve J.              | 3              | 3              | 2    | World trade, iPods
Others (7)            | 1 each         | 1 each         | 1    | misc.
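The arithmetic behind Table X can be reproduced with a short sketch. The source does not say which two panelists pointed to Mr. J., so the assignment below ("CEO 1" and "CEO 2", both hypothetical labels) is an assumption chosen only to match the vote totals in the table. The adjustment follows Ms. Bot's rule as described: only the President's vote for another panelist is reweighted, to half of the President's own vote count.

```python
from collections import Counter

# The seven unnamed CEOs; these labels are hypothetical.
others = [f"CEO {i}" for i in range(1, 8)]

# Whom each panelist points to with the second hand.
# Assumption: CEO 1 and CEO 2 are the two who point to Steve J.
second_choice = {
    "President": "Hector X",
    "Hector X": "President",
    "Steve J": "President",
    "CEO 1": "Steve J",
    "CEO 2": "Steve J",
}
for name in others[2:]:
    second_choice[name] = "President"

# Raw tally: one vote for oneself plus one vote for the second choice.
raw = Counter()
for voter, choice in second_choice.items():
    raw[voter] += 1
    raw[choice] += 1

# Ms. Bot's adjustment: the President cast two votes, so each one is
# worth half of the President's own vote count (8 / 2 = 4).
adjusted = dict(raw)
votes_cast = 2
weight = raw["President"] / votes_cast
target = second_choice["President"]
adjusted[target] = raw[target] - 1 + weight   # 2 - 1 + 4 = 5

# Ranks: one rank per distinct adjusted total, lowest total = rank 1.
levels = sorted(set(adjusted.values()))
rank = {name: levels.index(total) + 1 for name, total in adjusted.items()}
```

Under these assumptions the sketch yields the same four ranks as the table: 4 for the President, 3 for Mr. X, 2 for Mr. J., and 1 for everyone else.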
Ms. Bot now applies a Google-type strategy
to determine who should answer each query and in what order. Her decisions
are recorded in Table Y (below).
- World Trade ==> The President
(rank = 4) will speak first. Mr. X (rank = 3) is not prepared to discuss
this topic, so Mr. J (rank = 2) will speak second.
- Disasters ==> The President (rank
= 4) will speak first. Mr. X (rank = 3) will speak second.
- iPods ==> The President (rank
= 4) is not prepared to discuss this topic, so the first speaker will
be Mr. X (rank = 3). The second speaker on this topic will be Mr.
J -- who will probably take very careful notes on whatever the mysterious
Mr. X has to say ... :-)
Table Y. Order of Responses by Ranks to Queries from Audience

Topics      | 1st Speaker     | 2nd Speaker
------------|-----------------|----------------
World Trade | President (4)   | Steve J. (2)
Disasters   | President (4)   | "Hecky-Boy" (3)
iPods       | "Hecky-Boy" (3) | Steve J. (2)
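Ms. Bot's selection rule amounts to "keep only the panelists prepared for the topic, then order them by rank." A minimal sketch of that rule follows. The dictionary layout and the function name speakers_for are assumptions for illustration, and the seven lowest-ranked panelists are left out because their topics are listed only as "misc."

```python
# Rank and prepared topics for the three top-ranked panelists (from Table X).
panel = {
    "President": (4, {"World trade", "Disasters"}),
    "Hector X": (3, {"Disasters", "iPods"}),
    "Steve J": (2, {"World trade", "iPods"}),
}


def speakers_for(topic, panel, limit=2):
    """Return up to `limit` prepared panelists, highest rank first."""
    prepared = [name for name, (_, topics) in panel.items() if topic in topics]
    return sorted(prepared, key=lambda name: panel[name][0], reverse=True)[:limit]


order = {t: speakers_for(t, panel) for t in ("World trade", "Disasters", "iPods")}
```

The point of the analogy is that rank alone does not decide who answers: a panelist must first be relevant to the query, and rank then orders the relevant panelists.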
3. Some Real PageRanks Assigned to Real Home Pages
Table Z (below) contains the same information
about the Home Pages of some prominent organizations in the information
technology sector, but it also contains numbers
in parentheses that indicate the number of pages that link to each of
the Home Pages. These numbers are returned by Google whenever it is
given a "link" command.
- For example, when given the command
"link:www.google.com" on 4/15/06, Google returned 3,050,000
as the number of pages that linked to its own Home Page. There were
164,000 pages linked to the W3C's Home Page, and 76,600 pages linked
to Apple's Home Page.
- The sites in each rank are ordered from
left to right according to the number of links they received.
- The reader should note that the maximum
number of links in each rank is larger than the maximum number in
all lower ranks; likewise the minimum number of links in each rank
is larger than the minimum number in lower ranks. In other words,
pages with more links tend to be found in the higher ranks.
- However, there is considerable overlap in the ranges
of links received by different ranks. For example, Apple is in rank
10 with 76,600 links, which is considerably lower than Yahoo's 463,000,
yet Yahoo is in rank 9. Presumably, many of the sites that linked
to Apple had higher PageRanks than those that linked to Yahoo.
- The PageRank received by any Web page may vary from
one month to the next, rising or falling. A page that is ranked 9 one
month may be ranked 8 the next. However, the fact that PageRanks represent
Google's distillation of the "votes" a page receives from
all of the other Websites in the world makes it unlikely that a page
would rise two or more ranks within a few weeks or fall by two or
more ranks. In other words, the changes are more likely to be fluctuations
rather than sudden surges or plummets -- unless the world learns that
something "very good" or "very bad" has happened
to the organization that owns the Website. ... :-)
All of the link numbers that appear in Table Z (below)
were obtained during mid-March 2006 from Google's toolbar in browsers
on workstations in the DLL's offices in Silver Spring, Maryland.
Table Z. PageRanks of Some Information Technology Sector Home Pages -- With Link Counts
Silver Spring, Md -- 4/15/06

PageRanks | Organizations
----------|--------------
10        |
9         | Yahoo (463,000), MSN (252,000), Microsoft (154,000), Sun (95,700), IBM (67,000), Hewlett-Packard (45,700), AOL (34,100), Intel (31,900), Cisco (25,100), Oracle (17,300), Verisign (10,500), IETF (5,150)
8         | Verizon (69,800), Linux (34,900), Dell (21,800), Symantec (14,900), Novell (11,500), Fujitsu (4,150), 3Com (3,280), Nortel (2,920), IANA (1,650), Blackboard (1,420)
7         | Lenovo (22,300), Nintendo (5,740), Gateway (5,660), Toshiba (3,950), NCR (1,030), Desire2Learn (106)
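The two observations made about Table Z, that the largest and smallest link counts fall from one rank to the next and that the ranges nevertheless overlap, can be checked mechanically. The sketch below uses only the link counts listed for ranks 9, 8, and 7; the variable and function names are hypothetical.

```python
# Link counts for ranks 9, 8, and 7, copied from Table Z.
link_counts = {
    9: {"Yahoo": 463000, "MSN": 252000, "Microsoft": 154000, "Sun": 95700,
        "IBM": 67000, "Hewlett-Packard": 45700, "AOL": 34100, "Intel": 31900,
        "Cisco": 25100, "Oracle": 17300, "Verisign": 10500, "IETF": 5150},
    8: {"Verizon": 69800, "Linux": 34900, "Dell": 21800, "Symantec": 14900,
        "Novell": 11500, "Fujitsu": 4150, "3Com": 3280, "Nortel": 2920,
        "IANA": 1650, "Blackboard": 1420},
    7: {"Lenovo": 22300, "Nintendo": 5740, "Gateway": 5660, "Toshiba": 3950,
        "NCR": 1030, "Desire2Learn": 106},
}

ranks = sorted(link_counts, reverse=True)                 # [9, 8, 7]
maxima = [max(link_counts[r].values()) for r in ranks]
minima = [min(link_counts[r].values()) for r in ranks]

# Both the maxima and the minima should fall as the rank falls.
maxima_fall = all(a > b for a, b in zip(maxima, maxima[1:]))
minima_fall = all(a > b for a, b in zip(minima, minima[1:]))


def more_links_than_some_higher_site(higher, lower):
    """Sites in the lower rank with more links than the least-linked
    site in the higher rank, i.e. evidence that the ranges overlap."""
    floor = min(link_counts[higher].values())
    return sorted(name for name, n in link_counts[lower].items() if n > floor)


overlap_9_8 = more_links_than_some_higher_site(9, 8)
overlap_8_7 = more_links_than_some_higher_site(8, 7)
```

Both overlap lists come out non-empty. For instance, Verizon in rank 8 has more links (69,800) than IBM in rank 9 (67,000), which is consistent with the suggestion that the importance of the linking pages matters as well as their number.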
Last updated:
Monday 27-Mar-2006 12:18 PM