Hashing

Searching
Consider the problem of searching an array for a given value
If the array is not sorted, the search requires O(n) time
If the value isn't there, we need to search all n elements
If the value is there, we search n/2 elements on average
If the array is sorted, we can do a binary search
A binary search requires O(log n) time
About equally fast whether the element is found or not
It doesn't seem like we could do much better
How about an O(1), that is, constant time search?
We can do it if the array is organized in a particular way

Hashing
Suppose we were to come up with a magic function that, given a value to search for, would tell us exactly where in the array to look
If it's in that location, it's in the array
If it's not in that location, it's not in the array
This function would have no other purpose
If we look at the function's inputs and outputs, they probably won't make sense
This function is called a hash function because it makes hash of its inputs

Example (ideal) hash function
Suppose our hash function gave us the following values:
hashCode("apple") = 5
hashCode("watermelon") = 3
hashCode("grapes") = 8
hashCode("cantaloupe") = 7
hashCode("kiwi") = 0
hashCode("strawberry") = 9
hashCode("mango") = 6
hashCode("banana") = 2
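As a rough sketch of how such a function would be used, the Java below hard-codes the hash values listed above in a hypothetical `idealHash` helper (a stand-in for the magic function, not a real hashing algorithm) and stores each fruit at its index in a ten-element array. The class and method names are made up for illustration. A lookup then inspects exactly one location.

```java
import java.util.Map;

class IdealHashDemo {
    // Hypothetical stand-in for the "magic" function: these are simply the
    // values listed on the slide, not something computed from the strings.
    static final Map<String, Integer> SLIDE_VALUES = Map.of(
        "apple", 5, "watermelon", 3, "grapes", 8, "cantaloupe", 7,
        "kiwi", 0, "strawberry", 9, "mango", 6, "banana", 2);

    // Returns the slide's hash value, or -1 for a string the slide does not list.
    static int idealHash(String s) {
        return SLIDE_VALUES.getOrDefault(s, -1);
    }

    // Places every fruit at the index its hash value names.
    static String[] buildTable() {
        String[] table = new String[10];
        for (String fruit : SLIDE_VALUES.keySet()) {
            table[idealHash(fruit)] = fruit;
        }
        return table;
    }

    // One array access decides membership: no scanning, no halving.
    static boolean contains(String[] table, String s) {
        int i = idealHash(s);
        return i >= 0 && s.equals(table[i]);
    }

    public static void main(String[] args) {
        String[] table = buildTable();
        System.out.println(contains(table, "mango"));   // true
        System.out.println(contains(table, "cherry"));  // false
    }
}
```

Locations 1 and 4 stay empty because no listed value hashes to them.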
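
Why hash tables?
We don't (usually) use hash tables just to see if something is there or not; instead, we put key/value pairs into the table
We use a key to find a place in the table
The value holds the information we are actually interested in
Example table (index: key, value): 141: (empty); 142: robin, robin info; 143: sparrow, sparrow info; 144: hawk, hawk info; 145: seagull, seagull info; 146: (empty); 147: bluejay, bluejay info; 148: owl, owl info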

Finding the hash function
How can we come up with this magic function?
In general, we cannot; there is no such magic function
In a few specific cases, where all the possible values are known in advance, it has been possible to compute a perfect hash function
What is the next best thing?
A perfect hash function would tell us exactly where to look
In general, the best we can do is a function that tells us where to start looking!

Example imperfect hash function
Suppose our hash function gave us the following values:
hash("apple") = 5
hash("watermelon") = 3
hash("grapes") = 8
hash("cantaloupe") = 7
hash("kiwi") = 0
hash("strawberry") = 9
hash("mango") = 6
hash("banana") = 2
hash("honeydew") = 6
Now what? ("mango" and "honeydew" both hash to location 6)

Collisions
When two values hash to the same array location, this is called a collision
Collisions are normally treated as "first come, first served": the first value that hashes to the location gets it
We have to find something to do with the second and subsequent values that hash to this same location

Handling collisions
What can we do when two different values attempt to occupy the same place in an array?
Solution #1: Search from there for an empty location
Can stop searching when we find the value or an empty location
Search must be end-around (after the last location, continue from the first)
Solution #2: Use a second hash function
...and a third, and a fourth, and a fifth, ...
Solution #3: Use the array location as the header of a linked list of values that hash to this location
All these solutions work, provided:
We use the same technique to add things to the array as we use to search for things in the array

Insertion, I
Suppose you want to add seagull to this hash table
Also suppose:
hashCode(seagull) = 143
table[143] is not empty
table[143] != seagull
table[144] is not empty
table[144] != seagull
table[145] is empty
Therefore, put seagull at location 145

Searching, I
Suppose you want to look up seagull in this hash table
Also suppose:
hashCode(seagull) = 143
table[143] is not empty
table[143] != seagull
table[144] is not empty
table[144] != seagull
table[145] is not empty
table[145] == seagull!
We found seagull at location 145

Searching, II
Suppose you want to look up cow in this hash table
Also suppose:
hashCode(cow) = 144
table[144] is not empty
table[144] != cow
table[145] is not empty
table[145] != cow
table[146] is empty
If cow were in the table, we should have found it by now
Therefore, it isn't here
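Both searches follow one rule: start at the hashed location and step forward until either the value or an empty location turns up. A minimal sketch, assuming a `String[]` table in which `null` marks an empty location and assuming the caller passes in the hash value as `start` (the slides do not show how the hash is computed). The class name, the array size of 149 and the occupant of location 143 are illustrative choices only.

```java
class ProbeSearch {
    // Returns the index holding key, or -1 if an empty location
    // (or a full lap around the table) is reached first.
    static int find(String[] table, String key, int start) {
        int i = start;
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) return -1;        // empty: key cannot be further on
            if (table[i].equals(key)) return i;     // found it
            i = (i + 1) % table.length;             // end-around step
        }
        return -1;                                  // table full and key absent
    }

    // Builds the part of the table that the two search examples touch.
    static String[] birdTable() {
        String[] table = new String[149];           // size chosen so index 148 is the last
        table[143] = "sparrow";                     // assumed occupant, for illustration
        table[144] = "hawk";
        table[145] = "seagull";
        return table;                               // location 146 stays empty
    }

    public static void main(String[] args) {
        String[] table = birdTable();
        System.out.println(find(table, "seagull", 143)); // 145
        System.out.println(find(table, "cow", 144));     // -1
    }
}
```

The probe counter guards against looping forever when the table has no empty location at all.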

Insertion, II
Suppose you want to add hawk to this hash table
Also suppose:
hashCode(hawk) = 143
table[143] is not empty
table[143] != hawk
table[144] is not empty
table[144] == hawk
hawk is already in the table, so do nothing

Insertion, III
Suppose:
You want to add cardinal to this hash table
hashCode(cardinal) = 147
The last location is 148
147 and 148 are occupied
Solution:
Treat the table as circular; after 148 comes 0
Hence, cardinal goes in location 0 (or 1, or 2, or ...)
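The three insertion cases (an empty location is found, the value is already present, the search runs off the end of the array) can be handled by a single loop. This is a sketch under stated assumptions: `null` marks an empty location, the caller supplies the hash value as `start`, and the occupants of locations 143, 147 and 148 are placeholders chosen for illustration.

```java
class ProbeInsert {
    // Returns the index where key ends up, or -1 if the table is full.
    static int insert(String[] table, String key, int start) {
        int i = start;
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) {                 // empty location: claim it
                table[i] = key;
                return i;
            }
            if (table[i].equals(key)) return i;     // already present: do nothing
            i = (i + 1) % table.length;             // circular: after the last comes 0
        }
        return -1;                                  // every location is occupied
    }

    public static void main(String[] args) {
        String[] table = new String[149];           // last location is 148
        table[143] = "sparrow";                     // placeholder occupant
        table[144] = "hawk";
        table[147] = "bluejay";                     // placeholder occupant
        table[148] = "owl";                         // placeholder occupant
        System.out.println(insert(table, "seagull", 143));  // 145
        System.out.println(insert(table, "hawk", 143));     // 144
        System.out.println(insert(table, "cardinal", 147)); // 0
    }
}
```

Insertion probes in exactly the same order as a lookup would, which is the condition the collision-handling slide places on all three solutions.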

Clustering
One problem with the above technique is the tendency to form clusters
A cluster is a group of items not containing any open slots
The bigger a cluster gets, the more likely it is that new values will hash into the cluster, and make it ever bigger
Clusters cause efficiency to degrade
Here is a non-solution: instead of stepping one ahead, step n locations ahead
The clusters are still there, they're just harder to see
Unless n and the table size are mutually prime, some table locations are never checked
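The last point is easy to check by brute force. The sketch below counts how many distinct locations a probe sequence reaches when it steps a fixed number of locations at a time; the method name `slotsVisited` and the table sizes used are arbitrary choices for illustration.

```java
class StepCoverage {
    // Counts the distinct locations reached from location 0 when every
    // probe moves `step` locations ahead, wrapping around the table.
    static int slotsVisited(int tableSize, int step) {
        boolean[] seen = new boolean[tableSize];
        int count = 0;
        int i = 0;
        while (!seen[i]) {                  // stop once the sequence repeats
            seen[i] = true;
            count++;
            i = (i + step) % tableSize;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(slotsVisited(10, 4));  // 5: 10 and 4 share the factor 2
        System.out.println(slotsVisited(10, 3));  // 10: 10 and 3 are mutually prime
        System.out.println(slotsVisited(11, 4));  // 11: a prime size works with any step below it
    }
}
```

With a table of 10 and a step of 4, the probes cycle through 0, 4, 8, 2, 6 and never reach the odd locations.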

Efficiency
Hash tables are actually surprisingly efficient
Until the table is about 70% full, the number of probes (places looked at in the table) is typically only 2 or 3
Sophisticated mathematical analysis is required to prove that the expected cost of inserting into a hash table, or looking something up in the hash table, is O(1)
Even if the table is nearly full (leading to long searches), efficiency is usually still quite high

Solution #2: Rehashing
In the event of a collision, another approach is to rehash: compute another hash function
Since we may need to rehash many times, we need an easily computable sequence of functions
Simple example: in the case of hashing Strings, we might take the previous hash code and add the length of the String to it
Probably better if the length of the string was not a component in computing the original hash function
Possibly better yet: add the length of the String plus the number of probes made so far
Problem: are we sure we will look at every location in the array?
Rehashing is a fairly uncommon approach, and we won't pursue it any further here
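For the curious, the "possibly better yet" rule can be sketched as follows. This is one reading of the slide, not a recommended scheme: the names `rehash` and `probeSequence`, the table size of 10, and the choice to reduce each result modulo the table size are all assumptions made for illustration.

```java
import java.util.Arrays;

class RehashSequence {
    // Next location = previous location + string length + probes made so far,
    // reduced to a legal index for a table of the given size.
    static int rehash(int previous, String s, int probesSoFar, int tableSize) {
        return Math.floorMod(previous + s.length() + probesSoFar, tableSize);
    }

    // Lists the first `count` locations examined for s, starting at firstIndex.
    static int[] probeSequence(String s, int firstIndex, int tableSize, int count) {
        int[] seq = new int[count];
        int index = firstIndex;
        for (int probes = 0; probes < count; probes++) {
            seq[probes] = index;
            index = rehash(index, s, probes + 1, tableSize); // one more probe has now been made
        }
        return seq;
    }

    public static void main(String[] args) {
        // "kiwi" has length 4 and is assumed to hash to location 0.
        System.out.println(Arrays.toString(probeSequence("kiwi", 0, 10, 6)));
        // [0, 5, 1, 8, 6, 5]
    }
}
```

The sixth probe lands on location 5 again while locations such as 2 and 3 are still unvisited, which is exactly the worry raised in the "Problem" bullet.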

Solution #3: Bucket hashing
The previous solutions used open hashing: all entries went into a flat (unstructured) array
Another solution is to make each array location the header of a linked list of values that hash to that location

The hashCode function
public int hashCode() is defined in Object
Like equals, the default implementation of hashCode just uses the address of the object; probably not what you want for your own objects
You can override hashCode for your own objects
As you might expect, String overrides hashCode with a version appropriate for strings
Note that the supplied hashCode method does not know the size of your array; you have to adjust the returned int value yourself

Writing your own hashCode method
A hashCode method must:
Return a value that is (or can be converted to) a legal array index
Always return the same value for the same input
It can't use random numbers, or the time of day
Return the same value for equal inputs
Must be consistent with your equals method
It does not need to return different values for different inputs
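A small sketch of a class that meets these requirements. `GridPoint` and `indexFor` are hypothetical names invented for this example; `Objects.hash` and `Math.floorMod` are standard library calls. The `hashCode` method uses exactly the fields that `equals` compares, and the conversion to a legal array index is done separately, because `hashCode` itself knows nothing about the table size.

```java
import java.util.Objects;

class GridPoint {                       // hypothetical example class
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Deterministic (no random numbers, no clock) and built from the same
    // fields as equals, so equal points always get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    // Converts any hash code, including a negative one, to an index in
    // [0, tableSize). Math.abs(h) % tableSize is not safe here, because
    // Math.abs(Integer.MIN_VALUE) is still negative.
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        GridPoint a = new GridPoint(1, 2);
        GridPoint b = new GridPoint(1, 2);
        System.out.println(a.equals(b));                        // true
        System.out.println(a.hashCode() == b.hashCode());       // true
        System.out.println(indexFor(a, 11) == indexFor(b, 11)); // true
    }
}
```

Two unequal points may still share a hash code; that is allowed, and it is what the collision-handling techniques are for.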
A good hashCode method should:
Be efficient to compute
Give a uniform distribution of array indices
Not assign similar numbers to similar input values

Other considerations
The hash table might fill up; we need to be prepared for that
Not a problem for a bucket hash, of course
You cannot simply delete items from an open hash table
This would create empty slots that might prevent you from finding items that hash before the slot but end up after it
Again, not a problem for a bucket hash
Generally speaking, hash tables work best when the table size is a prime number

Hash tables in Java
Java provides two hash table classes, Hashtable and HashMap
Both are maps: they associate keys with values
Hashtable is synchronized; it can be accessed safely from multiple threads
Hashtable uses an open hash, and has a rehash method, to increase the size of the table
HashMap is newer, faster, and usually better, but it is not synchronized
HashMap uses a bucket hash, and has a remove method
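The bucket idea, in which each array location heads a linked list of the entries that hash there, can be sketched in a few lines. This is an illustrative toy with made-up names (`BucketTable`, `Node`, `indexFor`), not the actual `java.util.HashMap` implementation. It also shows why removal is unproblematic with buckets: unlinking a node leaves the rest of its list reachable.

```java
class BucketTable {
    // One entry in a bucket's singly linked list.
    private static class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] buckets;

    BucketTable(int size) {
        buckets = new Node[size];
    }

    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Stores the pair; returns the previous value for this key, or null.
    String put(String key, String value) {
        int i = indexFor(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                String old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node(key, value, buckets[i]);  // new head of the list
        return null;
    }

    // Returns the value for this key, or null if the key is absent.
    String get(String key) {
        for (Node n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    // Unlinks the key's node; returns its value, or null if the key is absent.
    String remove(String key) {
        int i = indexFor(key);
        Node prev = null;
        for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.key.equals(key)) {
                if (prev == null) buckets[i] = n.next;
                else prev.next = n.next;
                return n.value;
            }
        }
        return null;
    }
}
```

For example, `new BucketTable(1)` forces every key into the same bucket, yet `put`, `get` and `remove` still behave correctly; they just walk a longer list.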

Hash table operations
Both Hashtable and HashMap are in java.util
Both have no-argument constructors, as well as constructors that take an integer table size
Both have methods:
public Object put(Object key, Object value)
(Returns the previous value for this key, or null)
public Object get(Object key)
public void clear()
public Set keySet()
Dynamically reflects changes in the hash table
...and many others
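A short usage sketch of the methods listed above, written with generic type parameters rather than the raw `Object` signatures shown on the slide. The keys and values are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class HashMapDemo {
    public static void main(String[] args) {
        Map<String, String> birds = new HashMap<>();

        // put returns the previous value for the key, or null if there was none.
        System.out.println(birds.put("hawk", "hawk info"));     // null
        System.out.println(birds.put("hawk", "updated info"));  // hawk info

        // get returns the value, or null when the key is absent.
        System.out.println(birds.get("hawk"));                  // updated info
        System.out.println(birds.get("cow"));                   // null

        // keySet is a live view: later changes to the map show up in it.
        Set<String> keys = birds.keySet();
        birds.put("owl", "owl info");
        System.out.println(keys.contains("owl"));               // true

        birds.clear();
        System.out.println(keys.isEmpty());                     // true
    }
}
```

Swapping `new HashMap<>()` for `new java.util.Hashtable<>()` leaves the rest of the code unchanged, since both classes implement `Map`.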

The End
David L. Matuszek