AVL Trees

  • Target: Speed up searching (with insertion and deletion)
  • Tool: Binary search trees
  • Problem: Although Tp=O(height)T_p = O( \text{height} ), but the height can be as bad as O(N)O( N )

[Definition] An empty binary tree is height balanced. If TT is a nonempty binary tree with TLT_L and TRT_R as its left and right subtrees, then TT is height balanced if and only if

  • TLT_L and TRT_R are height balanced, and
  • hLhR1| h_L - h_R | \leq 1 where hLh_L and hRh_R are the heights of TLT_L and TRT_R , respectively.

Note: The height of an empty tree is defined to be -1, and the height of a single node is defined to be 0.

[Definition] The balance factor BF(node)=hLhRBF( node ) = h_L - h_R. In an AVL tree, BF(node)=1,0,or1BF( node ) = -1, 0, or 1.

Single Rotation

RR Rotation

image-20210302183310894

LL Rotation

Double Rotation

LR Rotation

image-20210302184604221

RL Rotation

image-20210302184631968

Conclusion

  • Single Rotation中RR对应一次逆时针,LL对应一次顺时针
  • Double Rotation中从后往前读,R对应一次逆时针,L对应一次顺时针
    • LR先逆时针再顺时针
    • RL先顺时针再逆时针

Note: Several bf’s might be changed even if we don’t need to reconstruct the tree.

image-20210511190639634

The correct answer is A.

The height of AVL Tree

image-20210302193145995 image-20210302193712135

Splay Trees

  • Target: Any MM consecutive tree operations starting from an empty tree take at most O(MlogN)O(M \log N) time.

Note: AVL Tree is a kind of Splay Tree because every operation takes O(logN)O(\log N) time.

  • Splay Tree only means that the amortized time is O(logN)O(\log N). A single operation might still take O(N)O(N) time.
  • The bound is weaker. But the effect is the same: There are no bad input sequences.
  • Whenever a node is accessed, it must be moved to avoid the case of successive operations taking O(N)O(N) time.
  • Idea: After a node is accessed, it is pushed to the root by a series of AVL tree rotations.

A simple solution

  • Only use single rotation

  • The time complexity of the worst case might be O(N2)O(N^2), so it doesn’t work.

Splaying

image-20210304160411876

Note: The operation order of one Zig-zig is from up to down.

  • Splaying not only moves the accessed node to the root, but also roughly halves the depth of most nodes on the path.
image-20210508163912256

Deletions

  • Step1: Find X and X will be at the root
  • Step2: Remove X and there will be two subtrees TLT_L and TRT_R
  • Step3: $FindMax ( T_L ) $ and the largest element will be the root of TLT_L , and has no right child
  • Step4: Make TRT_R the right child of the root of TLT_L

Amortized Analysis 摊还分析

  • Target: Any MM consecutive operations take at most O(MlogN)O(M \log N) time. — Amortized time bound
  • worst-case bound \geq amortized bound \geq average-case bound
  • Probability is not involved in amortized bound
  • 摊还分析可以保证最坏情况下每个操作的平均性能

Aggregate analysis 聚合分析

  • Idea: Show that for all nn, a sequence of nn operations takes worst-case time T(n)T(n) in total. In the worst case, the average cost, or amortized cost, per operation is therefore T(n)/nT(n)/n.
image-20210304171720165
  • We can pop each object from the stack at most once for each time we have pushed it onto the stack

Accounting method 核算法

  • Idea: When an operation’s amortized cost c^i\hat c_i exceeds its actual cost cic_i, we assign the difference to specific objects in the data structure as credit. Credit can help pay for later operations whose amortized cost is less than their actual cost.

  • The difference between aggregate analysis and accounting method is that the later one assumes that the amortized costs of the operations may differ from each other

  • For all sequences of nn operations, we have

    i=1nc^ii=1nci\sum^n_{i=1}\hat c_i\geq \sum^n_{i=1}c_i

    Tamortized=i=1nc^inT_{amortized}=\frac{\sum^n_{i=1}\hat c_i}{n}

image-20210304171631233

Potential method 势能法

  • Idea: Take a closer look at the credit : c^ici=Crediti=Φ(Di)Φ(Di1)\hat c_i-c_i=\text{Credit}_i=\Phi(D_i)-\Phi(D_{i-1})
  • DiD_i id defined to be the structure of the current situation
  • Potential function maps the current structure of the problem into a number

c^i=ci+Φ(Di)Φ(Di1)i=1nc^i=i=1n(ci+Φ(Di)Φ(Di1))=i=1nci+Φ(Dn)Φ(D0)\hat c_i=c_i+\Phi(D_i)-\Phi(D_{i-1})\\ \sum^n_{i=1}\hat c_i=\sum^n_{i=1}(c_i+\Phi(D_i)-\Phi(D_{i-1}))=\sum^n_{i=1}c_i+\Phi(D_n)-\Phi(D_0)

  • A good potential function should always assume its minimum at the start of the sequence
image-20210304173426195 image-20210508163951860

Splay Trees势能分析

  • 势能函数是树中所有节点的 rank 之和

    Φ(T)=i=1nR(i)=i=1nlogS(i)\Phi(T)=\sum^n_{i=1}R(i)=\sum^n_{i=1}\log S(i)

    其中S(i)S(i)指的是子树ii中的节点数(包括节点ii),用R(i)R(i)表示节点 ii 的 rank,R(i)=logS(i)R(i)=\log S(i)

  • R2R_2表示操作后的势能,R1R_1表示操作前势能

image-20210307165318849
  • zig
    • 实际成本是一次单旋,为1
    • 只有 XXPP 的 rank 值有变化,故c^i=1+R2(X)R1(X)+R2(P)R1(P)\hat c_i = 1 + R_2(X) − R_1(X) + R_2(P) − R_1(P)
    • 节点 PP 由根节点变为非根节点,故R2(P)R1(P)0R_2(P)-R_1(P)\leq0,因此c^i1+R2(X)R1(X)1+3(R2(X)R1(X))\hat c_i\leq 1+R_2(X)-R_1(X)\leq 1+3(R_2(X)-R_1(X))
  • zig-zag
    • 实际成本是两次旋转,为2

    • c^i=2+R2(X)R1(X)+R2(P)R1(P)+R2(G)R1(G)\hat c_i = 2 + R_2(X) − R_1(X) + R_2(P) − R_1(P)+R_2(G) − R_1(G)

    • 操作前GG是根节点,操作后XX是根节点,rank相同,故c^i=2R1(X)+R2(P)R1(P)+R2(G)\hat c_i = 2− R_1(X) + R_2(P) − R_1(P)+R_2(G)

    • R1(P)R1(X)R_1(P)\geq R_1(X)

    • S2(P)+S2(G)S2(X)S_2(P)+S_2(G)\leq S_2(X),根据定理可得R2(P)+R2(G)2R2(X)2R_2(P)+R_2(G)\leq 2R_2(X)-2

    • c^i=2R1(X)+R2(P)R1(P)+R2(G)2+R2(P)+R2(G)2R1(X)2+R2(P)+R2(G)2R1(X)(R2(P)+R2(G)2R2(X)+2)=2(R2(X)R1(X))\hat c_i = 2− R_1(X) + R_2(P) − R_1(P)+R_2(G)\leq 2 + R_2(P)+R_2(G)− 2R_1(X)\\ \leq 2 + R_2(P)+R_2(G)− 2R_1(X)-(R_2(P)+R_2(G)− 2R_2(X)+2)=2(R_2(X)-R_1(X))

  • zig-zig
    • 实际成本是两次单旋,为2
    • c^i=2+R2(X)R1(X)+R2(P)R1(P)+R2(G)R1(G)\hat c_i = 2 + R_2(X) − R_1(X) + R_2(P) − R_1(P)+R_2(G) − R_1(G)
    • 操作前GG是根节点,操作后XX是根节点,rank相同,故c^i=2R1(X)+R2(P)R1(P)+R2(G)\hat c_i = 2− R_1(X) + R_2(P) − R_1(P)+R_2(G)
    • R1(P)R1(X)R_1(P)\geq R_1(X)
    • R2(G)R2(P)R2(X)R_2(G)\leq R_2(P)\leq R_2(X)
    • c^i2(R2(X)R1(X))+23(R2(X)R1(X))\hat c_i\leq2(R_2(X)-R_1(X))+2\leq3(R_2(X)-R_1(X))
  • [Theorem] The amortized time to splay a tree with root TT at node XX is at most 3(R(T)R(X))+1=O(logN)3( R( T ) – R ( X ) ) + 1 = O(\log N).