Tuesday, April 2, 2013

Binary Indexed Tree- Fenwick Tree




INTRODUCTION

The Binary Indexed Tree or the Fenwick Tree as it is alternatively known is a very useful data structure that has wide uses in various day to day applications. BIT is very useful for solving queries of a particular type, Let us suppose we have an array of elements and we need to do two types of queries frequently.
  1. Change the value of any element in the list.
  2. Get the sum of all the elements in any range within the given list
The naive solution has time complexity of O(1) for query 1 and O(n) for query 2. Suppose we make m queries. The worst case (when all queries are 2) has time complexity O(n * m). Binary Indexed Trees are easy to code and have worst time complexity O(m log n). i.e each query of the type 2 takes at most O(log n)


Now, Let us consider the elements of an array as frequencies e.g. arr[1]=5,arr[2]=6 etc, They can have any values, the term frequency is just meant to use as a notation.

Let us consider an initial array of size N(can be any valid size within Integer Range).We can easily construct another array of similar size N with values which is used to hold the cumulative values e.g.




Responsibility

The underlying working principle of the Binary Indexed Tree is achieved by the help of responsibility array. It’s an array of size>=N size where each index holds some specific values (called the responsibility sum).
Let us declare int res[N];

Basic Idea

We know that each integer can be represented by sum of Powers of 2  e.g.
  • 9=8+1   ----------> 1001(Binary Notation)
  • 37=32+4+1 ----------> 00100101(Binary Notation)
  • 15=8+4+2+1 ----------> 1111(Binary Notation)
We can see that the maximum number of 1 in the binary notation of 15 is ceil(log215)=4 ,similarly for 37 and 9.
i.e for 37 ceil(log237)=6 
    for 9 ceil(log29)=4

So, number of digits of any number in 2's power form would be log2N Presto, this is what exactly the property being made use in BIT.

Similarly, instead of storing the cumulative frequencies for the entire array we can store the (sum of some sub frequencies(Not the entire values from 0 to that index)) in particular indexes, the purpose of which will be a bit clear in a short while.

  • Let idx be some index(not value) of the array of size N (therefore,0<=idx<=N)
  • Let r be the position of the last occurrence of 1 in the binary notation of idx from left  to right.(See the example below) (therefore, 1<=r<=log2N (
The responsibility range is the range of (idx-2^r+1)    to   idx (inclusive). And the responsibility array contains the sum of all the index in the frequency array corresponding to the range
e.g.

Let us see an example here,let idx=12,(1100),hence r=2 and thus res[12] ,  the range of indexes of the frequency array covered by it is , [12-(2^2)+1 to 12] i.e [9-12] hence ,
res[12]=(frequency[9]+frequency[10]+frequency[11]+frequency[12])





Reading Cumulative value at any position. 

 In order to read the cumulative value at any index e.g  13 we just need to remove the last one bit(i.e r) each time from the idx till the value is >0
e.g.

13th index(1101) = res[13]+res[12]+res[8 ]
                  1101 + 1100    +1000(here we are removing the last bit having  1 at each iteration)
We can also check from the above table e.g. cumulative value of 6th index (0110)=res[6]+res[4]=9+11=20
i.e 0110 =6
     0100 =4

Reading Logic


Now we need to find a fast way of finding the r at each step. This can be easily done using 
Val=(num & -num); where val is the value obtained after making rth bit 0 and num is the initial number

The proof is as follows:

 Let num be the integer whose last digit(means r) we want to isolate. In binary notation num can be represented as a1b, where a represents binary digits before the last digit and b represents zeroes after the last digit.

Integer  num is equal to (a1b)¯ + 1 = a¯0b¯ + 1.    b consists of all zeroes, so b¯ consists of all ones. Finally we have
-num = (a1b)¯ + 1 
= a¯0b¯ + 1 
= a¯0(0...0)¯ + 1
= a¯0(1...1) + 1 
= a¯1(0...0) 
= a¯1b.
Now, we can easily isolate the last digit, using bitwise operator AND (in C++, Java it is &) with num and -num:

           a1b
&      a¯1b
--------------------
= (0...0)1(0...0)

Below is the code for querying the cumulative value at any index.


The number of iterations in this function is number of  bits in idx, which is at most log N.
Time complexity: O(log N).
Code length: Up to ten lines.



So,in order to get the sum between say two points a,b, we do read(b)-read(a-1) [we have used a-1 to include the ath value as well] and for getting the actual position at any index b just read(b)-read(b-1)



Change frequency at some position and update tree

The concept is to update tree frequency at all indexes which are responsible for frequency whose value we are changing. In reading cumulative frequency at some index, we were removing the last bit and going on. In changing some frequency val in tree, we should increment value at the current index (the starting index is always the one whose frequency is changed) for val, add the last digit to index and go on while the index is less than or equal to N. Function in C.






Change frequency at some given range with the same value and update the array







When incrementing frequencies in range [start..end], we just increment difference at index start  and decrement difference at index end+1. This is also like saying: increment all frequencies in range [a..infinity] and decrement all frequencies in range [b+1..infinity].


Above function shows how to initialise the BIT in an array called res[] and N=number of elements + 1and arr is the initial frequency array

I hope the basic structure of BIT is now clear.For further details in BIT,you can click Topcoder Algorithm Tutorial





No comments:

Post a Comment