Efficient Counter in Java

系统 1596 0

Reference:  http://www.programcreek.com/2013/10/efficient-counter-in-java/

 
You may often need a counter to understand the frequency of something (e.g., words) from a database or text file. A counter can be easily implemented by using a HashMap in Java. This article compares different approaches to implement a counter. Finally, an efficient one will be concluded.

UPDATE: Check out  Java 8 counter , writing a counter is just 2 simple lines now.

1. The  Naive  Counter

Naively, it can be implemented as the following:

String s = "one two three two three three";String[] sArr = s.split(" "); //naive approach HashMap<String, Integer> counter = new HashMap<String, Integer>(); for (String a : sArr) { if (counter.containsKey(a)) { int oldValue = counter.get(a); counter.put(a, oldValue + 1); } else { counter.put(a, 1); }}

In each loop, you check if the key exists or not. If it does, increment the old value by 1, if not, set it to 1. This approach is simple and straightforward, but it is not the most efficient approach. This method is considered less efficient for the following reasons:

  • containsKey(), get() are called twice when a key already exists. That means searching the map twice.
  • Since Integer is immutable, each loop will create a new one for increment the old value

2. The  Better  Counter

Naturally we want a mutable integer to avoid creating many Integer objects. A mutable integer class can be defined as follows:

class MutableInteger {  private int val;  public MutableInteger(int val) { this.val = val; }  public int get() { return val; }  public void set(int val) { this.val = val; }  //used to print value convinently public String toString(){ return Integer.toString(val); }}

And the counter is improved and changed to the following:

HashMap<String, MutableInteger> newCounter = new HashMap<String, MutableInteger>();  for (String a : sArr) { if (newCounter.containsKey(a)) { MutableInteger oldValue = newCounter.get(a); oldValue.set(oldValue.get() + 1); } else { newCounter.put(a, new MutableInteger(1)); }}

This seems better because it does not require creating many Integer objects any longer. However, the search is still twice in each loop if a key exists.

3. The  Efficient  Counter

The HashMap.put(key, value) method returns the key's current value. This is useful, because we can use the reference of the old value to update the value without searching one more time!

HashMap<String, MutableInteger> efficientCounter = new HashMap<String, MutableInteger>(); for (String a : sArr) { MutableInteger initValue = new MutableInteger(1); MutableInteger oldValue = efficientCounter.put(a, initValue);  if(oldValue != null){ initValue.set(oldValue.get() + 1); }}

4. Performance Difference

To test the performance of the three different approaches, the following code is used. The performance test is on 1 million times. The raw results are as follows:

Naive Approach : 222796000Better Approach: 117283000Efficient Approach: 96374000

The difference is significant - 223 vs. 117 vs. 96. There is huge difference between  Naive  and  Better , which indicates that creating objects are expensive!

String s = "one two three two three three";String[] sArr = s.split(" "); long startTime = 0;long endTime = 0;long duration = 0; // naive approachstartTime = System.nanoTime();HashMap<String, Integer> counter = new HashMap<String, Integer>(); for (int i = 0; i < 1000000; i++) for (String a : sArr) { if (counter.containsKey(a)) { int oldValue = counter.get(a); counter.put(a, oldValue + 1); } else { counter.put(a, 1); } } endTime = System.nanoTime();duration = endTime - startTime;System.out.println("Naive Approach : " + duration); // better approachstartTime = System.nanoTime();HashMap<String, MutableInteger> newCounter = new HashMap<String, MutableInteger>(); for (int i = 0; i < 1000000; i++) for (String a : sArr) { if (newCounter.containsKey(a)) { MutableInteger oldValue = newCounter.get(a); oldValue.set(oldValue.get() + 1); } else { newCounter.put(a, new MutableInteger(1)); } } endTime = System.nanoTime();duration = endTime - startTime;System.out.println("Better Approach: " + duration); // efficient approachstartTime = System.nanoTime(); HashMap<String, MutableInteger> efficientCounter = new HashMap<String, MutableInteger>(); for (int i = 0; i < 1000000; i++) for (String a : sArr) { MutableInteger initValue = new MutableInteger(1); MutableInteger oldValue = efficientCounter.put(a, initValue);  if (oldValue != null) { initValue.set(oldValue.get() + 1); } } endTime = System.nanoTime();duration = endTime - startTime;System.out.println("Efficient Approach: " + duration);

When you use a counter, you probably also need a function to sort the map by value. You can check out  the frequently used method of HashMap .

5. Solutions from Keith

Added a couple tests:
1) Refactored "better approach" to just call get instead of containsKey. Usually, the elements you want are in the HashMap so that reduces from two searches to one.
2) Added a test with AtomicInteger, which michal mentioned.
3) Compared to singleton int array, which uses less memory according to http://amzn.com/0748614079

I ran the test program 3x and took the min to remove variance from other programs. Note that you can't do this within the program or the results are affected too much, probably due to gc.

Naive: 201716122Better Approach: 112259166Efficient Approach: 93066471Better Approach (without containsKey): 69578496Better Approach (without containsKey, with AtomicInteger): 94313287Better Approach (without containsKey, with int[]): 65877234

Better Approach (without containsKey):

HashMap<String, MutableInteger> efficientCounter2 = new HashMap<String, MutableInteger>();for (int i = 0; i < NUM_ITERATIONS; i++) { for (String a : sArr) { MutableInteger value = efficientCounter2.get(a);  if (value != null) { value.set(value.get() + 1); } else { efficientCounter2.put(a, new MutableInteger(1)); } }}

Better Approach (without containsKey, with AtomicInteger):

HashMap<String, AtomicInteger> atomicCounter = new HashMap<String, AtomicInteger>();for (int i = 0; i < NUM_ITERATIONS; i++) { for (String a : sArr) { AtomicInteger value = atomicCounter.get(a);  if (value != null) { value.incrementAndGet(); } else { atomicCounter.put(a, new AtomicInteger(1)); } }}

Better Approach (without containsKey, with int[]):

HashMap<String, int[]> intCounter = new HashMap<String, int[]>();

for (int i = 0; i < NUM_ITERATIONS; i++)

{ for (String a : sArr) { int[] valueWrapper = intCounter.get(a);  if (valueWrapper == null) { intCounter.put(a, new int[] { 1 }); } else { valueWrapper[0]++; } }}

Guava's MultiSet is probably faster still.

6. Conclusion

 

The winner is the last one which uses int arrays.

Efficient Counter in Java


更多文章、技术交流、商务合作、联系博主

微信扫码或搜索:z360901061

微信扫一扫加我为好友

QQ号联系: 360901061

您的支持是博主写作最大的动力,如果您喜欢我的文章,感觉我的文章对您有帮助,请用微信扫描下面二维码支持博主2元、5元、10元、20元等您想捐的金额吧,狠狠点击下面给点支持吧,站长非常感激您!手机微信长按不能支付解决办法:请将微信支付二维码保存到相册,切换到微信,然后点击微信右上角扫一扫功能,选择支付二维码完成支付。

【本文对您有帮助就好】

您的支持是博主写作最大的动力,如果您喜欢我的文章,感觉我的文章对您有帮助,请用微信扫描上面二维码支持博主2元、5元、10元、自定义金额等您想捐的金额吧,站长会非常 感谢您的哦!!!

发表我的评论
最新评论 总共0条评论