Fisher–Yates shuffle 基本思想( Knuth shuffle ):
To shuffle an array a of n elements (indices 0..n-1): for i from n − 1 downto 1 do j ← random integer with 0 ≤ j ≤ i exchange a[j] and a[i]
JDK源代码如下:
/** * Moves every element of the List to a random new position in the list. * * @param list * the List to shuffle * * @throws UnsupportedOperationException * when replacing an element in the List is not supported */ public static void shuffle(List<?> list) { shuffle(list, new Random()); } /** * Moves every element of the List to a random new position in the list * using the specified random number generator. * * @param list * the List to shuffle * @param random * the random number generator * * @throws UnsupportedOperationException * when replacing an element in the List is not supported */ @SuppressWarnings( "unchecked" ) public static void shuffle(List<?> list, Random random) { if (!(list instanceof RandomAccess)) { Object[] array = list.toArray(); for ( int i = array.length - 1; i > 0; i-- ) { int index = random.nextInt(i + 1 ); if (index < 0 ) { index = - index; } Object temp = array[i]; array[i] = array[index]; array[index] = temp; } int i = 0 ; ListIterator <Object> it = (ListIterator<Object> ) list .listIterator(); while (it.hasNext()) { it.next(); it.set(array[i ++ ]); } } else { List <Object> rawList = (List<Object> ) list; for ( int i = rawList.size() - 1; i > 0; i-- ) { int index = random.nextInt(i + 1 ); if (index < 0 ) { index = - index; } rawList.set(index, rawList.set(i, rawList.get(index))); } } }
测试代码,为了确保每种情况的初始化一样,使用了多个容器:
public class javaShuffle { public static int temp = 0 ; public static long start; public static long end; public static void main( final String args[]) { Object changeTemp; List <Integer> numList = new ArrayList<Integer> (); List <Integer> firstList = new ArrayList<Integer> (); List <Integer> secondList = new ArrayList<Integer> (); List <Integer> thirdList = new ArrayList<Integer> (); List <Integer> fourthList = new ArrayList<Integer> (); for ( int i = 1; i <= 100000; i++ ) { numList.add(i); firstList.add(i); secondList.add(i); thirdList.add(i); fourthList.add(i); } // first shuffle,use changeTemp getStartTime(); int randInt = 0 ; for ( int i = 0, length = firstList.size(); i < length; i++ ) { randInt = getRandom(i, firstList.size()); changeTemp = firstList.get(i); firstList.set(i, firstList.get(randInt)); firstList.set(randInt, javaShuffle.temp); } getEndTime( "first shuffle run time " ); // second shuffle,exchange list getStartTime(); for ( int i = 0, length = secondList.size(); i < length; i++ ) { randInt = getRandom(i, secondList.size()); secondList.set(i, secondList.set(randInt, secondList.get(i))); } getEndTime( "second shuffle run time" ); // third shuffle, change generate random int getStartTime(); Object[] tempArray = thirdList.toArray(); Random rand = new Random(); int j = 0 ; for ( int i = tempArray.length - 1; i > 0; i-- ) { j = rand.nextInt(i + 1 ); thirdList.set(i, thirdList.set(j, thirdList.get(i))); } getEndTime( "third shuffle run time " ); // fourth shuffle, simulate java shuffle getStartTime(); Random random = new Random(); if (!(fourthList instanceof RandomAccess)) { Object[] array = fourthList.toArray(); for ( int i = array.length - 1; i > 0; i-- ) { int index = random.nextInt(i + 1 ); if (index < 0 ) { index = - index; } Object temp = array[i]; array[i] = array[index]; array[index] = temp; } int i = 0 ; ListIterator <Integer> it = (ListIterator<Integer> ) fourthList.listIterator(); while (it.hasNext()) { it.next(); it.set((Integer) array[i ++ ]); } } else { List <Integer> rawList = (List<Integer> ) fourthList; for ( int i = rawList.size() - 1; i > 0; i-- ) { int index = random.nextInt(i + 1 ); if (index < 0 ) { index = - index; } rawList.set(index, rawList.set(i, rawList.get(index))); } } getEndTime( "fourth shuffle run time" ); // java shuffle getStartTime(); Collections.shuffle(numList); getEndTime( "java shuffle run time " ); } public static void swap( int a, int b) { javaShuffle.temp = a; a = b; b = javaShuffle.temp; } public static int getRandom( final int low, final int high) { return ( int ) (Math.random() * (high - low) + low); } public static void getStartTime() { javaShuffle.start = System.nanoTime(); } public static void getEndTime( final String s) { javaShuffle.end = System.nanoTime(); System.out.println(s + ": " + (javaShuffle.end - javaShuffle.start) + "ns" ); } } 如果数值较小,例如100000级别,则输出大概是: first shuffle run time : 85029499ns second shuffle run time: 80909474ns third shuffle run time : 71543926ns fourth shuffle run time: 76520595ns java shuffle run time : 61027643ns first shuffle run time : 82326239ns second shuffle run time: 78575611ns third shuffle run time : 95009632ns fourth shuffle run time: 105946897ns java shuffle run time : 90849302ns first shuffle run time : 84539840ns second shuffle run time: 85965575ns third shuffle run time : 101814998ns fourth shuffle run time: 113309672ns java shuffle run time : 35089693ns first shuffle run time : 87679863ns second shuffle run time: 79991814ns third shuffle run time : 73720515ns fourth shuffle run time: 78353061ns java shuffle run time : 64146465ns first shuffle run time : 84314386ns second shuffle run time: 80074803ns third shuffle run time : 74001283ns fourth shuffle run time: 79931321ns java shuffle run time : 86427540ns first shuffle run time : 84315523ns second shuffle run time: 81468386ns third shuffle run time : 75052284ns fourth shuffle run time: 79461407ns java shuffle run time : 66607729ns
多次运行结果可能都不一样,但是基本java自带 shuffle速度最快,第三种方式次之。而第一种方法耗时最久。
如果是10000000级别,大概如下:
first shuffle run time : 2115703288ns second shuffle run time: 3114045871ns third shuffle run time : 4664426798ns fourth shuffle run time: 2962686695ns java shuffle run time : 3246883026ns
first shuffle run time : 2165398466ns second shuffle run time: 3129558913ns third shuffle run time : 4147859664ns fourth shuffle run time: 2911849942ns java shuffle run time : 4311703487ns
first shuffle run time : 2227462247ns second shuffle run time: 3279548770ns third shuffle run time : 4704344954ns fourth shuffle run time: 2942635980ns java shuffle run time : 3933172427ns
first shuffle run time : 2200158789ns second shuffle run time: 3172666791ns third shuffle run time : 4715631517ns fourth shuffle run time: 2950817535ns java shuffle run time : 3387417676ns
first shuffle run time : 2201124449ns second shuffle run time: 3203823874ns third shuffle run time : 4179926278ns fourth shuffle run time: 2913690411ns java shuffle run time : 3571313813ns
first shuffle run time : 2163053190ns second shuffle run time: 3073889926ns third shuffle run time : 4493831518ns fourth shuffle run time: 2852713887ns java shuffle run time : 3773602415ns
可以看出,第一种方法速度最快,而第四种最慢。java自带 shuffle速度也不理想。
在进行大数据处理的时候,如果使用java库效率较低时,可以考虑使用其他方式。