Fisher–Yates shuffle 基本思想( Knuth shuffle ):
To shuffle an array a of n elements (indices 0..n-1):
for i from n − 1 downto 1 do
j ← random integer with 0 ≤ j ≤ i
exchange a[j] and a[i]
JDK源代码如下:
/**
* Moves every element of the List to a random new position in the list.
*
*
@param
list
* the List to shuffle
*
*
@throws
UnsupportedOperationException
* when replacing an element in the List is not supported
*/
public
static
void
shuffle(List<?>
list) {
shuffle(list,
new
Random());
}
/**
* Moves every element of the List to a random new position in the list
* using the specified random number generator.
*
*
@param
list
* the List to shuffle
*
@param
random
* the random number generator
*
*
@throws
UnsupportedOperationException
* when replacing an element in the List is not supported
*/
@SuppressWarnings(
"unchecked"
)
public
static
void
shuffle(List<?>
list, Random random) {
if
(!(list
instanceof
RandomAccess)) {
Object[] array
=
list.toArray();
for
(
int
i = array.length - 1; i > 0; i--
) {
int
index = random.nextInt(i + 1
);
if
(index < 0
) {
index
= -
index;
}
Object temp
=
array[i];
array[i]
=
array[index];
array[index]
=
temp;
}
int
i = 0
;
ListIterator
<Object> it = (ListIterator<Object>
) list
.listIterator();
while
(it.hasNext()) {
it.next();
it.set(array[i
++
]);
}
}
else
{
List
<Object> rawList = (List<Object>
) list;
for
(
int
i = rawList.size() - 1; i > 0; i--
) {
int
index = random.nextInt(i + 1
);
if
(index < 0
) {
index
= -
index;
}
rawList.set(index, rawList.set(i, rawList.get(index)));
}
}
}
测试代码,为了确保每种情况的初始化一样,使用了多个容器:
public
class
javaShuffle {
public
static
int
temp = 0
;
public
static
long
start;
public
static
long
end;
public
static
void
main(
final
String args[]) {
Object changeTemp;
List
<Integer> numList =
new
ArrayList<Integer>
();
List
<Integer> firstList =
new
ArrayList<Integer>
();
List
<Integer> secondList =
new
ArrayList<Integer>
();
List
<Integer> thirdList =
new
ArrayList<Integer>
();
List
<Integer> fourthList =
new
ArrayList<Integer>
();
for
(
int
i = 1; i <= 100000; i++
) {
numList.add(i);
firstList.add(i);
secondList.add(i);
thirdList.add(i);
fourthList.add(i);
}
//
first shuffle,use changeTemp
getStartTime();
int
randInt = 0
;
for
(
int
i = 0, length = firstList.size(); i < length; i++
) {
randInt
=
getRandom(i, firstList.size());
changeTemp
=
firstList.get(i);
firstList.set(i, firstList.get(randInt));
firstList.set(randInt, javaShuffle.temp);
}
getEndTime(
"first shuffle run time "
);
//
second shuffle,exchange list
getStartTime();
for
(
int
i = 0, length = secondList.size(); i < length; i++
) {
randInt
=
getRandom(i, secondList.size());
secondList.set(i, secondList.set(randInt, secondList.get(i)));
}
getEndTime(
"second shuffle run time"
);
//
third shuffle, change generate random int
getStartTime();
Object[] tempArray
=
thirdList.toArray();
Random rand
=
new
Random();
int
j = 0
;
for
(
int
i = tempArray.length - 1; i > 0; i--
) {
j
= rand.nextInt(i + 1
);
thirdList.set(i, thirdList.set(j, thirdList.get(i)));
}
getEndTime(
"third shuffle run time "
);
//
fourth shuffle, simulate java shuffle
getStartTime();
Random random
=
new
Random();
if
(!(fourthList
instanceof
RandomAccess)) {
Object[] array
=
fourthList.toArray();
for
(
int
i = array.length - 1; i > 0; i--
) {
int
index = random.nextInt(i + 1
);
if
(index < 0
) {
index
= -
index;
}
Object temp
=
array[i];
array[i]
=
array[index];
array[index]
=
temp;
}
int
i = 0
;
ListIterator
<Integer> it = (ListIterator<Integer>
) fourthList.listIterator();
while
(it.hasNext()) {
it.next();
it.set((Integer) array[i
++
]);
}
}
else
{
List
<Integer> rawList = (List<Integer>
) fourthList;
for
(
int
i = rawList.size() - 1; i > 0; i--
) {
int
index = random.nextInt(i + 1
);
if
(index < 0
) {
index
= -
index;
}
rawList.set(index, rawList.set(i, rawList.get(index)));
}
}
getEndTime(
"fourth shuffle run time"
);
//
java shuffle
getStartTime();
Collections.shuffle(numList);
getEndTime(
"java shuffle run time "
);
}
public
static
void
swap(
int
a,
int
b) {
javaShuffle.temp
=
a;
a
=
b;
b
=
javaShuffle.temp;
}
public
static
int
getRandom(
final
int
low,
final
int
high) {
return
(
int
) (Math.random() * (high - low) +
low);
}
public
static
void
getStartTime() {
javaShuffle.start
=
System.nanoTime();
}
public
static
void
getEndTime(
final
String s) {
javaShuffle.end
=
System.nanoTime();
System.out.println(s
+ ": " + (javaShuffle.end - javaShuffle.start) + "ns"
);
}
}
如果数值较小,例如100000级别,则输出大概是:
first shuffle run time : 85029499ns
second shuffle run time: 80909474ns
third shuffle run time : 71543926ns
fourth shuffle run time: 76520595ns
java shuffle run time : 61027643ns
first shuffle run time : 82326239ns
second shuffle run time: 78575611ns
third shuffle run time : 95009632ns
fourth shuffle run time: 105946897ns
java shuffle run time : 90849302ns
first shuffle run time : 84539840ns
second shuffle run time: 85965575ns
third shuffle run time : 101814998ns
fourth shuffle run time: 113309672ns
java shuffle run time : 35089693ns
first shuffle run time : 87679863ns
second shuffle run time: 79991814ns
third shuffle run time : 73720515ns
fourth shuffle run time: 78353061ns
java shuffle run time : 64146465ns
first shuffle run time : 84314386ns
second shuffle run time: 80074803ns
third shuffle run time : 74001283ns
fourth shuffle run time: 79931321ns
java shuffle run time : 86427540ns
first shuffle run time : 84315523ns
second shuffle run time: 81468386ns
third shuffle run time : 75052284ns
fourth shuffle run time: 79461407ns
java shuffle run time : 66607729ns
多次运行结果可能都不一样,但是基本java自带 shuffle速度最快,第三种方式次之。而第一种方法耗时最久。
如果是10000000级别,大概如下:
first shuffle run time : 2115703288ns
second shuffle run time: 3114045871ns
third shuffle run time : 4664426798ns
fourth shuffle run time: 2962686695ns
java shuffle run time : 3246883026ns
first shuffle run time : 2165398466ns
second shuffle run time: 3129558913ns
third shuffle run time : 4147859664ns
fourth shuffle run time: 2911849942ns
java shuffle run time : 4311703487ns
first shuffle run time : 2227462247ns
second shuffle run time: 3279548770ns
third shuffle run time : 4704344954ns
fourth shuffle run time: 2942635980ns
java shuffle run time : 3933172427ns
first shuffle run time : 2200158789ns
second shuffle run time: 3172666791ns
third shuffle run time : 4715631517ns
fourth shuffle run time: 2950817535ns
java shuffle run time : 3387417676ns
first shuffle run time : 2201124449ns
second shuffle run time: 3203823874ns
third shuffle run time : 4179926278ns
fourth shuffle run time: 2913690411ns
java shuffle run time : 3571313813ns
first shuffle run time : 2163053190ns
second shuffle run time: 3073889926ns
third shuffle run time : 4493831518ns
fourth shuffle run time: 2852713887ns
java shuffle run time : 3773602415ns
可以看出,第一种方法速度最快,而第四种最慢。java自带 shuffle速度也不理想。
在进行大数据处理的时候,如果使用java库效率较低时,可以考虑使用其他方式。

