Hadoop note: in the Reduce phase, every value in the Iterable<VALUEIN> values shares a single object
/**
 * Iterate through the values for the current key, reusing the same value
 * object, which is stored in the context.
 * @return the series of values associated with the current key. All of the
 * objects returned directly and indirectly from this method are reused.
 */
public
Iterable<VALUEIN> getValues() throws IOException, InterruptedException {
  return iterable;
}
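In the Reduce phase, all values that share the same key are grouped together, giving a key: values structure.
Usually we process all of the values under a given key. There is one thing to watch out for here. When we write code like the following: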
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                      ) throws IOException, InterruptedException {
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
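Inside the loop, the value obtained on each iteration actually refers to the same object. On each iteration the next value is simply deserialized into that object, overwriting its previous contents: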
/**
 * Advance to the next key/value pair.
 */
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
  if (!hasMore) {
    key = null;
    value = null;
    return false;
  }
  firstValue = !nextKeyIsSame;
  DataInputBuffer nextKey = input.getKey();
  currentRawKey.set(nextKey.getData(), nextKey.getPosition(),
                    nextKey.getLength() - nextKey.getPosition());
  buffer.reset(currentRawKey.getBytes(), 0, currentRawKey.getLength());
  key = keyDeserializer.deserialize(key);
  DataInputBuffer nextVal = input.getValue();
  buffer.reset(nextVal.getData(), nextVal.getPosition(),
               nextVal.getLength() - nextVal.getPosition());
  value = valueDeserializer.deserialize(value);

  currentKeyLength = nextKey.getLength() - nextKey.getPosition();
  currentValueLength = nextVal.getLength() - nextVal.getPosition();

  hasMore = input.next();
  if (hasMore) {
    nextKey = input.getKey();
    nextKeyIsSame = comparator.compare(currentRawKey.getBytes(), 0,
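
The practical consequence of this reuse shows up as soon as values are cached rather than written out immediately. The following is a minimal sketch that reproduces the effect without a Hadoop cluster. `MutableValue`, `reusingValues`, `collectWithoutCopy` and `collectWithCopy` are hypothetical names invented for this illustration and are not part of Hadoop; `MutableValue.set` merely plays the role that deserializing into an existing value object plays in the real framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ValueReuseDemo {

    /** Hypothetical stand-in for a mutable value type such as a Writable. */
    static final class MutableValue {
        private String content = "";

        // Plays the role of deserializing a record into an existing object.
        void set(String s) { content = s; }

        String get() { return content; }

        // Creates an independent snapshot of the current contents.
        MutableValue copy() {
            MutableValue c = new MutableValue();
            c.set(content);
            return c;
        }
    }

    /** Hands back the SAME MutableValue on every next(), refilled each time. */
    public static Iterable<MutableValue> reusingValues(List<String> records) {
        final MutableValue shared = new MutableValue();
        return () -> new Iterator<MutableValue>() {
            private final Iterator<String> raw = records.iterator();

            @Override public boolean hasNext() { return raw.hasNext(); }

            @Override public MutableValue next() {
                shared.set(raw.next()); // overwrite the shared object in place
                return shared;
            }
        };
    }

    /** Pitfall: every cached reference points at the one shared object. */
    public static List<String> collectWithoutCopy(List<String> records) {
        List<MutableValue> cached = new ArrayList<>();
        for (MutableValue v : reusingValues(records)) {
            cached.add(v);
        }
        return contents(cached);
    }

    /** Fix: snapshot each value before the next iteration overwrites it. */
    public static List<String> collectWithCopy(List<String> records) {
        List<MutableValue> cached = new ArrayList<>();
        for (MutableValue v : reusingValues(records)) {
            cached.add(v.copy());
        }
        return contents(cached);
    }

    private static List<String> contents(List<MutableValue> values) {
        List<String> out = new ArrayList<>();
        for (MutableValue v : values) {
            out.add(v.get());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(collectWithoutCopy(records)); // prints [c, c, c]
        System.out.println(collectWithCopy(records));    // prints [a, b, c]
    }
}
```

In a real reducer the same idea applies: if a value has to outlive the current iteration, copy it first, for example with `new Text(value)` for `Text` values or `WritableUtils.clone(value, conf)` for other `Writable` types. Writing the value straight to the context inside the loop, as in the `reduce` method shown earlier, does not hold on to the reference and so is not affected.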