
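Hi everyone! For C++ developers, string is probably one of the most frequently used data structures in the standard library. For a long time I only used it without really understanding how it is implemented underneath, so I recently set aside some time to study the implementation. Today we look at how string works from the perspective of memory allocation.

Direct allocation

Around 2008 I wrote a string class by hand. Performance was not a concern, so the implementation was driven purely by functionality. Part of the code is excerpted below: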

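// Construct from a C string: always allocates on the heap.
string::string(const char* s) {
    size_ = strlen(s);
    buffer_ = new char[size_ + 1];  // +1 for the terminating '\0'
    strcpy(buffer_, s);
}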

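// Append str to this string: allocates a new, larger block every time.
string& string::operator+=(const string& str) {
    size_ += str.size_;
    char* data = new char[size_ + 1];
    strcpy(data, buffer_);
    strcat(data, str.buffer_);

    delete[] buffer_;  // release the old block only after copying out of it
    buffer_ = data;
    return *this;
}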

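The code above shows some of the member functions of string. Whether a string is constructed or appended to, a fresh block of memory is allocated on the heap with new. The advantage is simplicity. The drawback is that every operation pays for a heap allocation, which is far slower than using stack memory. Is there a better way? Let's first look at what the standard library implementation does.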
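To make the cost concrete, here is a minimal sketch of a complete heap-only string. The class name NaiveString and the allocations counter are hypothetical and exist only for illustration; this is not the original 2008 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical illustration class: every operation allocates on the heap.
class NaiveString {
public:
    static int allocations;  // counts every heap allocation made by the class

    explicit NaiveString(const char* s)
        : size_(std::strlen(s)), buffer_(allocate(size_ + 1)) {
        std::memcpy(buffer_, s, size_ + 1);  // copy including the terminator
    }

    NaiveString(const NaiveString& other)
        : size_(other.size_), buffer_(allocate(size_ + 1)) {
        std::memcpy(buffer_, other.buffer_, size_ + 1);
    }

    NaiveString& operator=(const NaiveString& other) {
        if (this != &other) {
            char* data = allocate(other.size_ + 1);
            std::memcpy(data, other.buffer_, other.size_ + 1);
            delete[] buffer_;
            buffer_ = data;
            size_ = other.size_;
        }
        return *this;
    }

    NaiveString& operator+=(const NaiveString& other) {
        char* data = allocate(size_ + other.size_ + 1);
        std::memcpy(data, buffer_, size_);
        std::memcpy(data + size_, other.buffer_, other.size_ + 1);
        delete[] buffer_;  // safe even for a += a: the copy is already done
        buffer_ = data;
        size_ += other.size_;
        return *this;
    }

    ~NaiveString() { delete[] buffer_; }

    std::size_t size() const { return size_; }
    const char* c_str() const { return buffer_; }

private:
    static char* allocate(std::size_t n) {
        ++allocations;
        return new char[n];
    }

    std::size_t size_;  // declared before buffer_, which is initialized from it
    char* buffer_;
};

int NaiveString::allocations = 0;
```

Constructing a two-character string, copying it, and appending the copy to the original already costs three heap allocations in this sketch, even though only four characters are stored at the end.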

SSO

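When reading the Redis source code some time ago, I noticed an optimization in the integer set (intset): depending on the type of a new element, the underlying array is widened and space is allocated for the new element. For example, if the largest value in the set is initially 3, the set uses the 16-bit type (INT_16). If 65536 is then added, the type of the set is changed to 32-bit (INT_32), the set is reallocated, and the existing elements are widened to the new type.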
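The upgrade rule described above can be sketched as follows. The function names required_width and upgraded_width are hypothetical and this is not the Redis code, only an illustration of choosing the narrowest element width that still fits every value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest element width (in bytes) able to hold v.
inline int required_width(std::int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return 8;
    if (v < INT16_MIN || v > INT16_MAX) return 4;
    return 2;
}

// Width the whole set must use after inserting v. It only ever grows.
inline int upgraded_width(int current_width, std::int64_t v) {
    assert(current_width == 2 || current_width == 4 || current_width == 8);
    const int needed = required_width(v);
    return needed > current_width ? needed : current_width;
}
```

With a set that currently holds 3 in 2-byte elements, upgraded_width(2, 65536) yields 4, which matches the INT_16 to INT_32 step in the example above.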

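Does string have something similar to the intset's type upgrade?

With this question in mind I read the string source code and found that it uses an optimization strategy called SSO.

SSO stands for Small String Optimization. The basic idea: when the string is shorter than 16 bytes, its characters are stored in a buffer inside the string object itself (so for a local variable they live on the stack) and no heap allocation takes place; when it is 16 bytes or longer, the memory is allocated on the heap. PS: in GCC this optimization takes effect from GCC 5.1. With GCC versions earlier than 5, every non-empty string is allocated on the heap regardless of its length.

To verify this, the test code is as follows: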

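#include <iostream>
#include <string>
#include <...>    // header name lost in the source

void* operator new(std::size_t n) {
    std::cout <<  // the rest of this listing is cut off in the source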

	

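In the code above we replace the global operator new, so that the dynamic allocations made by string go through our version. The benefit is that the output shows whether new was called to allocate memory.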
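Since the listing above is cut off, here is a minimal sketch of the same technique. It is an assumption about how such a test can be written, not a reconstruction of the original code: it counts calls instead of printing, and the names g_new_calls and allocations_for_length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Number of calls to the replaced global operator new (illustrative counter).
static std::size_t g_new_calls = 0;

// Replacement for the global allocation function: count, then use malloc.
void* operator new(std::size_t n) {
    ++g_new_calls;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions, so memory from malloc goes back to free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper: how many times operator new ran while building
// a std::string of the given length.
std::size_t allocations_for_length(std::size_t len) {
    const std::size_t before = g_new_calls;
    std::string s(len, '=');
    volatile char sink = s.empty() ? '\0' : s[len - 1];  // keep s observable
    (void)sink;
    return g_new_calls - before;
}
```

Calling allocations_for_length for lengths 0 through 23 gives the same kind of information as the outputs below: a result of 0 means the string was built without touching the heap. The exact threshold depends on the standard library in use.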

Output with G++ 4.9.4:

	
0:
[Allocating 26 bytes]1:=
[Allocating 27 bytes]2:==
[Allocating 28 bytes]3:===
[Allocating 29 bytes]4:====
[Allocating 30 bytes]5:=====
[Allocating 31 bytes]6:======
[Allocating 32 bytes]7:=======
[Allocating 33 bytes]8:========
[Allocating 34 bytes]9:=========
[Allocating 35 bytes]10:==========
[Allocating 36 bytes]11:===========
[Allocating 37 bytes]12:============
[Allocating 38 bytes]13:=============
[Allocating 39 bytes]14:==============
[Allocating 40 bytes]15:===============
[Allocating 41 bytes]16:================
[Allocating 42 bytes]17:=================
[Allocating 43 bytes]18:==================
[Allocating 44 bytes]19:===================
[Allocating 45 bytes]20:====================
[Allocating 46 bytes]21:=====================
[Allocating 47 bytes]22:======================
[Allocating 48 bytes]23:=======================

	

Output with GCC 5.1:

	
0:
1:=
2:==
3:===
4:====
5:=====
6:======
7:=======
8:========
9:=========
10:==========
11:===========
12:============
13:=============
14:==============
15:===============
16:[Allocating 17 bytes]================
17:[Allocating 18 bytes]=================
18:[Allocating 19 bytes]==================
19:[Allocating 20 bytes]===================
20:[Allocating 21 bytes]====================
21:[Allocating 22 bytes]=====================
22:[Allocating 23 bytes]======================
23:[Allocating 24 bytes]=======================

	

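The GCC 5.1 output shows that our operator new is not called when the string is shorter than 16 characters. This indirectly confirms the earlier conclusion: below 16 bytes the characters are kept in the object's local buffer, and at 16 bytes or more they are allocated on the heap. (PS: in the GCC 4.9.4 output every allocation is 25 bytes larger than the string itself. This constant overhead is consistent with the older reference-counted implementation, which stores a bookkeeping header holding the length, capacity and reference count in front of the characters, plus one byte for the terminator. It is separate from the capacity preallocation that string performs when it grows, which is a further optimization.)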

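Getting to the point

Close your eyes and think about it for a moment: if you had to implement this feature yourself, how would you do it?

Most people would probably come up with this: define a fixed-length char array; on construction, check the length of the string; if it is below some fixed value use the array, otherwise allocate on the heap.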
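That idea can be sketched in a few lines. The class below is a hypothetical illustration (the names SmallString, kLocalCapacity and is_local are made up, and 15 is an assumed threshold); it is not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical illustration of the fixed-array-plus-heap-fallback idea.
class SmallString {
public:
    static const std::size_t kLocalCapacity = 15;  // assumed threshold

    explicit SmallString(const char* s) : size_(std::strlen(s)) {
        // Short strings reuse the in-object array; long ones go to the heap.
        data_ = size_ <= kLocalCapacity ? local_ : new char[size_ + 1];
        std::memcpy(data_, s, size_ + 1);  // copy including the terminator
    }

    ~SmallString() {
        if (!is_local()) delete[] data_;  // only heap storage is released
    }

    // Copying is left out to keep the sketch short.
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;

    bool is_local() const { return data_ == local_; }
    std::size_t size() const { return size_; }
    const char* c_str() const { return data_; }

private:
    std::size_t size_;
    char* data_;
    char local_[kLocalCapacity + 1];  // 15 characters plus the terminator
};
```

In this sketch SmallString("hello") reports is_local() as true, while a 16-character string falls back to a heap block. The price is that every object carries the 16-byte array whether it is used or not.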

To check whether this idea matches the real implementation, let's go through the source code.

First, an excerpt from the string source:

	
template<typename _CharT, typename _Traits, typename _Alloc>
class basic_string
{
private:
    // Use empty-base optimization: http://www.cantrip.org/emptyopt.html
    struct _Alloc_hider : allocator_type // TODO check __is_final
    {
        _Alloc_hider(pointer __dat, const _Alloc& __a = _Alloc())
        : allocator_type(__a), _M_p(__dat) { }

        pointer _M_p; // The actual data.
    };

    _Alloc_hider _M_dataplus;
    size_type    _M_string_length;

    enum { _S_local_capacity = 15 / sizeof(_CharT) };

    union
    {
        _CharT    _M_local_buf[_S_local_capacity + 1];
        size_type _M_allocated_capacity;
    };
};

	

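The excerpt above contains only the parts we care about. The points to note are:

• _M_string_length: the current length of the string in characters (not the allocated capacity)

• _M_dataplus: holds the pointer (_M_p) to where the data is actually stored

• the union: its size is that of the larger member, _M_local_buf, which is 16 bytes

• _M_local_buf: the field that implements SSO, 16 bytes in size (15 + 1, where the 1 is for the terminator)

• _M_allocated_capacity: a size_t value that plays a role similar to the capacity of a vector; it shares storage with _M_local_buf, so the two are never in use at the same time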
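The fields listed above can be mirrored in a small struct to see how the sizes add up. LayoutSketch and its member names are hypothetical; the sketch is specialized for char and leaves out the allocator base, which is assumed to be empty.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the three data members discussed above.
struct LayoutSketch {
    char* data;          // plays the role of _M_dataplus._M_p
    std::size_t length;  // plays the role of _M_string_length
    union {
        char local_buf[15 + 1];          // plays the role of _M_local_buf
        std::size_t allocated_capacity;  // plays the role of _M_allocated_capacity
    };
};

// Pointer + length + 16-byte union, assuming no padding is inserted.
static_assert(sizeof(LayoutSketch) == sizeof(char*) + sizeof(std::size_t) + 16,
              "unexpected padding in LayoutSketch");
```

On a typical 64-bit build this comes to 8 + 8 + 16 = 32 bytes. Because the capacity and the local buffer overlap in the union, the capacity field costs no extra space.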

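In the source above there is a member named _M_local_buf. Judging by the name it is a local buffer, presumably used to store content shorter than 16 bytes. To confirm this guess, let's look at the SSO mechanism with GDB. The sample code is: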

	
#include <string>

int main() {
    std::string str("hello");
    return 0;
}

	

The GDB session is as follows:

	
(gdb) s
Single stepping until exit from function main,
which has no line number information.
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) ()
    at /root/gcc-5.4.0/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:454
454   basic_string(const _CharT* __s, const _Alloc& __a = _Alloc())
(gdb) s
141   return std::pointer_traits<pointer>::pointer_to(*_M_local_buf);
(gdb) n
454   basic_string(const _CharT* __s, const _Alloc& __a = _Alloc())
(gdb)
456   { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s + npos); }
(gdb)
141   return std::pointer_traits<pointer>::pointer_to(*_M_local_buf);
(gdb)
456   { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s + npos); }
(gdb)
267   { return __builtin_strlen(__s); }
(gdb)
456   { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s + npos); }
(gdb)
195   _M_construct(__beg, __end, _Tag());
(gdb)
456   { _M_construct(__s, __s ? __s + traits_type::length(__s) : __s + npos); }

	

This output alone does not make the construction process very clear. Note that the constructor is at basic_string.h:454, so let's analyze the source:

	
basic_string(const _CharT* __s, const _Alloc& __a = _Alloc())
: _M_dataplus(_M_local_data(), __a)
{ _M_construct(__s, __s ? __s + traits_type::length(__s) : __s + npos); }

	

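As the name suggests, _M_construct builds the object; it is analyzed further below. First, here is the implementation of _M_local_data, which supplies the initial value of _M_dataplus: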

	
const_pointer
_M_local_data() const
{
#if __cplusplus >= 201103L
    return std::pointer_traits<const_pointer>::pointer_to(*_M_local_buf);
#else
    return const_pointer(_M_local_buf);
#endif
}

	

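As mentioned earlier, _M_dataplus points to where the data is actually stored. The basic_string constructor first points _M_dataplus at _M_local_buf and then calls _M_construct to do the actual construction. _M_construct eventually reaches the following code: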

	
template<typename _CharT, typename _Traits, typename _Alloc>
template<typename _InIterator>
void
basic_string<_CharT, _Traits, _Alloc>::
_M_construct(_InIterator __beg, _InIterator __end,
             std::forward_iterator_tag)
{
    // NB: Not required, but considered best practice.
    if (__gnu_cxx::__is_null_pointer(__beg) && __beg != __end)
        std::__throw_logic_error(__N("basic_string::"
                                     "_M_construct null not valid"));

    size_type __dnew = static_cast<size_type>(std::distance(__beg, __end));

    if (__dnew > size_type(_S_local_capacity))
    {
        _M_data(_M_create(__dnew, size_type(0)));
        _M_capacity(__dnew);
    }

    // Check for out_of_range and length_error exceptions.
    __try
    { this->_S_copy_chars(_M_data(), __beg, __end); }
    __catch(...)
    {
        _M_dispose();
        __throw_exception_again;
    }

    _M_set_length(__dnew);
}

	

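The code above first computes the actual length of the string. If the length is greater than _S_local_capacity, that is 15, a block of memory is created on the heap via _M_create. Otherwise _M_data() still points at the local buffer. Either way, _S_copy_chars then copies the content and _M_set_length records the length.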
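The same behavior can be observed from outside the class, without GDB, by checking whether the character data lies inside the string object itself. The helper name stores_inline is hypothetical, and the check is a sketch that relies on comparing addresses as integers, which works on common platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true when s keeps its characters inside its own
// object footprint (a local buffer) rather than in a separate heap block.
bool stores_inline(const std::string& s) {
    const std::uintptr_t obj = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t dat = reinterpret_cast<std::uintptr_t>(s.data());
    return dat >= obj && dat < obj + sizeof(std::string);
}
```

Going by the source just analyzed, one would expect stores_inline to return true for std::string("hello") and false for a 16-character string when built with GCC 5.1 or later. Other standard libraries use different thresholds, so the result for short strings is implementation-specific.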

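Conclusion

The test environment for this article is CentOS 6.8 with GCC 5.4. In this environment, a string whose data is shorter than 16 bytes is stored locally inside the object, and one longer than 15 bytes is stored on the heap. This is the string optimization known as SSO (Small String Optimization). According to material I looked up, the 15-byte limit depends on the compiler and the operating system; on Fedora and Red Hat, strings are reportedly always stored on the heap (this comes from the Internet, and I could not verify it because I had no such environment at hand, sorry).

That's all for today. See you next time!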

	


Reviewing editor: Liu Qing (劉清)