General Understanding of Decoding Strategies Commonly Used in Text Generation

深度学习自然语言处理 · Source: 看個通俗理解吧 · 2023-03-13

Note: this post covers only commonly used decoding strategies and does not include more recent research findings.


This post covers:

  1. Background
  2. Problem
  3. Decoding Strategies
  • Standard Greedy Search
  • Beam Search
  • Sampling
  • Top-k Sampling
  • Sampling with Temperature
  • Top-p (Nucleus) Sampling
  4. Code Tips
  5. Summary

1. Background

"Autoregressive" means that when the model generates text, it does not produce a whole passage at once; it generates the text one word at a time.

For example, as shown in the figure:

1) A user asks the model a question: "Hello! How are you today?"

2) The model needs to generate a reply; the first word it generates is "I".

3) Once "I" has been generated, the model continues to generate the next word based on the information it has so far (the question + "I"): "am".

4) Once "I am" has been generated, the model continues to generate the next word based on the information it has so far (the question + "I am"): "good".

5) These steps repeat until a special token is generated, signalling that this round of text generation can stop. In the example, generation is complete when "[EOS]" (End of Sentence) is produced.
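To make this loop concrete, here is a minimal sketch in Python. The tiny vocabulary, the canned probabilities, and the `next_token_probs` helper are all made up for illustration; a real system would query a trained language model here instead.

```python
import numpy as np

VOCAB = ["I", "am", "good", "fine", "[EOS]"]

def next_token_probs(tokens):
    # Hypothetical stand-in for a trained model: given the text so far,
    # return a probability for every word in the vocabulary.
    canned = {
        (): [0.80, 0.05, 0.05, 0.05, 0.05],
        ("I",): [0.05, 0.80, 0.05, 0.05, 0.05],
        ("I", "am"): [0.05, 0.05, 0.50, 0.35, 0.05],
    }
    return np.array(canned.get(tuple(tokens), [0.05, 0.05, 0.05, 0.05, 0.80]))

tokens = []
while True:
    probs = next_token_probs(tokens)     # the model's "probability table"
    word = VOCAB[int(np.argmax(probs))]  # pick one word (greedily, here)
    if word == "[EOS]":                  # special token: stop generating
        break
    tokens.append(word)

print(" ".join(tokens))  # -> I am good
```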

2. Problem

Since the model outputs text one word at a time, whether it produces good text depends on whether the decoding strategy is smart enough to decide which word should be output at each step.

Note that "good" here does not mean the model is well trained and expresses itself with near-human quality. Here, "good" refers to a good strategy for selecting output words. In detail: whenever the model predicts the next word, whatever state it is in (i.e. whether it is well trained or not), the strategy always has a principled way to pick the word it considers most reasonable as the output.

Whenever the strategy has to choose the next output word, it consults a large table: the probabilities the model currently assigns to each candidate for the next word.
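In practice this table is usually produced by applying a softmax to the model's raw scores (logits). A minimal sketch, with made-up numbers:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for four candidate next words:
logits = np.array([2.0, 1.0, 0.1, -1.0])
table = softmax(logits)    # the probability table the strategy consults
print(table, table.sum())  # probabilities summing to 1
```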


3. Decoding Strategies

3.1 Standard Greedy Search

The simplest approach is to always pick the word with the highest probability. This has a potential problem: once the whole sentence (many words) has been output, there is no guarantee that it is the best sentence overall; a better one may exist. Even though we make the locally best choice at each step, in the big picture this does not mean that the whole sentence built from these words is the best.
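A toy example (with made-up numbers) of how locally best choices can lose globally:

```python
# Step-1 probabilities, and step-2 probabilities conditioned on step 1.
step1 = {"The": 0.6, "A": 0.4}
step2 = {
    "The": {"dog": 0.3, "cat": 0.3},
    "A":   {"dog": 0.9, "cat": 0.1},
}

# Greedy takes "The" first (0.6 > 0.4), then a 0.3 continuation:
greedy_score = step1["The"] * step2["The"]["dog"]  # 0.18

# But "A dog" scores higher as a whole sentence:
better_score = step1["A"] * step2["A"]["dog"]      # 0.36
print(greedy_score, better_score)
```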

3.2 Beam Search

To address this "big picture" problem, we can try beam search. As shown in the figure: with the strategy in 3.1 our view is narrow, since we track only the single output we currently think is best. Beam search tracks more candidates (two in the figure). Candidates that score best early on are not necessarily the best in the end. You can track even more candidates at once if you wish, at the cost of more computing resources.

1) In the figure, the current input is "The" → beam search selects the next word from the output probability table → it keeps the best 2 candidates, "The dog" and "The nice", with scores 0.4 and 0.5 respectively.


2) Now the current inputs are "The dog" and "The nice" → the strategy picks the highest-scoring continuation for each → "The dog has" and "The nice woman".

3) Continuing in this way until the end, you obtain the 2 highest-scoring sentences.

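Here is a minimal beam-search sketch, assuming the same kind of hypothetical `next_token_probs` function as in the background section (given the tokens so far, it returns a probability for every vocabulary word). Scores are summed log-probabilities, so partial sentences compare sensibly:

```python
import numpy as np

def beam_search(next_token_probs, vocab, eos="[EOS]", beam_width=2, max_len=10):
    beams = [([], 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == eos:      # finished beam: carry over
                candidates.append((tokens, score))
                continue
            probs = next_token_probs(tokens)
            for i, p in enumerate(probs):
                if p > 0:
                    candidates.append((tokens + [vocab[i]], score + np.log(p)))
        # keep only the beam_width highest-scoring partial sentences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams
```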

Shortcoming 1: text generated by this strategy tends to repeat itself (in the figure, "I'm not sure if I'll..." appears twice). One remedy is to constrain generation with simple rules, for example forbidding the same text fragment (n-gram) from appearing twice; a sketch of such a check follows.
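A minimal sketch of that n-gram rule (Huggingface's generate interface exposes the same idea as the `no_repeat_ngram_size` argument):

```python
def has_repeated_ngram(tokens, n=3):
    # True if any n-gram occurs more than once in the token list.
    seen = set()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        if ngram in seen:
            return True
        seen.add(ngram)
    return False

print(has_repeated_ngram("I am not sure if I am not sure".split()))  # True
```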

Shortcoming 2: when we want the beam width k to be large (meaning we want to track the k highest-scoring candidates simultaneously), the demand on computing resources grows correspondingly.

Shortcoming 3: the generated text is rather dull and uninteresting. Research has shown that, guided by beam search, a model can generate sentences humans understand, but these sentences do not surprise real humans.

There are also a number of variants of the beam search strategy.

3.3 Sampling

Sampling makes the generated text more diverse. The simplest version samples the next word according to the current probability distribution, so the words the model considers reasonable (high-probability words) have a higher chance of being drawn. The drawback is a certain chance of incoherent output, or of sentences less fluent than human language.
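A minimal sketch of one sampling step, using a made-up distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_step(probs):
    # Draw the next word's index according to the model's distribution,
    # instead of always taking the argmax.
    return int(rng.choice(len(probs), p=probs))

probs = np.array([0.5, 0.3, 0.15, 0.05])  # hypothetical probability table
print(sample_step(probs))
```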

3.4 Top-k Sampling

To alleviate this problem, we can limit the sampling range: for example, sample only from the top k words in the probability table at each step.
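A minimal top-k sketch, reusing the same made-up distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_sample(probs, k=5):
    top = np.argsort(probs)[-k:]        # indices of the k likeliest words
    p = probs[top] / probs[top].sum()   # renormalise over those k words
    return int(rng.choice(top, p=p))

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_k_sample(probs, k=2))  # only the two likeliest words can win
```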

3.5 Sampling with Temperature

This method rescales the current probability distribution: it can make large probabilities larger and small ones smaller, or make the gap between large and small probabilities less pronounced. The strength of this rescaling is controlled by the temperature parameter T > 0: in the equation, the model's raw scores (logits) are divided by T before the softmax is applied (a small sketch follows the two cases below).

  • As T becomes larger, the model favours less common words when generating text; the larger T is, the closer the rescaled distribution gets to uniform sampling.
  • As T becomes smaller, the model favours common words; the smaller T is, the closer the rescaled distribution gets to the greedy strategy mentioned at the beginning (always choosing the highest-probability word).
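A minimal sketch of temperature rescaling applied to made-up logits:

```python
import numpy as np

def apply_temperature(logits, T):
    # Divide the logits by T, then softmax. Small T sharpens the
    # distribution (towards greedy); large T flattens it (towards uniform).
    z = logits / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 2.0, 1.0])
print(apply_temperature(logits, 1.0))  # the original distribution
print(apply_temperature(logits, 0.5))  # sharper: favours the top word
print(apply_temperature(logits, 2.0))  # flatter: rarer words gain mass
```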

The Meena paper uses this strategy as follows:

  • For the same input, the paper has the model generate 20 different candidate responses using this strategy.
  • Then, from these 20 candidates, the one whose whole sentence has the highest probability is selected as the final output.

Sentences generated with this sample-and-rank method are noticeably more diverse and of higher quality than those produced by beam search.
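A sketch of that sample-and-rank scheme. Both helpers here are assumptions standing in for real components: `sample_response()` draws one response with temperature sampling, and `score(r)` returns the whole response's log-probability under the model.

```python
def sample_and_rank(sample_response, score, n=20):
    # Draw n candidate responses, then return the one the model
    # assigns the highest overall probability.
    responses = [sample_response() for _ in range(n)]
    return max(responses, key=score)
```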

3.6 Top-p (Nucleus) Sampling

Top-k sampling limits the sampling range very rigidly. For example, "top-5" means we may only sample from the 5 highest-ranked words. This can be problematic:

  • Words ranked below 5th place may still have non-trivial probability, yet we rule them out forever, even though they may be very good choices.
  • Words ranked within the top 5 may have quite low probability, yet we still consider them, and they may well lower the quality of the text.

The top-p method makes the sampling range adjust itself dynamically by setting a threshold p: starting from the highest-probability word in the sorted vocabulary, we accumulate probabilities until the running total exceeds the threshold. All words inside that set form the sampling range.

Suppose we set the threshold to p = 0.92 (a sketch follows the two cases below):

  • In the left figure, it takes the first 9 words for the cumulative probability to exceed 0.92.
  • In the right figure, the first 3 words already sum to more than 0.92.
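A minimal nucleus-sampling sketch; the four-word distribution is made up, and with p = 0.92 it keeps exactly the first three words, as in the right-hand case above:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_p_sample(probs, p=0.92):
    order = np.argsort(probs)[::-1]                    # most likely word first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # nucleus size
    keep = order[:cutoff]                              # smallest set exceeding p
    q = probs[keep] / probs[keep].sum()                # renormalise inside it
    return int(rng.choice(keep, p=q))

probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_sample(probs))  # samples among the first 3 words only
```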

4. Code Tips

The figure showed part of the Huggingface generate interface. Although this post presents the methods separately, they are not completely isolated from one another: some can be combined, which is why several of these arguments can be set at the same time in code.
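Since the original screenshot is not available here, the following is a hedged reconstruction of what such a call looks like. The argument names (`do_sample`, `top_k`, `top_p`, `temperature`, `no_repeat_ngram_size`) are real `transformers` generate parameters; the model choice and values are just examples.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello! How are you today?", return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,          # sample instead of greedy/beam search
    top_k=50,                # top-k sampling
    top_p=0.92,              # top-p (nucleus) sampling
    temperature=0.7,         # sampling with temperature
    no_repeat_ngram_size=2,  # block repeated n-grams
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```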

5. Summary

This post described some common decoding strategies, but it is genuinely difficult to say which one is best.

In general, sampling-based approaches are preferable to greedy and beam search in open-domain dialogue systems, because they generate text of higher quality and greater diversity.

This does not mean we abandon greedy and beam search entirely: research has shown that, with good training, these two methods can generate better text than top-p sampling.

There is still a long, long way to go in natural language processing. Keep at it!

Note: please read the instructions before using any original post content (Menu → All posts), or contact me if you have any questions.



Reviewing editor: Li Qian (李倩)



Original title: 通俗理解文本生成的常用解碼策略

Source: WeChat official account 深度学习自然语言处理 (ID: zenRRan). Please credit the source when reposting.
