Semantic Kernel has two Memory implementations: Semantic Memory, which is built into Semantic Kernel, and Kernel Memory, a standalone project that evolved from Semantic Kernel.
Here is how Semantic Memory is described (source):
Semantic Memory (SM) is a library for C#, Python, and Java that wraps direct calls to databases and supports vector search. It was developed as part of the Semantic Kernel (SK) project and serves as the first public iteration of long-term memory. The core library is maintained in three languages, while the list of supported storage engines (known as "connectors") varies across languages.
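Learning goal: use Semantic Memory to call the OpenAI API, generate text embeddings with the text-embedding-ada-002 model, store them in an in-memory vector database, and then run a semantic search.
Learning material: the sample program Example14_SemanticMemory.cs in the Semantic Kernel source repository
Create a .NET console project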
dotnet new console
dotnet add package Microsoft.SemanticKernel
dotnet add package --prerelease Microsoft.SemanticKernel.Plugins.Memory
Create an ISemanticTextMemory instance
Use MemoryBuilder with OpenAITextEmbeddingGenerationService to create a SemanticTextMemory, which is an instance of ISemanticTextMemory
#pragma warning disable SKEXP0011
#pragma warning disable SKEXP0003
#pragma warning disable SKEXP0052
ISemanticTextMemory memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey)
    .WithMemoryStore(new VolatileMemoryStore())
    .Build();
#pragma warning restore SKEXP0052
#pragma warning restore SKEXP0003
#pragma warning restore SKEXP0011
Note: the warning disable directives in the code above are needed because MemoryBuilder and the two extension methods are experimental features.
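The snippet above passes an apiKey variable that it does not define. One possible way to supply it is sketched below. It assumes the key is stored in an environment variable; the name OPENAI_API_KEY and the helper ApiKeyLoader are illustrative assumptions, not part of the original sample.

```csharp
using System;

public static class ApiKeyLoader
{
    // Hypothetical helper: reads the OpenAI key from an environment variable.
    // The default variable name "OPENAI_API_KEY" is an assumption.
    public static string Load(string variableName = "OPENAI_API_KEY")
    {
        string? value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            // Fail early with a clear message instead of sending an empty key.
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");
        }
        return value;
    }
}
```

With this in place, var apiKey = ApiKeyLoader.Load(); can run before the MemoryBuilder call. In a Program.cs that uses top-level statements, the class declaration has to come after those statements or live in its own file.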
Prepare the text data used to generate embeddings
var sampleData = new Dictionary<string, string>
{
    ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
        = "README: Installation, getting started, and how to contribute",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
        = "Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function"
};
Generate the embeddings and save them to the in-memory vector database
var i = 0;
foreach (var entry in sampleData)
{
    // Each entry is embedded and stored as a reference to an external source.
    await memory.SaveReferenceAsync(
        collection: "SKGitHub",
        externalSourceName: "GitHub",
        externalId: entry.Key,
        description: entry.Value,
        text: entry.Value);
    Console.WriteLine($" #{++i} saved.");
}
SaveReferenceAsync calls the GenerateEmbeddingAsync method of IEmbeddingGenerationService to generate the embedding; see SemanticTextMemory.cs#L60 in the SK source
var embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, kernel, cancellationToken).ConfigureAwait(false);
Note: the embedding value is of type ReadOnlyMemory<float>
Because we are using OpenAI here, the call goes to the GenerateEmbeddingsAsync method of OpenAITextEmbeddingGenerationService (see the SK source), which ultimately calls the GetEmbeddingsAsync method of Azure.AI.OpenAI.OpenAIClient; see OpenAIClient.cs#L552 in the Azure SDK for .NET source
Run a semantic search over the embedding data
var query = "How do I get started?";
var memoryResults = memory.SearchAsync("SKGitHub", query, limit: 1, minRelevanceScore: 0.5);
SearchAsync also calls GenerateEmbeddingAsync, this time to generate an embedding from the query text; see SemanticTextMemory.cs#L108
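To make the search step more concrete, here is a minimal sketch of what an in-memory store could do once it has the query embedding: score every stored vector against the query, drop the ones below minRelevanceScore, and return the best limit matches. It assumes the ranking uses cosine similarity, a common choice for OpenAI embeddings. The class VectorSearchSketch and its methods are hypothetical and are not the SK implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VectorSearchSketch
{
    // Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same length.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Treat a zero vector as unrelated to everything.
        if (normA == 0 || normB == 0)
        {
            return 0;
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns up to `limit` ids with relevance >= minRelevanceScore, best first.
    public static List<(string Id, double Relevance)> Search(
        IReadOnlyDictionary<string, ReadOnlyMemory<float>> store,
        ReadOnlyMemory<float> query,
        int limit = 1,
        double minRelevanceScore = 0.5)
    {
        return store
            .Select(kv => (Id: kv.Key, Relevance: CosineSimilarity(kv.Value.Span, query.Span)))
            .Where(r => r.Relevance >= minRelevanceScore)
            .OrderByDescending(r => r.Relevance)
            .Take(limit)
            .ToList();
    }
}
```

The limit and minRelevanceScore parameters mirror the arguments passed to SearchAsync above. The brute-force scan over every stored vector is only meant for illustration on small collections.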
Print the semantic search results
await foreach (var memoryResult in memoryResults)
{
    // WriteLine puts each field on its own line, matching the output shown below.
    Console.WriteLine("Result:");
    Console.WriteLine(" URL: : " + memoryResult.Metadata.Id);
    Console.WriteLine(" Title : " + memoryResult.Metadata.Description);
    Console.WriteLine(" Relevance: " + memoryResult.Relevance);
}
Run the console program
Output:
#1 saved.
#2 saved.
Result:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.8224089741706848
The search works and the exercise is complete. The full sample code is at https://www.cnblogs.com/dudu/articles/18037216