Environment Setup
Create a Python environment with conda.
conda create -n ragex python=3.9
conda activate ragex
We'll manage the API key with dotenv, so create a .env file and write your API key into it.
echo "OPENAI_API_KEY='{your_api_key}'" >> .env
Installation
pip install python-dotenv langchain langchain-openai
Calling the LLM
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
load_dotenv()  # reads OPENAI_API_KEY from the .env file
llm = ChatOpenAI()
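If you want to pin the model and make answers more repeatable, ChatOpenAI also accepts explicit parameters; a minimal sketch (gpt-3.5-turbo matches the model shown in the responses below):
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)  # temperature=0 for more deterministic output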
Let's ask the LLM something: "How can LangSmith help with testing?"
llm.invoke("how can langsmith help with testing?")
# AIMessage(content='Langsmith can help with testing in a few ways:\n\n1. Automated testing: Langsmith can generate test cases and automate the execution of tests to ensure that the code is functioning as expected.\n\n2. Code analysis: Langsmith can analyze the code to identify potential bugs or areas of improvement, helping testers focus their efforts on high-risk areas.\n\n3. Test case generation: Langsmith can generate test cases based on the code structure and specifications, helping testers quickly create comprehensive test suites.\n\n4. Test coverage analysis: Langsmith can analyze test coverage to ensure that all parts of the code are being tested, helping testers identify gaps in their testing strategy.\n\nOverall, Langsmith can assist testers in improving the efficiency and effectiveness of their testing process, ultimately leading to higher quality software.', response_metadata={'token_usage': {'completion_tokens': 154, 'prompt_tokens': 15, 'total_tokens': 169}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-0e8c21b3-529e-4c95-ba02-1bfbec6cb5c5-0')
You can also ask using a prompt template. This is a more structured way to ask than the plain question above.
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world class technical documentation writer."),
    ("user", "{input}")
])
Now we can combine the two into one simple chain.
chain = prompt | llm # <input> -> prompt -> llm -> <output>
Let's run the chain.
chain.invoke({"input": "how can langsmith help with testing?"})
# AIMessage(content='Langsmith is a powerful tool that can greatly assist with testing in various ways. Here are some ways in which Langsmith can help with testing:\n\n1. **Automation**: Langsmith can automate the testing process by generating test cases and executing them automatically. This can help in reducing manual effort and increasing the speed and efficiency of testing.\n\n2. **Code Analysis**: Langsmith can analyze code to identify potential bugs, vulnerabilities, or performance issues. This can help testers in identifying areas that need special attention during testing.\n\n3. **Test Case Generation**: Langsmith can generate test cases based on the code structure and logic. This can help in ensuring comprehensive test coverage and identifying edge cases that might be missed during manual test case creation.\n\n4. **Regression Testing**: Langsmith can help in automating regression testing by re-running test cases on code changes to ensure that new modifications do not introduce any new bugs or issues.\n\n5. **Integration Testing**: Langsmith can assist in integration testing by analyzing the interactions between different components of the system and generating test cases to validate the integrations.\n\n6. **Performance Testing**: Langsmith can analyze code for performance bottlenecks and generate test cases to simulate high load scenarios. This can help in identifying performance issues early in the development cycle.\n\nOverall, Langsmith can be a valuable tool in the testing process by automating repetitive tasks, ensuring comprehensive test coverage, and helping in identifying and addressing potential issues early in the development cycle.', response_metadata={'token_usage': {'completion_tokens': 292, 'prompt_tokens': 28, 'total_tokens': 320}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-f9fd80ba-7052-46da-b736-e2fc174dce11-0')
The answer comes back as an AIMessage. A plain string is easier to read, so let's use a parser to convert the Message into a string.
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()
Let's attach the parser to the chain we made earlier.
chain = chain | output_parser
# chain = prompt | llm | output_parser
Run the chain again; you can see the output now comes back as a plain string.
chain.invoke({"input": "how can langsmith help with testing?"})
# "Langsmith can help with testing in various ways. Here are some ways in which Langsmith can assist with testing:\\n\\n1. **Automated Testing**: Langsmith can be used to automate testing tasks such as unit testing, integration testing, and end-to-end testing. By writing scripts in Langsmith, you can automate repetitive testing tasks and ensure consistent testing results across different environments.\\n\\n2. **Data Generation**: Langsmith can be used to generate test data for your applications. By writing data generation scripts in Langsmith, you can create a variety of test scenarios and ensure comprehensive test coverage.\\n\\n3. **Load Testing**: Langsmith can be used to perform load testing on your applications. By simulating multiple users and generating high traffic volumes, Langsmith can help you identify performance bottlenecks and optimize your application's performance under heavy load.\\n\\n4. **API Testing**: Langsmith can be used to test APIs by sending HTTP requests and validating responses. By writing API testing scripts in Langsmith, you can ensure that your APIs are functioning correctly and returning the expected results.\\n\\n5. **Browser Automation**: Langsmith can be used for browser automation testing. By writing scripts to interact with web applications, Langsmith can help you test the functionality and user experience of your web applications across different browsers.\\n\\nOverall, Langsmith can streamline your testing process, increase test coverage, and improve the quality of your applications by automating testing tasks and generating test data."
Building a Retrieval Chain
The LLM doesn't know everything about LangSmith, so we have to hand it extra information to get more accurate answers. In this tutorial we'll fetch the contents of a web page about LangSmith and pass them to the LLM.
First, install beautifulsoup4, which we need to parse the HTML.
pip install beautifulsoup4
Next, load the document from the web.
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()
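Before indexing, it's worth a quick sanity check on what the loader returned; a small sketch, not part of the original quickstart:
print(len(docs))                    # number of Documents loaded (one per page here)
print(docs[0].metadata)             # source URL, title, description, language
print(docs[0].page_content[:200])   # first 200 characters of the page text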
The document has to go into a vector store before we can use it. That means converting the text into vectors, which is the job of an embedding model. We can reuse the same OpenAI API key from above for the embeddings.
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
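To see what the embedding model produces, you can embed a single string yourself; embed_query returns a plain list of floats (a quick sketch; the exact dimension depends on the default embedding model):
vec = embeddings.embed_query("how can langsmith help with testing?")
print(len(vec))  # embedding dimension, e.g. 1536 for OpenAI's older default model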
Install a vector store; here we use FAISS.
pip install faiss-cpu
Now let's store the document in the vector store: split it, embed it, and index it.
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter()  # default chunk size and overlap
documents = text_splitter.split_documents(docs)   # split the page into smaller chunks
vector = FAISS.from_documents(documents, embeddings)  # embed each chunk and build the index
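Before wiring the store into a chain, you can query it directly to confirm that relevant chunks come back (a minimal sketch):
hits = vector.similarity_search("how can langsmith help with testing?", k=2)
for doc in hits:
    print(doc.page_content[:100])  # preview of the two most similar chunks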
Now that the vector store is ready, let's build a retrieval chain that fetches documents from it and feeds them to the LLM.
from langchain.chains.combine_documents import create_stuff_documents_chain
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")
document_chain = create_stuff_documents_chain(llm, prompt)
We could also feed documents into the chain directly.
from langchain_core.documents import Document
document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})
# 'Langsmith can help visualize test results.'
But this isn't quite what we wanted: the idea was for a retriever to hand the LLM the documents most relevant to a given question. For that we need to define a retriever.
from langchain.chains import create_retrieval_chain
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
Now let's call invoke again and see the LLM's answer informed by the retrieved documents.
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])
# LangSmith can help with testing by allowing developers to create datasets, run tests on LLM applications, upload test cases in bulk, create custom evaluations, and run evaluations to score test results. Additionally, LangSmith provides a comparison view to track and diagnose regressions in test scores across multiple revisions of an application, a playground environment for rapid iteration and experimentation, and the ability to add runs as examples to datasets to expand test coverage on real-world scenarios.
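Besides the answer, the dict returned by create_retrieval_chain also carries the retrieved documents under the "context" key, which is handy for checking what the LLM actually saw (a small sketch; key names per the langchain 0.1-era API used in this post):
for doc in response["context"]:    # the Documents the retriever handed to the LLM
    print(doc.metadata["source"])  # where each chunk came from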
Building a Conversational Retrieval Chain
So far we could only ask a single standalone question, but many LLM applications support a back-and-forth conversation with a chatbot.
We can keep using create_retrieval_chain, but two changes are needed:
- The document search must be based on the whole conversation history.
- The final answer must also be based on the whole conversation history.
The retriever_chain looks at the previous conversation and the current question, and generates a query for searching the documents.
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
# First we need a prompt that we can pass into an LLM to generate this search query
prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
Here is a simulated conversation with the chatbot.
from langchain_core.messages import HumanMessage, AIMessage
chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
retriever_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})
'''
[Document(page_content='Skip to main contentLangSmith API DocsSearchGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbookUser GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.Prototyping\u200bPrototyping LLM applications often involves quick experimentation between prompts, model types, retrieval strategy and other parameters.\nThe ability to rapidly understand how the model is performing — and debug where it is failing — is incredibly important for this phase.Debugging\u200bWhen developing new LLM applications, we suggest having LangSmith tracing enabled by default.\nOftentimes, it isn’t necessary to look at every single trace. However, when things go wrong (an unexpected end result, infinite agent loop, slower than expected execution, higher than expected token usage), it’s extremely helpful to debug by looking through the application traces. LangSmith gives clear visibility and debugging information at each step of an LLM sequence, making it much easier to identify and root-cause issues.\nWe provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set\u200bWhile many developers still ship an initial version of their application based on “vibe checks”, we’ve seen an increasing number of engineering teams start to adopt a more test driven approach. LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications.\nThese test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results.Comparison View\u200bWhen prototyping different versions of your applications and making changes, it’s important to see whether or not you’ve regressed with respect to your initial test cases.\nOftentimes, changes in the prompt, retrieval strategy, or model choice can have huge implications in responses produced by your application.\nIn order to get a sense for which variant is performing better, it’s useful to be able to view results for different configurations on the same datapoints side-by-side. We’ve invested heavily in a user-friendly comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of your application.Playground\u200bLangSmith provides a playground environment for rapid iteration and experimentation.\nThis allows you to quickly test out different prompts and models. You can open the playground from any prompt or model run in your trace.', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}),
Document(page_content='LangSmith User Guide | 🦜️🛠️ LangSmith', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}),
Document(page_content="Every playground run is logged in the system and can be used to create test cases or compare with other runs.Beta Testing\u200bBeta testing allows developers to collect more data on how their LLM applications are performing in real-world scenarios. In this phase, it’s important to develop an understanding for the types of inputs the app is performing well or poorly on and how exactly it’s breaking down in those cases. Both feedback collection and run annotation are critical for this workflow. This will help in curation of test cases that can help track regressions/improvements and development of automatic evaluations.Capturing Feedback\u200bWhen launching your application to an initial set of users, it’s important to gather human feedback on the responses it’s producing. This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic responses. LangSmith allows you to attach feedback scores to logged traces (oftentimes, this is hooked up to a feedback button in your app), then filter on traces that have a specific feedback tag and score. A common workflow is to filter on traces that receive a poor user feedback score, then drill down into problematic points using the detailed trace view.Annotating Traces\u200bLangSmith also supports sending runs to annotation queues, which allow annotators to closely inspect interesting traces and annotate them with respect to different criteria. Annotators can be PMs, engineers, or even subject matter experts. This allows users to catch regressions across important evaluation criteria.Adding Runs to a Dataset\u200bAs your application progresses through the beta testing phase, it's essential to continue collecting data to refine and improve its performance. LangSmith enables you to add runs as examples to datasets (from both the project page and within an annotation queue), expanding your test coverage on real-world scenarios. This is a key benefit in having your logging system and your evaluation/testing system in the same platform.Production\u200bClosely inspecting key data points, growing benchmarking datasets, annotating traces, and drilling down into important data in trace view are workflows you’ll also want to do once your app hits production.However, especially at the production stage, it’s crucial to get a high-level overview of application performance with respect to latency, cost, and feedback scores. This ensures that it's delivering desirable results at scale.Online evaluations and automations allow you to process and score production traces in near real-time.Additionally, threads provide a seamless way to group traces from a single conversation, making it easier to track the performance of your application across multiple turns.Monitoring and A/B Testing\u200bLangSmith provides monitoring charts that allow you to track key metrics over time. You can expand to view metrics for a given period and drill down into a specific data point to get a trace table for that time period — this is especially handy for debugging production issues.LangSmith also allows for tag and metadata grouping, which allows users to mark different versions of their applications with different identifiers and view how they are performing side-by-side within each chart. This is helpful for A/B testing changes in prompt, model, or retrieval strategy.Automations\u200bAutomations are a powerful feature in LangSmith that allow you to perform actions on traces in near real-time. This can be used to automatically score traces, send them to annotation queues, or send them to datasets.To define an automation, simply provide a filter condition, a sampling rate, and an action to perform. Automations are particularly helpful for processing traces at production scale.Threads\u200bMany LLM applications are multi-turn, meaning that they involve a series of interactions between the user and the application. LangSmith provides a threads view that groups traces from a single conversation together, making it easier to", metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}),
Document(page_content='meaning that they involve a series of interactions between the user and the application. LangSmith provides a threads view that groups traces from a single conversation together, making it easier to track the performance of and annotate your application across multiple turns.Was this page helpful?PreviousQuick StartNextOverviewPrototypingBeta TestingProductionCommunityDiscordTwitterGitHubDocs CodeLangSmith SDKPythonJS/TSMoreHomepageBlogLangChain Python DocsLangChain JS/TS DocsCopyright © 2024 LangChain, Inc.', metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'})]
'''
We also change the retrieval_chain that produces the final answer (not the same thing as the retriever above) so that it, too, takes the previous conversation into account.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)
Everything is ready now. Let's check the answer.
chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]
response = retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Tell me how"
})
print(response["answer"])
# LangSmith can help test your LLM applications by allowing you to create datasets for running tests on your applications. These datasets contain inputs and reference outputs that can be used to evaluate the performance of your LLM models. Additionally, LangSmith provides the ability to run custom evaluations, both LLM-based and heuristic-based, to score the test results. This comprehensive testing functionality enables developers to assess the performance of their applications and make improvements as needed.
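To keep the conversation going, append each completed turn to chat_history and invoke the chain again; a minimal sketch of one more turn (the follow-up question is just an example):
chat_history.append(HumanMessage(content="Tell me how"))
chat_history.append(AIMessage(content=response["answer"]))
response = retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": "Does it also help once my app is in production?"
})
print(response["answer"])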
'AI&ML > 학습 정리' 카테고리의 다른 글
[Paper-review]RAGAS: Automated Evaluation of Retrieval Augmented Generation (0) | 2024.07.02 |
---|---|
[Paper-review]Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (0) | 2024.04.10 |
핸즈온 머신러닝 리뷰 (1) | 2024.01.13 |
f1-score에 대해서 알아보자(2) (0) | 2023.12.07 |
f1-score에 대해서 알아보자(1) (0) | 2023.12.07 |