* add docs --------- Co-authored-by: Xudong Han <[email protected]>
Showing 13 changed files with 369 additions and 62 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Contribute to Loki | ||
|
||
Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. This document outlines the process for contributing to the project. | ||
|
||
## How to Contribute | ||
|
||
We recommend a few best practices to make your contributions or reported errors easier to assist with. | ||
|
||
### For Pull Requests | ||
|
||
* PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution. | ||
* New features should have appropriate documentation added alongside them. | ||
* Aim for code maintainability, and minimize code copying. | ||
|
||
### For Feature Requests | ||
|
||
* Provide a short paragraph's worth of description. What is the feature you are requesting? What is its motivation, and an example use case of it? How does this differ from what is currently supported? | ||
|
||
### For Bug Reports | ||
|
||
* Provide a short description of the bug. | ||
* Provide a reproducible example--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it? | ||
* Provide a full error traceback of the error that occurs, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context. | ||
* Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant. | ||
|
||
## Code Style | ||
|
||
Loki uses [black](https://github.com/psf/black) and [flake8](https://pypi.org/project/flake8/) to enforce code style, via [pre-commit](https://pre-commit.com/). Before submitting a pull request, please install the pre-commit hooks with the following commands so that your changes are checked automatically on each commit: | ||
|
||
```bash | ||
pip install pre-commit | ||
pre-commit install | ||
``` | ||
|
||
## How Can I Get Involved? | ||
|
||
There are a number of distinct ways to contribute to Loki: | ||
|
||
* Implement new features or fix bugs by submitting a pull request: If you want to use a new model or retriever, or if you have an idea for a new feature, we would love to see your contributions. | ||
* Pick up a task from our [development plan](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_plan.md), which outlines the roadmap for the project. If you are interested in contributing to any of the tasks, please join our [Discord](https://discord.gg/NRge6RS7) and send a direct message to @Haonan Li. | ||
|
||
We hope you find this project interesting and would like to contribute to it. If you have any questions, please feel free to reach out to us on our [Discord](https://discord.gg/NRge6RS7). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
## Development Plan | ||
|
||
As Loki continues to evolve, our development plan focuses on broadening capabilities and enhancing flexibility to meet the diverse needs of our users. Here are the key areas we are working on: | ||
|
||
## 1. Support for Multiple Models | ||
- **Broader Model Compatibility:** | ||
- Integration with leading AI models besides ChatGPT and Claude to diversify fact-checking capabilities, including Command R and Gemini. | ||
- Implementation of self-hosted model options for enhanced privacy and control, e.g., FastChat, TGI, and vLLM. | ||
|
||
## 2. Model-specific Prompt Engineering | ||
- **Unit Testing for Prompts:** | ||
- Develop robust unit tests for each step to ensure prompt reliability and accuracy across different scenarios. | ||
|
||
## 3. Expanded Search Engine Support | ||
- **Diverse Search Engines:** | ||
- Incorporate a variety of search engines, including Bing and ScraperAPI, to broaden search capabilities. | ||
- Integration with [Searxng](https://github.com/searxng/searxng), an open-source metasearch engine. | ||
- Support for specialized indexes like LlamaIndex and LangChain, and the ability to search local documents. | ||
|
||
## 4. Deployment and Scalability | ||
- **Dockerization:** | ||
- Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments. | ||
|
||
## 5. Multi-language Support | ||
- **Language Expansion:** | ||
- Support for additional languages beyond English, such as Chinese and Arabic, to cater to a global user base. | ||
|
||
|
||
We are committed to these enhancements to make Loki not just more powerful, but also more adaptable to the needs of a global user base. Stay tuned as we roll out these exciting developments! |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# OpenFactVerification Documentation | ||
|
||
Welcome to the OpenFactVerification (Loki) documentation! This repository contains the codebase for the Loki project, which is a fact-checking pipeline that leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular, allowing users to easily customize the evidence retrieval, language model, and prompt used in the fact-checking process. | ||
|
||
## Table of Contents | ||
|
||
* To learn about how to use the Loki pipeline, please refer to the [User Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/user_guide.md). | ||
|
||
* To learn how to add support for a new language model, search engine, or prompt, please refer to the [Development Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_guide.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Release Log | ||
|
||
|
||
## v0.0.2 | ||
|
||
### New Features | ||
1. **API Key Handling:** Transitioned from creating key files via copying to dynamically reading all API keys from a YAML file, streamlining configuration processes. | ||
2. **Unified Configuration Dictionary:** Replaced platform-specific dictionaries with a unified dictionary that aligns with environmental variable naming conventions, enhancing consistency and maintainability. | ||
3. **Model Switching:** Introduced a `--model` parameter that allows switching between different models, currently supporting OpenAI and Anthropic. | ||
4. **Modular Architecture:** Restructured the codebase into one Base class file and individual class files for each model, enhancing modularity and clarity. | ||
5. **Base Class Redefinition:** Redefined the Base class to abstract asynchronous operations and other functionalities. Users customizing models need only override three functions. | ||
6. **Prompt Switching:** Added a `--prompt` parameter for switching between predefined prompts, initially supporting prompts for OpenAI and Anthropic. | ||
7. **Prompt Definitions via YAML and JSON:** Enabled prompt definitions using YAML and JSON, allowing prompts to be automatically read from corresponding YAML or JSON files when the prompt parameter ends with `.yaml` or `.json`. | ||
8. **Search Engine Switching:** Introduced a `--retriever` parameter to switch between different search engines, currently supporting Serper and Google. | ||
9. **Webapp Frontend Optimization:** Optimized the web application frontend to prevent duplicate requests during processing, including disabling the submit button after a click and displaying a timer during processing. | ||
10. **Client Switching:** Introduced a `--client` parameter that allows switching between different clients (chat APIs), currently supporting an OpenAI-compatible API client (for both local and official models) and the Anthropic chat API client. | ||
|
||
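As a rough illustration of how the switches listed above could fit together, the sketch below parses `--model`, `--prompt`, `--retriever`, and `--client` with `argparse` and treats a prompt value ending in `.yaml` or `.json` as a file to load. Only the flag names and the file-extension rule come from the release notes; the default values and the `build_parser` and `prompt_source` helpers are assumptions made for this example, not Loki's actual command-line code.

```python
import argparse
from pathlib import Path
from typing import Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: flag names follow the release notes, defaults are assumed."""
    parser = argparse.ArgumentParser(description="Illustrative Loki-style options")
    parser.add_argument("--model", default=None, help="model to use")
    parser.add_argument("--client", default=None, help="chat API client to use")
    parser.add_argument("--prompt", default="chatgpt_prompt",
                        help="predefined prompt name, or a .yaml/.json file")
    parser.add_argument("--retriever", default="serper", help="search engine to use")
    return parser


def prompt_source(prompt: str) -> Tuple[str, str]:
    """Classify a --prompt value as a prompt file or a predefined prompt name."""
    suffix = Path(prompt).suffix.lower()
    if suffix in {".yaml", ".json"}:
        return ("file", suffix.lstrip("."))
    return ("predefined", prompt)


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--prompt", "my_prompt.yaml", "--retriever", "google"])
print(args.retriever, prompt_source(args.prompt))
```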
|
||
|
||
## v0.0.1 | ||
|
||
Initial release of Loki. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
# Loki Development Guide | ||
|
||
This documentation page provides a guide for developers who want to contribute to the Loki project, for versions v0.0.2 and later. | ||
|
||
## Loki Framework Introduction | ||
|
||
Loki leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular and lives in `factcheck/core/`, which includes the following components: | ||
|
||
- **Decomposer:** Breaks down extensive texts into digestible, independent claims, setting the stage for detailed analysis. | ||
- **Checkworthy:** Assesses each claim's potential significance, filtering out vague or ambiguous statements to focus on those that truly matter. For example, vague claims like "MBZUAI has a vast campus" are considered unworthy because of the ambiguous nature of "vast." | ||
- **Query Generator:** Transforms check-worthy claims into precise queries, ready to navigate the vast expanse of the internet in search of truth. | ||
- **Evidence Retriever:** Ventures into the digital realm, retrieving relevant evidence that forms the foundation of informed verification. | ||
- **ClaimVerify:** Examines the gathered evidence, determining the veracity of each claim to uphold the integrity of information. | ||
|
||
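The way the five components hand data to one another can be pictured as a simple chain. The sketch below is only a mental model: `run_pipeline` and the signatures of the five callables are hypothetical stand-ins, and the toy lambdas replace what would be LLM or search-engine calls in Loki.

```python
from typing import Callable, Dict, List


def run_pipeline(
    text: str,
    decompose: Callable[[str], List[str]],
    is_checkworthy: Callable[[str], bool],
    generate_queries: Callable[[str], List[str]],
    retrieve_evidence: Callable[[List[str]], List[str]],
    verify: Callable[[str, List[str]], bool],
) -> Dict[str, bool]:
    """Chain the five stages; every callable is a stand-in for a Loki component."""
    verdicts: Dict[str, bool] = {}
    for claim in decompose(text):                   # Decomposer
        if not is_checkworthy(claim):               # Checkworthy
            continue
        queries = generate_queries(claim)           # Query Generator
        evidence = retrieve_evidence(queries)       # Evidence Retriever
        verdicts[claim] = verify(claim, evidence)   # ClaimVerify
    return verdicts


# Toy stages so the sketch runs without any model or search engine.
verdicts = run_pipeline(
    "Paris is in France. MBZUAI has a vast campus.",
    decompose=lambda t: [s.strip() for s in t.split(".") if s.strip()],
    is_checkworthy=lambda claim: "vast" not in claim,
    generate_queries=lambda claim: [claim + "?"],
    retrieve_evidence=lambda queries: ["Paris is the capital of France."],
    verify=lambda claim, evidence: any("France" in e for e in evidence),
)
print(verdicts)
```

Here the vague "vast campus" claim is dropped at the Checkworthy stage, so only the first claim reaches verification.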
To support each component's functionality, Loki relies on the following utils: | ||
- **Language Model:** Currently, 4 out of 5 components (Decomposer, Checkworthy, Query Generator, and ClaimVerify) use large language models (LLMs) to perform their tasks. The supported LLMs are defined in `factcheck/core/utils/llmclient/` and can be easily extended to support more LLMs. | ||
- **Prompt:** Prompts are crucial to LLM performance and are usually optimized for each LLM. Prompts are defined in `factcheck/core/utils/prompt/` and can be easily extended to support more prompts. | ||
|
||
|
||
## New LLM Support | ||
|
||
A new LLM should be defined in `factcheck/core/utils/llmclient/` and should be a subclass of `BaseClient` from `factcheck/core/utils/llmclient/base.py`. The LLM should implement the `_call` method, which takes a single string input and returns a string output. | ||
|
||
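A minimal sketch of the subclassing pattern described above follows. The `BaseClient` defined here is a simplified stand-in so that the example runs on its own; the real class in `base.py` may take constructor arguments and define further methods. `CannedClient` and its `reply` argument are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseClient(ABC):
    """Simplified stand-in for Loki's BaseClient (assumed shape, not the real class)."""

    @abstractmethod
    def _call(self, prompt: str) -> str:
        """Send one prompt string to the model and return its reply as a string."""


class CannedClient(BaseClient):
    """Hypothetical client that returns a fixed reply instead of calling an API."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def _call(self, prompt: str) -> str:
        # A real client would send `prompt` to its chat API at this point.
        return self.reply


client = CannedClient('{"claims": ["Paris is in France"]}')
print(client._call("Decompose: Paris is in France."))
```

A client like this can also be handy for exercising the rest of the pipeline offline, since it never touches the network.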
> **_Note_:** | ||
> To keep the pipeline robust, the output of the LLM should be a string containing a valid Python literal that can be parsed directly with Python's `eval`. Usually, the output should be a `list` or `dict` in the form of a string. | ||
We find that ChatGPT [json_mode](https://platform.openai.com/docs/guides/text-generation/json-mode) is a good choice for the LLM, as it can generate structured output. | ||
To support a new LLM, you may need to implement a post-processing step that converts the output of the LLM to a structured format. | ||
|
||
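One possible shape for such a post-processing step is sketched below. It strips a surrounding Markdown code fence if the model added one, tries JSON first, and then falls back to a Python literal. Note that it uses `ast.literal_eval` rather than `eval`, a deliberate swap because `literal_eval` only accepts literals and will not execute arbitrary code. The function names are hypothetical and not part of Loki.

```python
import ast
import json
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (and its language tag), if present."""
    text = text.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        body = text[len(FENCE):-len(FENCE)]
        newline = body.find("\n")
        if newline != -1:  # drop the opening fence line, e.g. "json"
            body = body[newline + 1:]
        return body.strip()
    return text


def parse_llm_output(raw: str) -> Any:
    """Turn a raw LLM reply into a list or dict, or raise ValueError."""
    text = strip_code_fence(raw)
    try:
        parsed = json.loads(text)            # JSON-style output
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(text)  # Python-literal-style output
        except (ValueError, SyntaxError) as exc:
            raise ValueError(f"unparseable LLM output: {raw!r}") from exc
    if not isinstance(parsed, (list, dict)):
        raise ValueError(f"expected list or dict, got {type(parsed).__name__}")
    return parsed


print(parse_llm_output("['claim one', 'claim two']"))
```

Raising a `ValueError` on anything unparseable gives the caller a single place to retry the request or skip the claim.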
## New Search Engine (Retriever) Support | ||
|
||
A new evidence retriever should be defined in `factcheck/core/Retriever/` and should be a subclass of `EvidenceRetriever` from `factcheck/core/Retriever/base.py`. The retriever should implement the `retrieve_evidence` method. | ||
|
||
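The sketch below shows what such a subclass could look like. The `EvidenceRetriever` base class is a stand-in defined here so that the example is self-contained, and the `retrieve_evidence` signature (a list of queries in, a dict of evidence snippets out) is an assumption; check `base.py` for the real one. `LocalCorpusRetriever` is a hypothetical retriever that ranks in-memory documents by term overlap.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class EvidenceRetriever(ABC):
    """Stand-in for the base class in factcheck/core/Retriever/base.py."""

    @abstractmethod
    def retrieve_evidence(self, queries: List[str]) -> Dict[str, List[str]]:
        """Return evidence snippets for each query (signature assumed)."""


class LocalCorpusRetriever(EvidenceRetriever):
    """Hypothetical retriever that searches an in-memory list of documents."""

    def __init__(self, documents: List[str], top_k: int = 3) -> None:
        self.documents = documents
        self.top_k = top_k

    def retrieve_evidence(self, queries: List[str]) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {}
        for query in queries:
            terms = set(query.lower().split())
            # Score each document by how many query terms it contains.
            scored = [(len(terms & set(doc.lower().split())), doc)
                      for doc in self.documents]
            # Keep only documents with at least one match, best score first.
            ranked = sorted((item for item in scored if item[0] > 0),
                            key=lambda item: -item[0])
            results[query] = [doc for _, doc in ranked[: self.top_k]]
        return results


retriever = LocalCorpusRetriever(
    ["Paris is the capital of France", "Berlin is in Germany"]
)
print(retriever.retrieve_evidence(["capital of France"]))
```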
## New Language Support | ||
|
||
To support a new language, you need to create a new file in `factcheck/utils/prompt/` with the name `<llm>_prompt_<language_iso>.py`. For example, to create a prompt suite for ChatGPT in Chinese, you can create a file named `chatgpt_prompt_zh.py`. | ||
|
||
The prompt file should contain a class that is a subclass of `BasePrompt` from `factcheck/core/utils/prompt/base.py`, and the class should be registered in `factcheck/utils/prompt/__init__.py`. | ||
|
||
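To make the naming and registration steps concrete, here is a small sketch. The `BasePrompt` class is a stand-in, and the dictionary-based `PROMPT_REGISTRY` with its `register_prompt` decorator is a hypothetical mechanism; how prompts are actually registered in `__init__.py` may differ. The `decompose_prompt` attribute name is likewise illustrative.

```python
from typing import Callable, Dict, Type


class BasePrompt:
    """Stand-in for the BasePrompt class in the prompt package's base.py."""


# Hypothetical registry mapping prompt names to prompt classes.
PROMPT_REGISTRY: Dict[str, Type[BasePrompt]] = {}


def prompt_module_name(llm: str, language_iso: str) -> str:
    """Build the `<llm>_prompt_<language_iso>` name used for the prompt file."""
    return f"{llm}_prompt_{language_iso}"


def register_prompt(name: str) -> Callable[[Type[BasePrompt]], Type[BasePrompt]]:
    """Decorator that records a prompt class under `name` (assumed mechanism)."""
    def decorator(cls: Type[BasePrompt]) -> Type[BasePrompt]:
        if not issubclass(cls, BasePrompt):
            raise TypeError(f"{cls.__name__} must subclass BasePrompt")
        PROMPT_REGISTRY[name] = cls
        return cls
    return decorator


@register_prompt(prompt_module_name("chatgpt", "zh"))
class ChatGPTPromptZh(BasePrompt):
    # Illustrative attribute; mirror whatever fields BasePrompt really defines.
    decompose_prompt = "<Chinese decomposition instructions go here>"


print(sorted(PROMPT_REGISTRY))
```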
|
||
## Prompt Optimization | ||
|
||
To optimize the prompt for a specific LLM, you can modify the prompt in `factcheck/utils/prompt/`. We will release a minimal test suite to evaluate the performance of the prompt in the future. | ||
|
||
## Others |