update docs #17

Merged
merged 6 commits into from
Apr 22, 2024
97 changes: 32 additions & 65 deletions README.md
Expand Up @@ -11,12 +11,6 @@
## Overview
Loki is our open-source solution designed to automate the process of verifying factuality. It provides a comprehensive pipeline for dissecting long texts into individual claims, assessing their worthiness for verification, generating queries for evidence search, crawling for evidence, and ultimately verifying the claims. This tool is especially useful for journalists, researchers, and anyone interested in the factuality of information. To stay updated, please subscribe to our newsletter at [our website](https://www.librai.tech/) or join us on [Discord](https://discord.gg/NRge6RS7)!

## Components
- **Decomposer:** Breaks down extensive texts into digestible, independent claims, setting the stage for detailed analysis.
- **Checkworthy:** Assesses each claim's potential significance, filtering out vague or ambiguous statements to focus on those that truly matter. For example, vague claims like "MBZUAI has a vast campus" are considered unworthy because of the ambiguous nature of "vast."
- **Query Generator:** Transforms check-worthy claims into precise queries, ready to navigate the vast expanse of the internet in search of truth.
- **Evidence Crawler:** Ventures into the digital realm, retrieving relevant evidence that forms the foundation of informed verification.
- **ClaimVerify:** Examines the gathered evidence, determining the veracity of each claim to uphold the integrity of information.
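
To make the flow between these five components concrete, here is a minimal sketch that chains toy stand-ins in the same order. Every function here, and the `ClaimResult` class, is a hypothetical illustration rather than one of Loki's actual classes; the vague-word check and the placeholder evidence only mimic what the LLM-backed components and the crawler would do.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimResult:
    """Hypothetical record of what happened to one claim."""
    claim: str
    checkworthy: bool
    queries: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    verdict: str = "skipped"


def decompose(text):
    # Stand-in for the Decomposer: naive split on periods.
    return [part.strip() for part in text.split(".") if part.strip()]


def is_checkworthy(claim):
    # Stand-in for Checkworthy: reject claims that contain a vague word.
    vague_words = {"vast", "huge", "great"}
    return not any(word in vague_words for word in claim.lower().split())


def generate_queries(claim):
    # Stand-in for the Query Generator: reuse the claim as the query.
    return [claim]


def crawl_evidence(queries):
    # Stand-in for the Evidence Crawler: no network access, placeholder text only.
    return [f"placeholder evidence for: {query}" for query in queries]


def verify(claim, evidence):
    # Stand-in for ClaimVerify: only reports whether any evidence was gathered.
    return "evidence_found" if evidence else "no_evidence"


def run_pipeline(text):
    results = []
    for claim in decompose(text):
        result = ClaimResult(claim=claim, checkworthy=is_checkworthy(claim))
        if result.checkworthy:
            result.queries = generate_queries(claim)
            result.evidence = crawl_evidence(result.queries)
            result.verdict = verify(claim, result.evidence)
        results.append(result)
    return results
```

Claims that fail the check-worthiness step keep the `skipped` verdict and never reach the later stages, which mirrors the filtering role described for Checkworthy.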

## Quick Start

Expand Down Expand Up @@ -49,42 +43,19 @@ You can choose to export essential api key to the environment
```bash
export SERPER_API_KEY=... # required for evidence retrieval when Serper is used
export OPENAI_API_KEY=... # this is required in all tasks
export ANTHROPIC_API_KEY=... # this is required only if you want to replace openai with anthropic
export LOCAL_API_KEY=... # this is required only if you want to use local LLM
export LOCAL_API_URL=... # this is required only if you want to use local LLM
```

Alternatively, you can save the api information in a yaml file with the same key names as the environment variables and pass the path to the yaml file as an argument to the `check_response` method.

See `demo_data/api_config.yaml` for an example of the API configuration file.
- Example: Pass the path to the api configuration file
```bash
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/api_config.yaml
```
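
As a rough illustration of how the two configuration routes can coexist, the sketch below merges a parsed configuration mapping with environment variables under the shared key names. The helper `resolve_api_keys` is hypothetical, and giving the file precedence over the environment is an assumption made for this example, not a statement about how Loki resolves keys.

```python
import os

# Key names shared by the environment variables and the YAML file.
API_KEY_NAMES = (
    "SERPER_API_KEY",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "LOCAL_API_KEY",
    "LOCAL_API_URL",
)


def resolve_api_keys(file_config=None, environ=None):
    """Hypothetical helper: merge a parsed config mapping with the environment.

    Assumption: a value from the file wins over the environment variable.
    """
    file_config = file_config or {}
    environ = os.environ if environ is None else environ
    resolved = {}
    for name in API_KEY_NAMES:
        value = file_config.get(name) or environ.get(name)
        if value:
            resolved[name] = value
    return resolved
```

Here `file_config` stands for the dictionary obtained after parsing the YAML file with whichever YAML library is in use; keys that are set in neither place are simply left out of the result.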

### Test
Alternatively, you can configure API keys via a YAML file; see the [user guide](docs/user_guide.md) for more details.

A sample test case:
<p align="center"><img src="./fig/cmd_example.gif"/></p>

To test the project, you can run the `factcheck.py` script:
```bash
# String
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world"
# Text
python -m factcheck --modal text --input demo_data/text.txt
# Speech
python -m factcheck --modal speech --input demo_data/speech.mp3
# Image
python -m factcheck --modal image --input demo_data/image.webp
# Video
python -m factcheck --modal video --input demo_data/video.m4v
```

## Usage

The main interface of the Fact-check Pipeline is located in `factcheck/core/FactCheck.py`, which contains the `check_response` method. This method integrates the complete pipeline, where each functionality is encapsulated in its class as described in the Features section.
The main interface of the Loki fact-checker is located in `factcheck/__init__.py`, which contains the `check_response` method. This method integrates the complete fact verification pipeline, where each functionality is encapsulated in its own class as described in the Features section.

#### Used as a Library

Example usage:
```python
from factcheck import FactCheck

Expand All @@ -98,35 +69,29 @@ results = factcheck_instance.check_response(text)
print(results)
```

Web app usage:
#### Used as a Web App
```bash
python webapp.py --api_config demo_data/api_config.yaml
```
<p align="center"><img src="./fig/web_input.png"/></p>
<p align="center"><img src="./fig/web_result.png"/></p>



## Customize Your Experience

### Custom Models
```bash
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/api_config.yaml --model claude-3-opus-20240229 --prompt claude_prompt
```

### Custom Evidence Retrieval
```bash
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/test_api_config.yaml --retriever google
```
#### Multimodal Usage

### Custom Prompts
```bash
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/test_api_config.yaml --prompt demo_data/sample_prompt.yaml
# String
python -m factcheck --modal string --input "MBZUAI is the first AI university in the world"
# Text
python -m factcheck --modal text --input demo_data/text.txt
# Speech
python -m factcheck --modal speech --input demo_data/speech.mp3
# Image
python -m factcheck --modal image --input demo_data/image.webp
# Video
python -m factcheck --modal video --input demo_data/video.m4v
```

## Contributing to Loki

Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our [Contribution Guidelines](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/CONTRIBUTING.md).
#### Customize Your Experience
For advanced usage, please see our [user guide](docs/user_guide.md).

## Ready for More?

Expand Down Expand Up @@ -166,27 +131,29 @@ Your support enables us to:
[TRY NOW!](https://aip.librai.tech/login)


## Stay Connected and Informed


Don’t miss out on the latest updates, feature releases, and community insights! We invite you to subscribe to our newsletter and become a part of our growing community.

💌 Subscribe now at [our website](https://www.librai.tech/)!

### Contributing to Loki project

## License
This project is licensed under the [MIT license](LICENSE.md) - see the LICENSE file for details.
Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our [Contribution Guidelines](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/CONTRIBUTING.md).

## Acknowledgments
### Acknowledgments
- Special thanks to all contributors who have helped in shaping this project.

<!---
add slack channel here
-->


### Stay Connected and Informed

Don’t miss out on the latest updates, feature releases, and community insights! We invite you to subscribe to our newsletter and become a part of our growing community.

💌 Subscribe now at [our website](https://www.librai.tech/)!



## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Libr-AI/OpenFactVerification&type=Date)](https://star-history.com/#Libr-AI/OpenFactVerification&Date)
> [![Star History Chart](https://api.star-history.com/svg?repos=Libr-AI/OpenFactVerification&type=Date)](https://star-history.com/#Libr-AI/OpenFactVerification&Date)

## Cite as
```
Expand Down
42 changes: 0 additions & 42 deletions docs/CONTRIBUTING.md

This file was deleted.

29 changes: 0 additions & 29 deletions docs/DEVELOPMENT_PLAN.md

This file was deleted.

45 changes: 42 additions & 3 deletions docs/README.md
Expand Up @@ -2,8 +2,47 @@

Welcome to the OpenFactVerification (Loki) documentation! This repository contains the codebase for the Loki project, which is a fact-checking pipeline that leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular, allowing users to easily customize the evidence retrieval, language model, and prompt used in the fact-checking process.

## Table of Contents
## Related Documents

* To learn about how to use the Loki pipeline, please refer to the [User Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/user_guide.md).
* For users who want to try advanced features, please refer to the [User Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/user_guide.md).

* To learn how to add a new language model support, new search engine support, or new prompt support, please refer to the [Development Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_guide.md).
* For developers who want to contribute to the project, please see the [How to Contribute](#how-to-contribute) section and the [Development Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_guide.md).


## How to Contribute
We welcome contributions and feedback from the community and recommend a few best practices to make your contributions or reported errors easier to assist with.

### For Pull Requests

* PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution.
* New features should have appropriate documentation added alongside them.
* Aim for code maintainability, and minimize code copying.

### For Feature Requests

* Provide a short paragraph's worth of description. What is the feature you are requesting? What motivates it, and what is an example use case? How does it differ from what is currently supported?

### For Bug Reports

* Provide a short description of the bug.
* Provide a reproducible example--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it?
* Provide a full error traceback of the error that occurs, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context.
* Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant.

## Code Style

Loki uses [black](https://github.com/psf/black) and [flake8](https://pypi.org/project/flake8/) to enforce code style, via [pre-commit](https://pre-commit.com/). Before submitting a pull request, please run the following commands to ensure your code is properly formatted:

```bash
pip install pre-commit
pre-commit install
```

## How Can I Get Involved?

There are a number of distinct ways to contribute to Loki:

* Implement new features or fix bugs by submitting a pull request: If you want to use a new model or retriever, or if you have an idea for a new feature, we would love to see your contributions.
* Our [development plan](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_plan.md) outlines the roadmap for the project. If you are interested in contributing to any of the tasks, please join our [Discord](https://discord.gg/NRge6RS7) and send a direct message to @Haonan Li.

We hope you find this project interesting and would like to contribute to it. If you have any questions, please feel free to reach out to us on our [Discord](https://discord.gg/NRge6RS7).
50 changes: 42 additions & 8 deletions docs/development_guide.md
@@ -1,8 +1,13 @@
# Loki Development Guide
# Development Guide

This documentation page provides a guide for developers who want to contribute to the Loki project, for versions v0.0.2 and later.

## Loki Framework Introduction
- [Development Guide](#development-guide)
- [Framework Introduction](#framework-introduction)
- [Development Plan](#development-plan)


## Framework Introduction

Loki leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular and lives in `factcheck/core/`, which includes the following components:

Expand All @@ -16,8 +21,7 @@ To support each component's functionality, Loki relies on the following utils:
- **Language Model:** Currently, 4 out of 5 components (Decomposer, Checkworthy, Query Generator, and ClaimVerify) use large language models (LLMs) to perform their tasks. The supported LLMs are defined in `factcheck/core/utils/llmclient/` and can be easily extended to support more LLMs.
- **Prompt:** The prompt is a crucial part of the LLMs, and is usually optimized for each LLM to achieve the best performance. The prompt is defined in `factcheck/core/utils/prompt/` and can be easily extended to support more prompts.


## New LLM Support
### Support a New LLM Client

A new LLM should be defined in `factcheck/core/utils/llmclient/` and should be a subclass of `BaseClient` from `factcheck/core/utils/llmclient/base.py`. The LLM should implement the `_call` method, which takes a single string input and returns a string output.
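
A minimal sketch of such a subclass is given here. The `BaseClient` defined in the snippet is only a stand-in so the example runs on its own; the real class in `factcheck/core/utils/llmclient/base.py` may take constructor arguments or define further methods. `EchoClient` and its `model` argument are hypothetical names.

```python
import json
from abc import ABC, abstractmethod


class BaseClient(ABC):
    """Stand-in for Loki's BaseClient (assumption: the real one may differ)."""

    @abstractmethod
    def _call(self, prompt: str) -> str:
        """Send one prompt string to the model and return its reply as a string."""


class EchoClient(BaseClient):
    """Hypothetical client that answers locally instead of calling a model."""

    def __init__(self, model: str = "echo-model"):
        self.model = model

    def _call(self, prompt: str) -> str:
        # A real client would forward `prompt` to its LLM backend here.
        return json.dumps({"model": self.model, "prompt_length": len(prompt)})
```

A real client would replace the body of `_call` with the request to its backend while keeping the string-in, string-out contract.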

Expand All @@ -27,19 +31,49 @@ A new LLM should be defined in `factcheck/core/utils/llmclient/` and should be a
We find that ChatGPT [json_mode](https://platform.openai.com/docs/guides/text-generation/json-mode) is a good choice for the LLM, as it can generate structured output.
To support a new LLM, you may need to implement a post-processing step that converts the LLM's output into a structured format.
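
One possible shape for this kind of post-processing, assuming the model wraps a JSON payload in extra prose, is to scan the reply for the first decodable JSON value. The function name `parse_llm_json` is hypothetical and not part of Loki; the sketch relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json


def parse_llm_json(raw: str):
    """Return the first JSON object or array found in `raw`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(raw):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            continue
        return value
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether to retry the request or skip the claim.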

## New Search Engine (Retriever) Support
### Support a New Search Engine (Retriever)

An evidence retriever should be defined in `factcheck/core/Retriever/` and should be a subclass of `EvidenceRetriever` from `factcheck/core/Retriever/base.py`. The retriever should implement the `retrieve_evidence` method.
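
As an illustration only, the sketch below shows a retriever that searches an in-memory list of documents. The `EvidenceRetriever` base class here is a stand-in, and the signature of `retrieve_evidence` (a list of query strings in, a dictionary of matches out) is an assumption; check `factcheck/core/Retriever/base.py` for the actual interface. `LocalCorpusRetriever` is a hypothetical name.

```python
from abc import ABC, abstractmethod


class EvidenceRetriever(ABC):
    """Stand-in for Loki's EvidenceRetriever (assumption: real signature may differ)."""

    @abstractmethod
    def retrieve_evidence(self, queries):
        """Return evidence for each query."""


class LocalCorpusRetriever(EvidenceRetriever):
    """Hypothetical retriever that matches queries against local documents."""

    def __init__(self, documents):
        self.documents = list(documents)

    def retrieve_evidence(self, queries):
        evidence = {}
        for query in queries:
            terms = set(query.lower().split())
            # Keep every document that shares at least one word with the query.
            evidence[query] = [
                doc for doc in self.documents
                if terms & set(doc.lower().split())
            ]
        return evidence
```

Word overlap is used purely to keep the example self-contained; a real retriever would call a search API or an index instead.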

## New Language Support
### Support a New Language

To support a new language, you need to create a new file in `factcheck/utils/prompt/` with the name `<llm>_prompt_<language_iso>.py`. For example, to create a prompt suite for ChatGPT in Chinese, you can create a file named `chatgpt_prompt_zh.py`.

The prompt file should contain a class that is a subclass of `BasePrompt` from `factcheck/core/utils/prompt/base.py` and is registered in `factcheck/utils/prompt/__init__.py`.
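
The naming convention can be captured in a small helper, sketched here under the assumption that the language part is a two-letter lowercase code such as `zh`. The function `prompt_file_name` is hypothetical and only builds the file name; it does not create or register the prompt class.

```python
import re


def prompt_file_name(llm: str, language_iso: str) -> str:
    """Build `<llm>_prompt_<language_iso>.py` after basic validation."""
    if not re.fullmatch(r"[a-z0-9]+", llm):
        raise ValueError(f"unexpected LLM name: {llm!r}")
    # Assumption: language codes are two lowercase letters, e.g. "zh".
    if not re.fullmatch(r"[a-z]{2}", language_iso):
        raise ValueError(f"unexpected language code: {language_iso!r}")
    return f"{llm}_prompt_{language_iso}.py"
```

For instance, `prompt_file_name("chatgpt", "zh")` yields the `chatgpt_prompt_zh.py` name used in the example above.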


## Prompt Optimization
### Prompt Optimization

To optimize the prompt for a specific LLM, you can modify the prompt in `factcheck/utils/prompt/`. We will release a minimal test suite to evaluate the performance of the prompt in the future.

## Others


## Development Plan

As Loki continues to evolve, our development plan focuses on broadening capabilities and enhancing flexibility to meet the diverse needs of our users. Here are the key areas we are working on:

### 1. Support for Multiple Models
- **Broader Model Compatibility:**
  - Integration with leading AI models besides ChatGPT and Claude to diversify fact-checking capabilities, including Command R and Gemini.
  - Implementation of self-hosted model options for enhanced privacy and control, e.g., FastChat, TGI, and vLLM.

### 2. Model-specific Prompt Engineering
- **Unit Testing for Prompts:**
  - Develop robust unit tests for each step to ensure prompt reliability and accuracy across different scenarios.

### 3. Expanded Search Engine Support
- **Diverse Search Engines:**
  - Incorporate a variety of search engines, including Bing and ScraperAPI, to broaden search capabilities.
  - Integration with [Searxng](https://github.com/searxng/searxng), an open-source metasearch engine.
  - Support for specialized indexes like LlamaIndex and Langchain, and the ability to search local documents.

### 4. Deployment and Scalability
- **Dockerization:**
  - Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments.

### 5. Multi-language Support
- **Language Expansion:**
  - Support for additional languages beyond English, including Chinese, Arabic, etc., to cater to a global user base.


We are committed to these enhancements to make Loki not just more powerful, but also more adaptable to the needs of a global user base. Stay tuned as we roll out these exciting developments!
4 changes: 3 additions & 1 deletion docs/user_guide.md
Expand Up @@ -118,13 +118,15 @@ text = "Your text here"
results = factcheck_instance.check_response(text)
print(results)
```

### Used as a Web App

```bash
python webapp.py --api_config demo_data/api_config.yaml
```

<p align="center"><img src="../fig/web_input.png"/></p>
<p align="center"><img src="../fig/web_result.png"/></p>

## Advanced Features

### Multimodality