
Let TaoTie keep an eye on the information you care about.

small-thinking/taotie

Tao Tie (饕餮)

Let TaoTie help you extract useful knowledge from massive amounts of noisy information. It consists of three main components:

  • Sources: These are the information sources that TaoTie can subscribe to. Currently, TaoTie supports Twitter, GitHub, arXiv, and HTTP sources.
  • Consumers: These are the agents that TaoTie uses to summarize the information. TaoTie can be integrated with any Large Language Model (LLM) agent; only a thin wrapper is needed to connect the agent to TaoTie.
  • Storage: This is where TaoTie stores the summarized information. Currently, TaoTie supports Notion, but it can be configured to use other storage solutions as well.
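To illustrate the "thin wrapper" idea, here is a minimal sketch of how an arbitrary LLM call could be adapted into a consumer. The names `Consumer`, `LLMConsumer`, and `llm_call` are hypothetical and are not TaoTie's actual API; a stub function stands in for a real LLM client so the example runs offline.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Consumer(ABC):
    """Hypothetical consumer interface: turns raw text into a summary."""

    @abstractmethod
    def process(self, text: str) -> str:
        ...


class LLMConsumer(Consumer):
    """Thin wrapper that adapts any `prompt -> completion` function into a consumer."""

    def __init__(self, llm_call: Callable[[str], str], max_chars: int = 4000) -> None:
        self.llm_call = llm_call
        self.max_chars = max_chars  # crude guard against over-long prompts

    def process(self, text: str) -> str:
        prompt = "Summarize the following content:\n\n" + text[: self.max_chars]
        return self.llm_call(prompt)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM client: echoes the first line of the content.
    content = prompt.split("\n\n", 1)[1]
    return "SUMMARY: " + content.splitlines()[0]


consumer = LLMConsumer(fake_llm)
print(consumer.process("TaoTie watches sources.\nMore details here."))
# prints: SUMMARY: TaoTie watches sources.
```

Swapping in a different model only means passing a different `llm_call`; the rest of the pipeline would not need to change.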

Architecture

Here's an overview of TaoTie's architecture:

The architecture of TaoTie
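The flow shown in the diagram (sources feed consumers, consumers write to storage) can be sketched with an in-process queue. Everything below (`Document`, `InMemoryStorage`, `run_pipeline`) is hypothetical and deliberately simplified; TaoTie's real orchestration may differ, for example by running sources concurrently or writing to Notion instead of a list.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    source: str   # e.g. "github", "arxiv", "http_service"
    content: str  # raw text fetched from the source


class InMemoryStorage:
    """Stand-in for a storage backend such as Notion."""

    def __init__(self) -> None:
        self.items: List[dict] = []

    def save(self, source: str, summary: str) -> None:
        self.items.append({"source": source, "summary": summary})


def run_pipeline(
    sources: Iterable[Iterable[Document]],
    summarize: Callable[[str], str],
    storage: InMemoryStorage,
) -> None:
    q: queue.Queue = queue.Queue()
    # 1. Every source pushes its raw documents into one shared queue.
    for source in sources:
        for doc in source:
            q.put(doc)
    # 2. A consumer drains the queue and summarizes each document.
    while not q.empty():
        doc = q.get()
        # 3. The summary is written to storage, tagged with its origin.
        storage.save(doc.source, summarize(doc.content))


github = [Document("github", "A new vector database written in Rust.")]
arxiv = [Document("arxiv", "We study scaling laws for small models.")]
store = InMemoryStorage()
run_pipeline([github, arxiv], lambda text: text.split(".")[0], store)
print([item["source"] for item in store.items])  # ['github', 'arxiv']
```

The queue decouples producers from the consumer, so a new source only has to emit documents; it never needs to know which LLM or storage backend is in use.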

Example

Here's an example of how to use TaoTie to subscribe to Twitter, GitHub, and HTTP sources, summarize the information using an LLM agent, and store the summaries in Notion.

The example code can be found in examples/summarize_to_notion/example.py.

An example website backed by TaoTie is available at https://techtao.super.site/.

The blog website backed by TaoTie

1. Set up your environment

Create a .env file and add the necessary API tokens:

OPENAI_API_KEY=<your OpenAI API key>
# Please follow https://developers.notion.com/docs/create-a-notion-integration.
NOTION_TOKEN=<your Notion API token>
# The ID of the Notion page under which the summaries will be stored.
NOTION_ROOT_PAGE_ID=<the ID of the page where you want to store the summaries>

# (Optional) Please follow https://developer.twitter.com/en/portal.
TWITTER_BEARER_TOKEN=<your Twitter bearer token>

# (Optional) The list of authors whose papers you care about.
ARXIV_AUTHORS=Yann LeCun,Kaiming He,Ross Girshick,Piotr Dollár,Alec Radford,Ilya Sutskever,Dario Amodei,Geoffrey E. Hinton
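Before starting the container, it can help to check the .env file. The following is a minimal standard-library sketch, assuming that the keys not marked "(Optional)" above are required; the helper names are hypothetical, and the parser only handles plain KEY=VALUE lines (no quoting, `export` prefixes, or inline comments).

```python
from typing import Dict, List

# Keys not marked "(Optional)" in the .env above are treated as required;
# TWITTER_BEARER_TOKEN and ARXIV_AUTHORS are optional.
REQUIRED_KEYS = ["OPENAI_API_KEY", "NOTION_TOKEN", "NOTION_ROOT_PAGE_ID"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines; blank lines and '#' comments are skipped."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        env[key.strip()] = value.strip()
    return env


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def arxiv_authors(env: Dict[str, str]) -> List[str]:
    """Split ARXIV_AUTHORS on commas; names themselves may contain spaces."""
    raw = env.get("ARXIV_AUTHORS", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


sample = """
OPENAI_API_KEY=sk-test
# A comment line
NOTION_TOKEN=secret
ARXIV_AUTHORS=Yann LeCun,Kaiming He
"""
env = parse_env(sample)
print(missing_keys(env))   # ['NOTION_ROOT_PAGE_ID']
print(arxiv_authors(env))  # ['Yann LeCun', 'Kaiming He']
```

To check a real file, pass its contents instead of the sample, e.g. `parse_env(open(".env", encoding="utf-8").read())`.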

2. Build and run the example

At the root of the repository, run the following command:

# Build the Docker image and start the container via docker-compose
docker-compose -f examples/summarize_to_notion/docker-compose.yml up

Once running, the program subscribes to the Twitter, GitHub, and HTTP sources, summarizes the information with an LLM agent, and stores the summaries in Notion. It also starts an HTTP server on port 6543 that accepts ad-hoc summarization requests. For example, you can use the following curl command to summarize a blog post:

curl -X POST -H "Content-Type: application/json" -d '{"url": "https://www.harmdevries.com/post/model-size-vs-compute-overhead"}' http://localhost:6543/api/v1/url

A more user-friendly tool is not yet available, but you can also use Postman to send the request.
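Until a dedicated tool exists, the curl request can also be scripted. This minimal sketch uses only Python's standard library; the endpoint and JSON payload come from the curl example, while the function names are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
import json
import urllib.request

# Port and path taken from the curl example above.
API_ENDPOINT = "http://localhost:6543/api/v1/url"


def build_request(url: str, endpoint: str = API_ENDPOINT) -> urllib.request.Request:
    """Build the same JSON POST request as the curl example."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(url: str, endpoint: str = API_ENDPOINT, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    req = build_request(url, endpoint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    import sys

    # Submit every URL given on the command line.
    for link in sys.argv[1:]:
        print(submit(link))
```

Saved as, say, `submit_url.py` (a hypothetical name), it can be run as `python submit_url.py https://www.harmdevries.com/post/model-size-vs-compute-overhead` while the container is up.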

Note: Please remember to stop the container when you are done. Otherwise, your OpenAI bill will keep growing.

Output

Output of the info summarizer example

In Notion, you can see the added content.

Ad-hoc bookmarking

Summarized web page (Medium post)

Subscribed GitHub Trending

Clicking an entry shows its details, including the knowledge graph summarized from that piece of information.

Summarized GitHub repo (GitHub Trending)

The --data-sources flag lets you choose which data sources to use. It accepts a comma-separated list; the possible values are "http_service", "github", "arxiv", and "twitter".
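The README does not say which script consumes --data-sources, so the following is only a sketch of how such a comma-separated flag can be parsed and validated with `argparse`. The launcher itself is hypothetical, and defaulting to all four sources is an assumption.

```python
import argparse
from typing import List

# The four values listed for --data-sources.
VALID_SOURCES = {"http_service", "github", "arxiv", "twitter"}


def parse_data_sources(value: str) -> List[str]:
    """Split a comma-separated list and reject unknown source names."""
    sources = [s.strip() for s in value.split(",") if s.strip()]
    unknown = sorted(set(sources) - VALID_SOURCES)
    if unknown:
        raise argparse.ArgumentTypeError(
            f"unknown data source(s): {', '.join(unknown)}; "
            f"choose from {', '.join(sorted(VALID_SOURCES))}"
        )
    return sources


parser = argparse.ArgumentParser(description="Hypothetical TaoTie-style launcher")
parser.add_argument(
    "--data-sources",
    type=parse_data_sources,
    default=sorted(VALID_SOURCES),  # assumption: use every source if omitted
    help="comma-separated list of: http_service, github, arxiv, twitter",
)

args = parser.parse_args(["--data-sources", "github,arxiv"])
print(args.data_sources)  # ['github', 'arxiv']
```

Validating inside the `type` callable means a typo such as `githb` fails at startup with a clear message instead of silently subscribing to nothing.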

Tools

  1. Generate a report from the data gathered in the Notion database:
python taotie/tools.py report --date-lookback 2 --type-filter arxiv,blog
python taotie/tools.py report --date-lookback 2 --type-filter github-repo

Example Report
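The --date-lookback and --type-filter options suggest that the report selects entries from the last N days whose type matches the filter. The sketch below shows that selection logic on in-memory records only; the `Entry` fields and the exact semantics of the flags (for example, whether the lookback counts 24-hour periods or calendar days) are assumptions, and querying the Notion database is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Sequence


@dataclass
class Entry:
    title: str
    kind: str  # e.g. "arxiv", "blog", "github-repo"
    created: datetime


def select_entries(
    entries: Iterable[Entry],
    date_lookback: int,
    type_filter: Sequence[str],
    now: Optional[datetime] = None,
) -> List[Entry]:
    """Keep entries created within `date_lookback` days whose kind is in the filter."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=date_lookback)
    wanted = set(type_filter)
    return [e for e in entries if e.created >= cutoff and e.kind in wanted]


now = datetime(2023, 6, 10)
entries = [
    Entry("Paper A", "arxiv", datetime(2023, 6, 9)),
    Entry("Repo B", "github-repo", datetime(2023, 6, 9)),
    Entry("Post C", "blog", datetime(2023, 6, 1)),
]
# Mirrors: --date-lookback 2 --type-filter arxiv,blog
picked = select_entries(entries, date_lookback=2, type_filter="arxiv,blog".split(","), now=now)
print([e.title for e in picked])  # ['Paper A']
```

"Post C" matches the type filter but falls outside the two-day window, while "Repo B" is recent but excluded by type.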

Clean up unused Docker containers and images

# Remove all stopped containers, then all dangling (<none>) images
docker container prune -f
docker image prune -f