Let TaoTie be your helper for extracting useful knowledge from massive, noisy information. It consists of three main components:
- Sources: These are the information sources that TaoTie can subscribe to. Currently, TaoTie supports Twitter, GitHub, arXiv, and HTTP sources.
- Consumers: These are the agents that TaoTie uses to summarize the information. TaoTie can be integrated with any large language model (LLM) agent; only a thin wrapper is needed to connect the agent to TaoTie.
- Storage: This is where TaoTie stores the summarized information. Currently, TaoTie supports Notion, but it can be configured to use other storage solutions as well.
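The "thin wrapper" for a consumer can be pictured as a class exposing a single summarize hook. The base-class name and method signature below are illustrative assumptions for the sketch, not TaoTie's actual API:

```python
from abc import ABC, abstractmethod


class Consumer(ABC):
    """Hypothetical consumer interface: anything that turns raw messages into a summary."""

    @abstractmethod
    def summarize(self, messages: list) -> str:
        ...


class EchoConsumer(Consumer):
    """A trivial consumer that joins messages instead of calling an LLM.

    A real wrapper would forward `messages` to an LLM agent and return its reply.
    """

    def summarize(self, messages: list) -> str:
        return " ".join(messages)
```

Swapping in a different LLM agent then means implementing this one method, which is what keeps the integration surface thin.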
Here's an overview of TaoTie's architecture:
Here's an example of how to use TaoTie to subscribe to Twitter, GitHub, and HTTP sources, summarize the information using an LLM agent, and store the summaries in Notion.
The example code can be found in examples/summarize_to_notion/example.py.
A website backed by TaoTie can be seen at https://techtao.super.site/.
The blog website backed by TaoTie
Create a .env file and add the necessary API tokens:
OPENAI_API_KEY=<your OpenAI API key>
# Please follow https://developers.notion.com/docs/create-a-notion-integration.
NOTION_TOKEN=<your Notion API token>
# The ID of the page where you want to store the summaries.
NOTION_ROOT_PAGE_ID=<the ID of the page where you want to store the summaries>
# (Optional) Please follow https://developer.twitter.com/en/portal.
TWITTER_BEARER_TOKEN=<your Twitter bearer token>
# (Optional) The list of authors whose papers you care about.
ARXIV_AUTHORS=Yann LeCun,Kaiming He,Ross Girshick,Piotr Dollár,Alec Radford,Ilya Sutskever,Dario Amodei,Geoffrey E. Hinton
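Before starting the container, a quick sanity check can catch missing tokens early. This sketch assumes nothing about TaoTie's internals; the variable names are the required ones from the .env file above (the Twitter and arXiv settings are optional):

```python
import os

# Required variables from the .env file; the Twitter and arXiv entries are optional.
REQUIRED = ["OPENAI_API_KEY", "NOTION_TOKEN", "NOTION_ROOT_PAGE_ID"]


def missing_vars(env=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```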
At the root of the repository, run the following command:
# Build the docker image via docker-compose
docker-compose -f examples/summarize_to_notion/docker-compose.yml up
When the program runs, it will subscribe to Twitter, GitHub, and HTTP sources, summarize the information using an LLM agent, and store the summaries in Notion. It will also set up an HTTP server listening on port 6543 to receive ad-hoc summarization requests. For example, you can use the following curl command to summarize a blog post:
curl -X POST -H "Content-Type: application/json" -d '{"url": "https://www.harmdevries.com/post/model-size-vs-compute-overhead"}' http://localhost:6543/api/v1/url
A more user-friendly tool is not yet available, but you can use Postman to send the request.
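If you prefer code over curl or Postman, the same request can be sent from Python's standard library. The endpoint and payload shape follow the curl example above; everything else is plumbing:

```python
import json
from urllib import request


def build_summarize_request(url: str) -> request.Request:
    """Build a POST request for TaoTie's ad-hoc summarization endpoint."""
    return request.Request(
        "http://localhost:6543/api/v1/url",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires the container to be running:
# request.urlopen(build_summarize_request("https://www.harmdevries.com/post/model-size-vs-compute-overhead"))
```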
Note: Please remember to stop the container when you are done (e.g. with `docker-compose -f examples/summarize_to_notion/docker-compose.yml down`). Otherwise, your OpenAI bill will grow continuously.
Output of the info summarizer example
In your Notion workspace, you can see the added content.
Summarized Web-page (Medium post)
Clicking an entry shows the details, including the knowledge graph summarized for this piece of information.
Summarized GitHub repo (GitHub Trends)
The --data-sources flag allows you to specify the data sources to be used. It accepts a comma-separated list of data sources. The possible values are "http_service", "github", "arxiv", and "twitter".
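The comma-separated value can be parsed and validated along these lines. This is a sketch of the general pattern; TaoTie's actual argument handling may differ:

```python
# The accepted data-source names, as listed above.
VALID_SOURCES = {"http_service", "github", "arxiv", "twitter"}


def parse_data_sources(value: str) -> list:
    """Split a comma-separated --data-sources value and reject unknown names."""
    sources = [s.strip() for s in value.split(",") if s.strip()]
    unknown = set(sources) - VALID_SOURCES
    if unknown:
        raise ValueError(f"Unknown data sources: {sorted(unknown)}")
    return sources
```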
- Tools to generate reports based on the data gathered in the Notion database.
python taotie/tools.py report --date-lookback 2 --type-filter arxiv,blog
python taotie/tools.py report --date-lookback 2 --type-filter github-repo
# Clean up: remove stopped containers, then remove dangling (<none>) images
docker rm $(docker ps -a -q) ; docker images | grep '<none>' | awk '{print $3}' | xargs docker rmi