ATLAS_IQ is a Python-based implementation of a dynamic, self-generating knowledge repository using graph automata and large language models (LLMs). This system is based on the concept described in the paper "Prompt Crawling: Using Graph Automata to Enable Autopoiesis and Self-Generation in Knowledge Repositories" by R.J. Cordes.
- ATLAS_IQ
- Table of Contents
- Features
- System Architecture
- Installation
- Usage
- Running the Public Health Wiki Example
- System Architecture
- Core Components
- Data Management
- Resource Handling
- Utility Modules
- Asynchronous Operations
- Error Handling and Resilience
- Configuration Management
- Advanced Features
- System Workflow
- Scalability and Performance Considerations
- Security Considerations
- Testing and Validation
- Future Enhancements
- Contributing
- License
- Dynamic knowledge graph generation and management
- Integration with OpenAI's GPT models for natural language processing
- Asynchronous operations for improved performance
- Graph-based data model using Neo4j
- Autopoiesis and self-generation of new knowledge entities
- Dynamic refactoring of the knowledge structure
- Authority smoothing for balanced entity importance
- Flexible resource handling (LLMs, APIs, databases)
- Configurable system parameters
The following diagram illustrates the high-level architecture of the ATLAS_IQ system:
+-------------------+                         +---------------------+
|       ATLAS       |<---manages/triggers-----| Global Update Cycle |<--+
+-------------------+                         +---------------------+   |
        |                                                                |
        | triggers Local Update Cycles                                   |
        v                                                                |
+-------------------+                                                    |
|     Entities      |<---------------------------------------------------+
| (e.g., Concepts,  |                                                    |
|  Scenarios, etc.) |                                                    |
+---------+---------+                                                    |
          |                                                              |
          | perform Local Update Cycles                                  |
          v                                                              |
+-------------------+             +------------------+                   |
|     Patterns      |----use----->|     iQueries     |                   |
| (with Conditional |             +------------------+                   |
|  Type Assertions) |                      |                             |
+-------------------+                      |                             |
          |                                |                             |
          | influence behavior via         |                             |
          | Conditional Type Assertions    |                             |
          v                                |                             |
+-------------------+                      |                             |
| Resource Handlers |<--process iQueries---+                             |
+-------------------+                                                    |
          |                                                              |
          | interact with                                                |
          v                                                              |
+-------------------+             +-------------------+                  |
|  Responses from   |<------------|  External Sources |                  |
|  External Sources |             |  (LLMs, APIs,     |                  |
+-------------------+             |  Databases, etc.) |                  |
          |                       +-------------------+                  |
          | update Entities / create New Entities                        |
          v                                                              |
+-------------------+                                                    |
|     Entities      |----feedback loop to Local Update Cycles------------+
|  (Newly Created)  |
+---------+---------+
          |
          | may execute Functions influencing behavior
          v
+-------------------+
|     Functions     |
| - Semantic Crawl  |
| - Dynamic Refactor|
| - Auth. Smoothing |
+-------------------+
- Clone the repository:
git clone https://github.com/BlockScience/atlas_iq.git
cd atlas_iq
- Set up a Python virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install dependencies:
pip install -r requirements.txt
- Set up Neo4j using Docker Compose:
The project includes a docker-compose.yml file. To start Neo4j, run:
docker-compose up -d
This will start a Neo4j 5.11.0 instance with the following configuration:
- Web interface available at http://localhost:7474
- Bolt port: 7687
- Initial password: password (you should change this in production)
- Memory settings: 1G for page cache, 1G for heap (initial and max)
- Obtain an OpenAI API key from OpenAI's website.
- Create a .env file in the project root and add your configuration:
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
NEO4J_HOST=localhost
NEO4J_PORT=7687
OPENAI_API_KEY=<your_openai_api_key>
ATLAS_UPDATE_INTERVAL=60
- You're now ready to run the ATLAS_IQ system!
- Initialize the ATLAS system:
from atlas.core.atlas import ATLAS
atlas = ATLAS()
- Create initial entities and patterns:
from atlas.core.entity import Entity
from atlas.core.pattern import Pattern
from atlas.core.iquery import iQuery
# Create a pattern
pattern = Pattern("example_pattern")
iquery = iQuery("example_query", "example_attribute", [your_resource_handler])
pattern.add_iquery(iquery)
# Create an entity
entity = Entity("example_entity", patterns=[pattern])
- Run the ATLAS system:
atlas.run()
This will start the global update cycle, which will continually update entities, generate new knowledge, and refactor the knowledge structure as needed.
To run the public health wiki simulation:
- Ensure you have set up the environment as described in the Installation section.
- Navigate to the examples/public_health_wiki directory:
cd examples/public_health_wiki
- Run the simulation script:
python simulation.py
This will start the simulation, which includes:
- Initializing the ATLAS system
- Creating seed entities and patterns for public health domains
- Running initial update cycles
- Introducing simulation scenarios (e.g., COVID-19 Pandemic, Climate Change Health Impact)
- Performing graph analysis after each scenario
- Displaying updated knowledge base information
The simulation will demonstrate how the ATLAS system dynamically generates and updates knowledge about various public health domains and scenarios.
You can modify the simulation.py script to adjust parameters, add new scenarios, or change the behavior of the simulation as needed.
The PromptCrawling implementation is built as a modular, asynchronous system with a focus on flexibility and extensibility. At its core, it uses a graph-based data model to represent knowledge entities and their relationships. The system is designed to continuously evolve and expand its knowledge base through interactions with various information sources, primarily Large Language Models (LLMs).
Key architectural features:
- Modular design with clear separation of concerns
- Asynchronous operations for improved performance
- Graph-based data model for complex relationships
- Integration with external resources (LLMs, APIs, databases)
- Configurable and extensible components
The system is primarily implemented in Python, leveraging asyncio for concurrent operations and Neo4j as the underlying graph database.
The ATLAS class (atlas/core/atlas.py) serves as the central coordinator of the entire system. It manages the global state, orchestrates update cycles, and oversees the creation and management of entities.
Key features of ATLAS:
- Singleton pattern implementation ensures a single instance
- Manages global update cycles (global_update_cycle method)
- Handles entity registration and unregistration
- Coordinates dynamic refactoring and autopoiesis processes
- Performs graph analysis and authority smoothing
Notable methods:
- global_update_cycle(): The heart of the system, continuously updating entities
- trigger_dynamic_refactor(): Initiates the refactoring process for entities
- manage_autopoiesis(): Manages the self-generation of new entities
- perform_graph_analysis(): Analyzes the graph structure using NetworkX
- smooth_authority(): Implements the authority smoothing algorithm
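For orientation, the global update cycle can be pictured roughly as follows. This is a minimal sketch: the method names follow this README, but the bodies, signatures, and the maintenance stubs are illustrative rather than the actual code in atlas/core/atlas.py.

```python
import asyncio

class ATLAS:
    """Minimal sketch of the global orchestration loop (not the real implementation)."""

    def __init__(self, update_interval=60):
        self.entities = {}            # entity_id -> Entity, filled by registration
        self.update_interval = update_interval
        self.global_state = {}

    async def global_update_cycle(self):
        while True:
            # Local update cycle for every registered entity
            for entity in list(self.entities.values()):
                await entity.local_update(self.global_state)
            # Structural maintenance of the knowledge graph
            await self.trigger_dynamic_refactor()
            await self.manage_autopoiesis()
            self.perform_graph_analysis()
            self.smooth_authority()
            await asyncio.sleep(self.update_interval)

    async def trigger_dynamic_refactor(self):
        pass  # evaluate entities and refactor where needed (stubbed here)

    async def manage_autopoiesis(self):
        pass  # let eligible entities self-generate new entities (stubbed here)

    def perform_graph_analysis(self):
        pass  # compute graph metrics, e.g. with NetworkX (stubbed here)

    def smooth_authority(self):
        pass  # boost low-authority entities (stubbed here)
```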
The Entity class (atlas/core/entity.py) represents individual nodes in the knowledge graph. Each entity has attributes, patterns, and associated iQueries.
Key features of Entity:
- Unique identifier (entity_id)
- Dictionary of attributes
- List of associated patterns
- List of iQueries derived from patterns
- Methods for local updates and attribute management
Notable methods:
- local_update(global_state): Updates the entity based on its iQueries
- add_pattern(pattern): Associates a new pattern with the entity
- add_attribute(key, value): Adds or updates an entity attribute
- check_and_generate_new_entities(global_state): Generates new entities based on iQuery responses
The Pattern class (atlas/core/pattern.py) defines a collection of iQueries that can be applied to entities. Patterns can inherit from other patterns, allowing for a hierarchical structure of knowledge representation.
Key features of Pattern:
- Unique name
- List of associated iQueries
- List of parent patterns for inheritance
- Methods for managing iQueries and inheritance
Notable methods:
- get_iqueries(): Returns all iQueries, including inherited ones
- add_iquery(iquery): Adds a new iQuery to the pattern
- inherit_from(parent_pattern): Establishes an inheritance relationship
- validate_consistency(): Checks for circular inheritance
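Pattern inheritance can be sketched as follows. The list-valued iqueries and parents attributes are illustrative names, and the circular-inheritance check copies the visited set per branch so diamond-shaped inheritance is not mistaken for a cycle.

```python
class Pattern:
    """Sketch of hierarchical patterns; attribute names are illustrative."""

    def __init__(self, name):
        self.name = name
        self.iqueries = []
        self.parents = []

    def add_iquery(self, iquery):
        self.iqueries.append(iquery)

    def inherit_from(self, parent_pattern):
        self.parents.append(parent_pattern)
        self.validate_consistency()

    def get_iqueries(self):
        # Own iQueries plus everything inherited from ancestor patterns
        collected = list(self.iqueries)
        for parent in self.parents:
            collected.extend(parent.get_iqueries())
        return collected

    def validate_consistency(self, seen=None):
        # Walk each inheritance path; revisiting a name on the same path is a cycle
        path = set(seen) if seen else set()
        if self.name in path:
            raise ValueError(f"Circular inheritance detected at '{self.name}'")
        path.add(self.name)
        for parent in self.parents:
            parent.validate_consistency(path)
```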
The iQuery class (atlas/core/iquery.py) represents individual information requests. It encapsulates the logic for querying information sources and updating entity attributes.
Key features of iQuery:
- Target attribute to update
- List of resource handlers to use
- Conditions for execution
- Retry mechanism with exponential backoff
Notable methods:
- execute(entity): Executes the iQuery, updating the entity's attributes
- check_conditions(entity, global_state): Checks whether the iQuery should be executed
- build_query(entity): Constructs the query based on the entity's state
- process_response(response): Processes the response from resource handlers
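A retry loop with exponential backoff could look roughly like this. It is a sketch: the handler.query() call and the constructor arguments are assumptions based on the usage example above, not the actual iQuery API.

```python
import asyncio

class iQuery:
    """Sketch of iQuery execution with retries and exponential backoff."""

    def __init__(self, name, target_attribute, resource_handlers,
                 max_retries=3, base_delay=1.0):
        self.name = name
        self.target_attribute = target_attribute
        self.resource_handlers = resource_handlers
        self.max_retries = max_retries
        self.base_delay = base_delay

    async def execute(self, entity):
        query = self.build_query(entity)
        for handler in self.resource_handlers:
            for attempt in range(self.max_retries):
                try:
                    response = await handler.query(query)   # hypothetical handler API
                    entity.add_attribute(self.target_attribute,
                                         self.process_response(response))
                    return response
                except Exception:
                    # Exponential backoff: 1s, 2s, 4s, ... before retrying
                    await asyncio.sleep(self.base_delay * (2 ** attempt))
        return None  # all handlers exhausted

    def build_query(self, entity):
        # Illustrative: fold the entity's current attributes into the request
        return (f"Provide '{self.target_attribute}' for entity "
                f"'{entity.entity_id}' given {entity.attributes}")

    def process_response(self, response):
        return response
```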
The Repository class (atlas/data/repository.py) serves as an abstraction layer between the application logic and the database. It handles all database operations, including CRUD operations for entities, patterns, and iQueries.
Key features of Repository:
- Caching mechanism using TTLCache for improved performance
- Methods for creating, retrieving, updating, and deleting entities, patterns, and iQueries
- Batch operations for efficient data management
Notable methods:
- get_entity_by_id(entity_id): Retrieves an entity, using the cache when possible
- create_entity(entity_id, attributes): Creates a new entity in the database
- update_entity_attributes(entity_id, new_attributes): Updates entity attributes
- batch_create_entities(entities_data): Efficiently creates multiple entities
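A cache-backed lookup with cachetools.TTLCache might look like this. It is a sketch: it uses the official neo4j driver directly for brevity (the project's models are defined with neomodel), and the Cypher query and constructor arguments are assumptions; the real Repository also covers patterns, iQueries, and batch operations.

```python
from cachetools import TTLCache
from neo4j import GraphDatabase

class Repository:
    """Sketch of a TTL-cached entity lookup over Neo4j."""

    def __init__(self, uri, user, password, cache_size=1000, cache_ttl=300):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))
        self.cache = TTLCache(maxsize=cache_size, ttl=cache_ttl)

    def get_entity_by_id(self, entity_id):
        # Serve from cache while the entry is still fresh
        if entity_id in self.cache:
            return self.cache[entity_id]
        with self.driver.session() as session:
            record = session.run(
                "MATCH (e:Entity {entity_id: $id}) RETURN e", id=entity_id
            ).single()
        entity = dict(record["e"]) if record else None
        if entity is not None:
            self.cache[entity_id] = entity
        return entity
```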
The data models (atlas/data/models.py) define the structure of the data stored in the Neo4j database. They use the neomodel library to define node types and relationships.
Key models:
- EntityModel: Represents entities in the database
- PatternModel: Represents patterns
- IQueryModel: Represents iQueries
- ResourceHandlerModel: Represents resource handlers
These models define the properties and relationships of each node type in the graph database, ensuring data integrity and enabling complex queries.
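With neomodel, such models are typically declared as StructuredNode subclasses. The property and relationship names below are illustrative, not necessarily those used in atlas/data/models.py.

```python
from neomodel import (JSONProperty, RelationshipTo, StringProperty,
                      StructuredNode)

class PatternModel(StructuredNode):
    name = StringProperty(unique_index=True, required=True)

class EntityModel(StructuredNode):
    entity_id = StringProperty(unique_index=True, required=True)
    attributes = JSONProperty(default=dict)
    # Entities point to the patterns they instantiate
    patterns = RelationshipTo(PatternModel, "HAS_PATTERN")
```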
The system integrates with various external resources to gather information and update entities. This is primarily handled through different resource handler classes.
The OpenAIGPTHandler class (atlas/resources/openai_handler.py) manages interactions with OpenAI's GPT models. It handles API requests, response processing, and error handling.
Key features:
- Asynchronous API calls using aiohttp
- Retry mechanism with exponential backoff
- Response validation and processing
- Prompt construction based on entity attributes and iQuery parameters
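An asynchronous call to the chat completions endpoint with aiohttp could be sketched as follows. The class shape, model name, and query() method are illustrative; only the endpoint URL and payload format follow OpenAI's public API.

```python
import aiohttp

class OpenAIGPTHandler:
    """Sketch of an async OpenAI chat completion request."""

    def __init__(self, api_key, model="gpt-4o-mini"):   # model name is illustrative
        self.api_key = api_key
        self.model = model

    async def query(self, prompt):
        payload = {"model": self.model,
                   "messages": [{"role": "user", "content": prompt}]}
        headers = {"Authorization": f"Bearer {self.api_key}"}
        async with aiohttp.ClientSession() as session:
            async with session.post("https://api.openai.com/v1/chat/completions",
                                    json=payload, headers=headers) as resp:
                resp.raise_for_status()
                data = await resp.json()
        # Return only the generated text for downstream attribute updates
        return data["choices"][0]["message"]["content"]
```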
The system includes handlers for external APIs (ExternalAPIHandler in atlas/resources/api_handler.py) and databases (AsyncPGDatabaseHandler in atlas/resources/database_handler.py). These handlers provide a consistent interface for different types of external resources.
The AsyncHumanInterfaceHandler class (atlas/resources/human_interface.py) simulates interactions with human operators, allowing for manual input when needed.
The system includes several utility modules to support its operations:
- CircuitBreaker (atlas/utils/circuitbreaker.py): Implements the circuit breaker pattern for handling failures in external service calls
- Settings (atlas/utils/config.py): Manages configuration settings using Pydantic
- Logger (atlas/utils/logger.py): Configures logging for the system
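The circuit breaker idea can be summarized in a few lines. This is a minimal sketch of the pattern, not the CircuitBreaker class in atlas/utils/circuitbreaker.py.

```python
import time

class SimpleCircuitBreaker:
    """Open after repeated failures; allow a trial call once a cooldown elapses."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open state: permit one attempt after the cooldown has passed
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failure_count = 0
        self.opened_at = None

    def record_failure(self):
        self.failure_count += 1
        if self.failure_count >= self.failure_threshold:
            self.opened_at = time.monotonic()
```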
The system heavily utilizes Python's asyncio library for concurrent operations. This is evident in:
- ATLAS's global update cycle
- Entity's local update method
- iQuery execution
- Resource handler operations (especially API calls)
Asynchronous operations allow the system to handle multiple entities and queries efficiently, improving overall performance.
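The benefit comes from awaiting independent iQueries together rather than one after another. The hypothetical helper below illustrates the pattern; the attribute and method names follow this README, but the helper itself is not part of the codebase.

```python
import asyncio

async def run_local_update(entity, global_state):
    """Run every applicable iQuery for one entity concurrently."""
    runnable = [iq for iq in entity.iqueries
                if iq.check_conditions(entity, global_state)]
    # return_exceptions=True keeps one failing handler from cancelling the rest
    return await asyncio.gather(*(iq.execute(entity) for iq in runnable),
                                return_exceptions=True)
```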
The system implements several strategies for error handling and resilience:
- Try-except blocks in critical sections
- Retry mechanism with exponential backoff in iQuery execution
- Circuit breaker pattern for handling external service failures
- Comprehensive logging throughout the system
The system uses Pydantic's BaseSettings for configuration management (atlas/utils/config.py). This allows for easy configuration through environment variables or .env files, with type checking and default values.
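A settings class covering the variables from the example .env file could look like this, assuming Pydantic v1-style BaseSettings (in Pydantic v2 the class lives in the separate pydantic-settings package).

```python
from pydantic import BaseSettings  # pydantic_settings.BaseSettings in Pydantic v2

class Settings(BaseSettings):
    # Field names match the variables in the example .env file (case-insensitive)
    neo4j_username: str = "neo4j"
    neo4j_password: str = "password"
    neo4j_host: str = "localhost"
    neo4j_port: int = 7687
    openai_api_key: str = ""
    atlas_update_interval: int = 60

    class Config:
        env_file = ".env"

settings = Settings()  # values are loaded and type-checked from the environment
```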
The manage_autopoiesis() method in ATLAS implements the concept of autopoiesis, allowing the system to generate new entities based on existing knowledge. This process involves:
- Checking each entity for self-generation capabilities
- Executing self-generation methods on eligible entities
- Registering newly generated entities with ATLAS
The trigger_dynamic_refactor() method in ATLAS enables the system to reorganize and update its knowledge structure. This process involves:
- Evaluating each entity to determine if refactoring is needed
- Executing refactoring operations on eligible entities
- Updating the graph structure based on refactoring results
The authority smoothing feature (the smooth_authority() method in ATLAS) implements a mechanism to balance the importance of entities in the graph. This process involves:
- Performing graph analysis using NetworkX to calculate authority scores
- Identifying entities with the lowest authority scores
- Boosting the authority of these entities through targeted updates
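One way to compute such scores is NetworkX's HITS algorithm. The sketch below selects the weakest entities; the function and its fraction parameter are illustrative, not the actual internals of smooth_authority().

```python
import networkx as nx

def lowest_authority_entities(graph: nx.DiGraph, fraction: float = 0.1):
    """Rank nodes by HITS authority score and return the weakest fraction."""
    _hubs, authorities = nx.hits(graph, max_iter=1000, normalized=True)
    ranked = sorted(authorities, key=authorities.get)
    cutoff = max(1, int(len(ranked) * fraction))
    return ranked[:cutoff]   # candidates for a targeted authority boost
```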
The general workflow of the system can be summarized as follows:
- ATLAS initialization
- Entity creation and registration
- Continuous global update cycles:
  a. Entity local updates
  b. iQuery execution
  c. Dynamic refactoring (when triggered)
  d. Autopoiesis management
  e. Authority smoothing (when triggered)
- Persistence of updated knowledge in the graph database
The system's design allows for scalability, but there are several considerations:
- The use of a graph database (Neo4j) allows for efficient querying of complex relationships
- Caching mechanisms (e.g., in the Repository class) help reduce database load
- Asynchronous operations allow for concurrent processing of multiple entities
- The singleton pattern for ATLAS could become a bottleneck in a distributed system
Future enhancements could include:
- Distributed processing of entities across multiple nodes
- Improved caching strategies
- Optimization of graph algorithms for large-scale knowledge bases
The current implementation includes basic security measures:
- Use of environment variables for sensitive information (e.g., API keys)
- Pydantic for type checking and validation of configuration settings
Additional security measures to consider:
- Encryption of sensitive data in the database
- Implementation of authentication and authorization for system access
- Regular security audits and vulnerability assessments
The provided code does not include explicit test cases. Implementing a comprehensive testing strategy would be crucial for ensuring system reliability and correctness. This could include:
- Unit tests for individual components (Entity, Pattern, iQuery, etc.)
- Integration tests for database operations and external API interactions
- System tests to validate the overall behavior of the PromptCrawling system
- Performance tests to ensure scalability and efficiency
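As a starting point, unit tests with pytest might exercise the core classes along these lines. These are hypothetical tests; the asserted attribute names follow the usage example above and may differ from the actual implementation.

```python
from atlas.core.entity import Entity
from atlas.core.pattern import Pattern

def test_entity_registers_pattern():
    pattern = Pattern("test_pattern")
    entity = Entity("test_entity", patterns=[pattern])
    # Assumes Entity exposes its patterns as a list attribute
    assert pattern in entity.patterns

def test_add_attribute_updates_entity():
    entity = Entity("test_entity", patterns=[])
    entity.add_attribute("summary", "placeholder text")
    # Assumes attributes are stored in a dict keyed by attribute name
    assert entity.attributes["summary"] == "placeholder text"
```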
Based on the paper's suggestions and the current implementation, potential future enhancements could include:
- Implementation of more sophisticated graph analysis algorithms
- Integration with a wider range of LLMs and external knowledge sources
- Development of a user interface for system monitoring and manual interventions
- Implementation of more advanced autopoiesis and self-organization strategies
- Exploration of federated learning techniques for distributed knowledge acquisition
- Integration of semantic reasoning capabilities to enhance knowledge inference
- Development of domain-specific Pattern and iQuery libraries for specialized applications
Contributions to PromptCrawling are welcome! Please follow these steps to contribute:
- Fork the repository
- Create a new branch for your feature
- Commit your changes
- Push to your fork
- Submit a pull request
Please ensure your code adheres to the project's coding standards and include appropriate tests.
This project is licensed under the MIT License - see the LICENSE file for details.
For more information on the theoretical background of this project, please refer to the original paper: "Prompt Crawling: Using Graph Automata to Enable Autopoiesis and Self-Generation in Knowledge Repositories" by R.J. Cordes.