Crawler Integration #358

Open: wants to merge 302 commits into base: main

kamilczaja (Collaborator) commented Oct 24, 2024

Closes #347

richardtreier and others added 30 commits June 6, 2023 15:24
#98)

* feat: max data offers per connector

* feat: max contract offers per connector

* refactor: DataOfferFetcher

* refactor: DataOfferFetcher

* refactor: minor remarks

* feat: DataOfferLimitsEnforcer

* feat: DataOfferLimitsEnforcer

* feat: DataOfferLimitsEnforcer

* refactor: further refactorings

* refactor: further refactorings

* refactor: further refactorings

* test: add DataOfferLimitsEnforcerTest

* test: add DataOfferLimitsEnforcerTest

* test: add DataOfferLimitsEnforcerTest

* test: add DataOfferLimitsEnforcerTest

* test: add DataOfferLimitsEnforcerTest

* refactor: DataOfferLimitsEnforcer

* test: no_limit_and_two_dataofffers_and_contractoffer_should_not_limit
Bumps org.flywaydb.flyway from 9.19.1 to 9.19.3.

---
updated-dependencies:
- dependency-name: org.flywaydb.flyway
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps org.flywaydb.flyway from 9.19.3 to 9.19.4.

---
updated-dependencies:
- dependency-name: org.flywaydb.flyway
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: filtering of data offers

* chore: reformat code and organize imports

* chore: fix checkstyle
* feat: data offer pagination (and sorting fixes)

* chore: add stubs for new endpoints

* chore: test migrations so that non-empty tables will be migrated.
* feat: dataspace filter

* test: add dataspace filter

* chore: revert dataspace as asset prop

* feat: add dataSpace to CatalogQueryFields

* feat: add dataSpace to CatalogQueryFiaelds

* feat: add dataSpace to CatalogQueryFields

* feat: add dataSpace to CatalogQueryFields

* chore: fix checkstyle

* refactor: add DataSpaceConfig

* refactor: buildDataSpaceField

* refactor: BrokerServerSettings

* feat: get known dataspaces from config

* refactor: PR remarks

* refactor: minor refactorings

* refactor: minor refactorings

* refactor: minor refactorings

* refactor: minor refactorings

* refactor: minor refactorings

* refactor: minor refactorings

* refactor: minor refactorings

* test: test_available_filter_values_to_filter_by

* test: test_available_filter_values_to_filter_by
Bumps org.flywaydb.flyway from 9.19.4 to 9.20.0.

---
updated-dependencies:
- dependency-name: org.flywaydb.flyway
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps org.jooq:jooq from 3.18.4 to 3.18.5.

---
updated-dependencies:
- dependency-name: org.jooq:jooq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore: fix api-wrapper integration

* test: refactor assertEqualJson
* feat: dataOfferDetailPage

* feat: connectorDetailPage

* feat: detail pages

* feat: detail pages

* feat: detail pages

* test: detail pages

* refactor: minor refactorings

* refactor: minor refactorings

* refactor: minor test refactorings

* refactor: minor test refactorings

* refactor: minor test refactorings

* refactor: minor test refactorings

* refactor: DataOfferDetailPageQueryService

* refactor: DataOfferDetailPageQueryService

* refactor: DataOfferDetailPageQueryService

* refactor: changed models slightly, improved tests

---------

Co-authored-by: Tim Berthold <[email protected]>
Co-authored-by: Tim Berthold <[email protected]>
* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* feat: permanently delete old offline connectors

* chore: minor refactorings

* chore: checkstyle

* test: DeadConnectorRemovalTest

* refactor: pr remarks

* chore: checkstyle

* chore: pr remarks

* chore: checkstyle

* refactor: test does not need full edc extension anymore

---------

Co-authored-by: Richard Treier <[email protected]>
* chore: add path mapping to reverse proxy deployment documentation

* chore: fix wording
* fix: api wrapper integration

* fix: api wrapper integration

* chore: checkstyle

* chore: checkstyle
kamilczaja and others added 28 commits November 18, 2024 13:37
* fix: Remove duplicate database indices
* fix: Improve documentation
---------

Co-authored-by: Kamil Czaja <[email protected]>
This reverts commit a02b631ea9a04a0e256ded2e4448e7ebf6effbdb.

#### Catalog Crawler Configuration

A productive configuration will require you to join a DAPS.
Collaborator

Suggested wording: "For each dataspace environment you need one catalog crawler." Then continue with a proper introduction to what the crawler is and how to deal with it.

@@ -0,0 +1,25 @@
do
Collaborator

Could you please check whether this is still required?

Comment on lines +1 to +31
# Default ENV Vars

# This file will be sourced as bash script:
# - KEY=Value will become KEY=${KEY:-"Value"}, so that ENV Vars can be overwritten by parent docker-compose.yaml.
# - Watch out for escaping issues as values will be surrounded by quotes, and dollar signs must be escaped.

# ===========================================================
# Available Catalog Crawler Config
# ===========================================================

# Environment ID
CRAWLER_ENVIRONMENT_ID=missing-env-CRAWLER_ENVIRONMENT_ID

# Fully Qualified Domain Name (e.g. example.com)
MY_EDC_FQDN=missing-env-MY_EDC_FQDN

# Postgres Database Connection
CRAWLER_DB_JDBC_URL=jdbc:postgresql://missing-postgresql-url
CRAWLER_DB_JDBC_USER=missing-postgresql-user
CRAWLER_DB_JDBC_PASSWORD=missing-postgresql-password

# Database Connection Pool Size
CRAWLER_DB_CONNECTION_POOL_SIZE=30

# Database Connection Timeout (in ms)
CRAWLER_DB_CONNECTION_TIMEOUT_IN_MS=30000

# CRON interval for crawling ONLINE connectors
CRAWLER_CRON_ONLINE_CONNECTOR_REFRESH=*/20 * * ? * *

# CRON interval for crawling OFFLINE connectors
Collaborator

Wait a minute, isn't this obsolete now that we have Config-as-Java-Code?
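
For context on the env file header quoted in this thread: the `KEY=${KEY:-"Value"}` form it describes is standard shell default expansion. The snippet below is a minimal sketch of that behaviour and of the dollar-sign escaping caveat. It is not the project's actual sourcing script, and the values `prod` and `pa$$word` are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: mimics the KEY=${KEY:-"Value"} rewrite described in the
# env file header. Values marked "hypothetical" are assumptions for this sketch.

# Case 1: the parent environment does not set the variable, so the default applies.
unset CRAWLER_DB_CONNECTION_POOL_SIZE
CRAWLER_DB_CONNECTION_POOL_SIZE=${CRAWLER_DB_CONNECTION_POOL_SIZE:-"30"}

# Case 2: the parent environment (e.g. a docker-compose.yaml) already set a value,
# so the default from the env file is ignored.
CRAWLER_ENVIRONMENT_ID="prod"   # hypothetical value
CRAWLER_ENVIRONMENT_ID=${CRAWLER_ENVIRONMENT_ID:-"missing-env-CRAWLER_ENVIRONMENT_ID"}

# Case 3: the default is wrapped in double quotes, so a literal dollar sign
# must be escaped; an unescaped "$$" would expand to the shell's process ID.
unset CRAWLER_DB_JDBC_PASSWORD
CRAWLER_DB_JDBC_PASSWORD=${CRAWLER_DB_JDBC_PASSWORD:-"pa\$\$word"}   # hypothetical value

printf '%s\n' "pool size:   $CRAWLER_DB_CONNECTION_POOL_SIZE"
printf '%s\n' "environment: $CRAWLER_ENVIRONMENT_ID"
printf '%s\n' "password:    $CRAWLER_DB_JDBC_PASSWORD"
```

Cases 1 and 2 show why defaults such as `missing-env-CRAWLER_ENVIRONMENT_ID` only surface when the parent environment leaves a variable unset or empty.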

| Caddy behind OAuth2 Proxy | caddy:2.7 |
| Authority Portal Backend | authority-portal-backend, see [CHANGELOG.md](../../../../CHANGELOG.md) for compatible versions. |
| Authority Portal Frontend | authority-portal-frontend, see [CHANGELOG.md](../../../../CHANGELOG.md) for compatible versions. |
| Catalog Crawler | authority-portal-crawler, see [CHANGELOG.md](../../../../CHANGELOG.md) for compatible versions. |
Collaborator

Catalog Crawler (one per environment)

Comment on lines +330 to +332
- The catalog crawler is meant to be served via TLS/HTTPS.
- The catalog crawler is meant to be deployed with a reverse proxy terminating TLS / providing HTTPS.
- All requests are meant to be redirected to the deployment's `11003` port.
Collaborator

"Each catalog crawler..." requires rework. Maybe the AP and the crawlers should have their own sections.

The third point is a bit vague.

registry-password: ${{ secrets.GITHUB_TOKEN }}
image-base-name: ${{ env.IMAGE_NAME_BASE }}
image-name: "authority-portal-crawler"
connector-name: "catalog-crawler-ce"
Collaborator

Is there no dev image?

Comment on lines +15 to +16
- To prevent versioning conflicts with the image from EDC CE up to version 10.4.1, the image is now named differently. See [compatible versions](#compatible-versions) below.

Collaborator

It's enough that this is in the deployment migration notes.

Successfully merging this pull request may close these issues.

[Integration] Move crawler module from edc-ce Repo to AP