[Bug]: [benchmark][standalone] search raises error partition not loaded after partition loaded, and search raises error collection not loaded after partition released in multi-partition scene #37849

Open
wangting0128 opened this issue Nov 20, 2024 · 10 comments
Labels: kind/bug, test/benchmark, triage/accepted


@wangting0128
Contributor

wangting0128 commented Nov 20, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20241119-484c6b5c-amd64
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka): rocksmq
- SDK version (e.g. pymilvus v2.0.0rc2): 2.5.0rc97
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

argo task: fouramf-bitmap-scenes-h5n9j/
test case name: test_bitmap_locust_dml_partitions_standalone

server:

NAME                                                              READY   STATUS      RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-bitmap-scenes-h5n9j-6-etcd-0                              1/1     Running     0                4h32m   10.104.26.251   4am-node32   <none>           <none>
fouramf-bitmap-scenes-h5n9j-6-milvus-standalone-6db4bb95cdnq5lk   1/1     Running     2 (4h32m ago)    4h32m   10.104.32.22    4am-node39   <none>           <none>
fouramf-bitmap-scenes-h5n9j-6-minio-58bc6859db-6lxtr              1/1     Running     0                4h32m   10.104.18.190   4am-node25   <none>           <none>

search_failed.log

Expected:
step 3: search does not fail
step 6: search raises the error "partition not loaded"

{pod=~"fouramf-bitmap-scenes-h5n9j-6-milvus-standalone-6db4bb95cdnq5lk"} |~ "e6d76ba73466916c6d9d58288528e6e8|scene_test_partition_MmORR0U8"

Expected Behavior

No response

Steps To Reproduce

Concurrent test measuring RT and QPS

        :purpose:  `primary key: INT64`, divided into 10 partitions
            1. build a `BITMAP` index on all 12 supported scalar fields
            2. 2 fields of different vector types
            3. load and search a subset of partitions, plus DML (upsert) requests

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim
                'float_vector_1': 768dim
                'id': primary key type is INT64

                all scalar fields: varchar max_length=100, array max_capacity=13
            2. build indexes:
                IVF_SQ8: 'float_vector'
                HNSW: 'float_vector_1'

                BITMAP: all scalar fields
            3. insert 5 million entities
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - scene_insert_partition
                    (partition: create->insert->flush->release->drop)
                - scene_test_partition
                    (partition: create->insert->flush->index again->load->search->release->search failed->drop) <- search raises error after loading
                - scene_test_partition_hybrid_search
                    (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
                - release_partitions: 10 prepared partitions
                - upsert: batch=1
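The expected outcome of one `scene_test_partition` iteration can be written down as a small in-memory model. This is a hypothetical sketch, not pymilvus code: `ToyCollection`, `PartitionNotLoaded` and `scene_test_partition()` are illustrative names, and the insert/flush/index steps are skipped because they do not affect load state here. It encodes the two expectations from the report: a search right after loading succeeds, and a search after release fails with "partition not loaded" (not "collection not loaded"), even while the 10 prepared partitions are released by the concurrent `release_partitions` task.

```python
# Hypothetical in-memory model of the load semantics the test expects.
# ToyCollection / PartitionNotLoaded are illustrative names, not Milvus
# or pymilvus APIs.


class PartitionNotLoaded(Exception):
    """Stands in for the server's 'partition not loaded' error."""


class ToyCollection:
    def __init__(self, partitions):
        self.partitions = set(partitions)
        self.loaded = set()

    def create_partition(self, name):
        self.partitions.add(name)

    def load_partitions(self, names):
        self.loaded.update(names)

    def release_partitions(self, names):
        self.loaded.difference_update(names)

    def drop_partition(self, name):
        self.loaded.discard(name)
        self.partitions.discard(name)

    def search(self, partition_names):
        # Only the requested partitions must be loaded; the load state of
        # other partitions (e.g. the 10 prepared ones) must not matter.
        missing = sorted(p for p in partition_names if p not in self.loaded)
        if missing:
            raise PartitionNotLoaded(f"partition not loaded: {missing}")
        return "ok"


def scene_test_partition(coll, name):
    """create -> load -> search -> release -> search (must fail) -> drop."""
    outcomes = []
    coll.create_partition(name)
    coll.load_partitions([name])
    outcomes.append(coll.search([name]))  # expected to succeed
    coll.release_partitions([name])
    try:
        outcomes.append(coll.search([name]))  # reaching here would be a bug
    except PartitionNotLoaded as err:
        outcomes.append(str(err))
    coll.drop_partition(name)
    return outcomes


prepared = ["_default"] + [f"partition_{i}" for i in range(1, 10)]
coll = ToyCollection(prepared)
coll.load_partitions(prepared)
coll.release_partitions(prepared)  # the concurrent release_partitions task
print(scene_test_partition(coll, "scene_test_partition_MmORR0U8"))
# ['ok', "partition not loaded: ['scene_test_partition_MmORR0U8']"]
```

The two failed `scene_test_partition` requests in this report deviate from this model in both directions: the search after loading failed, and the search after release reported "collection not loaded".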

Milvus Log

No response

Anything else?

test result:

[2024-11-20 08:09:49,677 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-11-20 08:09:49,678 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: grpc     release_partitions                                                               154     0(0.00%) |    826      14    6493    160 |    0.01        0.00 (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: grpc     scene_insert_partition                                                           152     0(0.00%) |  38668    3484  163410  29000 |    0.01        0.00 (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: grpc     scene_test_partition                                                             154     2(1.30%) | 353780   67190  901133 341000 |    0.01        0.00 (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: grpc     scene_test_partition_hybrid_search                                               132     0(0.00%) | 352065   97796  944646 327000 |    0.01        0.00 (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: grpc     upsert                                                                           157     0(0.00%) |     80       3    1125      7 |    0.01        0.00 (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]:          Aggregated                                                                       749     2(0.27%) | 142820       3  944646  20000 |    0.07        0.00 (stats.py:789)
[2024-11-20 08:09:49,678 -  INFO - fouram]:  (stats.py:790)
[2024-11-20 08:09:49,682 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_16c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '16.0', 'memory': '16Gi'}, 'requests': {'cpu': '9.0', 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1, 'metrics': {'enabled': True, 'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone', 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus', 'tag': 'master-20241119-484c6b5c-amd64'}}},
            'host': 'fouramf-bitmap-scenes-h5n9j-6-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_bitmap_locust_dml_partitions_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'max_length': 100,
                                                    'scalars_index': {'int8_1': {'index_type': 'BITMAP'},
                                                                      'int16_1': {'index_type': 'BITMAP'},
                                                                      'int32_1': {'index_type': 'BITMAP'},
                                                                      'int64_1': {'index_type': 'BITMAP'},
                                                                      'varchar_1': {'index_type': 'BITMAP'},
                                                                      'bool_1': {'index_type': 'BITMAP'},
                                                                      'array_int8_1': {'index_type': 'BITMAP'},
                                                                      'array_int16_1': {'index_type': 'BITMAP'},
                                                                      'array_int32_1': {'index_type': 'BITMAP'},
                                                                      'array_int64_1': {'index_type': 'BITMAP'},
                                                                      'array_varchar_1': {'index_type': 'BITMAP'},
                                                                      'array_bool_1': {'index_type': 'BITMAP'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8, 'efConstruction': 200},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'array_int8_1': {'params': {'max_capacity': 13},
                                                                                        'other_params': {'dataset': 'random_algorithm',
                                                                                                         'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                              'specify_range': [-128, 128],
                                                                                                                              'max_capacity': 13}}},
                                                                       'array_int16_1': {'params': {'max_capacity': 13},
                                                                                         'other_params': {'dataset': 'random_algorithm',
                                                                                                          'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                               'specify_range': [-200, 200],
                                                                                                                               'max_capacity': 13}}},
                                                                       'array_int32_1': {'params': {'max_capacity': 13},
                                                                                         'other_params': {'dataset': 'random_algorithm',
                                                                                                          'algorithm_params': {'algorithm_name': 'specify_scope',
                                                                                                                               'specify_range': [-300, 300],
                                                                                                                               'max_capacity': 13}}},
                                                                       'array_int64_1': {'params': {'max_capacity': 13},
                                                                                         'other_params': {'dataset': 'random_algorithm',
                                                                                                          'algorithm_params': {'algorithm_name': 'fixed_value_range',
                                                                                                                               'specify_range': [-400, 432],
                                                                                                                               'batch': 50,
                                                                                                                               'max_capacity': 13}}},
                                                                       'array_varchar_1': {'params': {'max_capacity': 13},
                                                                                           'other_params': {'dataset': 'random_algorithm',
                                                                                                            'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                                 'specify_range': [-1500, 1500],
                                                                                                                                 'max_capacity': 13}}},
                                                                       'array_bool_1': {'params': {'max_capacity': 13}},
                                                                       'int8_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                   'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                        'specify_range': [-128, 128],
                                                                                                                        'max_capacity': 13}}},
                                                                       'int16_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                    'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                         'specify_range': [-200, 200],
                                                                                                                         'max_capacity': 13}}},
                                                                       'int32_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                    'algorithm_params': {'algorithm_name': 'specify_scope',
                                                                                                                         'specify_range': [-300, 300],
                                                                                                                         'max_capacity': 13}}},
                                                                       'int64_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                    'algorithm_params': {'algorithm_name': 'fixed_value_range',
                                                                                                                         'specify_range': [-400, 432],
                                                                                                                         'batch': 50,
                                                                                                                         'max_capacity': 13}}},
                                                                       'varchar_1': {'other_params': {'dataset': 'random_algorithm',
                                                                                                      'algorithm_params': {'algorithm_name': 'random_range',
                                                                                                                           'specify_range': [-1500, 1500],
                                                                                                                           'max_capacity': 13}}}},
                                                    'extra_partitions': {'partitions': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
                                                                                        'partition_5', 'partition_6', 'partition_7', 'partition_8',
                                                                                        'partition_9'],
                                                                         'data_repeated': False},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 5000},
                                 'collection_params': {'other_fields': ['float_vector_1', 'int8_1', 'int16_1', 'int32_1', 'int64_1', 'varchar_1', 'bool_1',
                                                                        'array_int8_1', 'array_int16_1', 'array_int32_1', 'array_int64_1', 'array_varchar_1',
                                                                        'array_bool_1'],
                                                       'shards_num': 16},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False, 'reset_db': False},
                                 'index_params': {'index_type': 'IVF_SQ8', 'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 10, 'during_time': '3h', 'interval': 20, 'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'scene_insert_partition',
                                                       'weight': 1,
                                                       'params': {'data_size': 3000, 'ni': 1000, 'with_flush': True, 'timeout': 600}},
                                                      {'type': 'scene_test_partition',
                                                       'weight': 1,
                                                       'params': {'data_size': 3000,
                                                                  'ni': 3000,
                                                                  'nq': 1,
                                                                  'search_param': {'nprobe': 64},
                                                                  'limit': 1,
                                                                  'expr': None,
                                                                  'output_fields': ['*'],
                                                                  'guarantee_timestamp': None,
                                                                  'timeout': 600,
                                                                  'search_counts': 1}},
                                                      {'type': 'scene_test_partition_hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 1,
                                                                  'reqs': [{'search_param': {'nprobe': 128}, 'anns_field': 'float_vector', 'top_k': 100},
                                                                           {'search_param': {'ef': 64}, 'anns_field': 'float_vector_1', 'top_k': 10}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'timeout': 600,
                                                                  'random_data': True,
                                                                  'hybrid_search_counts': 1,
                                                                  'data_size': 3000,
                                                                  'ni': 3000}},
                                                      {'type': 'release_partitions',
                                                       'weight': 1,
                                                       'params': {'partitions': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
                                                                                 'partition_5', 'partition_6', 'partition_7', 'partition_8', 'partition_9'],
                                                                  'timeout': 180,
                                                                  'check_task': 'check_response',
                                                                  'check_items': None}},
                                                      {'type': 'upsert',
                                                       'weight': 1,
                                                       'params': {'nb': 1,
                                                                  'timeout': 30,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 5000000,
                                                                  'shuffle_id': False,
                                                                  'check_task': 'check_response',
                                                                  'check_items': None}}]},
            'run_id': 2024112038729153,
            'datetime': '2024-11-20 03:37:52.649223',
            'client_version': '2.5.0'},
 'result': {'test_result': {'index': {'RT': 4657.8782,
                                      'float_vector_1': {'RT': 2.0357},
                                      'int8_1': {'RT': 1.0261},
                                      'int16_1': {'RT': 1.0244},
                                      'int32_1': {'RT': 0.5166},
                                      'int64_1': {'RT': 0.517},
                                      'varchar_1': {'RT': 0.5165},
                                      'bool_1': {'RT': 0.5179},
                                      'array_int8_1': {'RT': 0.519},
                                      'array_int16_1': {'RT': 0.5186},
                                      'array_int32_1': {'RT': 0.5279},
                                      'array_int64_1': {'RT': 0.5192},
                                      'array_varchar_1': {'RT': 0.5195},
                                      'array_bool_1': {'RT': 0.5183}},
                            'insert': {'total_time': 483.3555, 'VPS': 10345.0365, 'batch_time': 0.4834, 'batch': 5000.0},
                            'flush': {'RT': 8.8584},
                            'load': {'RT': 8.357},
                            'Locust': {'Aggregated': {'Requests': 749,
                                                      'Fails': 2,
                                                      'RPS': 0.07,
                                                      'fail_s': 0.0,
                                                      'RT_max': 944646.39,
                                                      'RT_avg': 142820.38,
                                                      'TP50': 20000.0,
                                                      'TP99': 773000.0},
                                       'release_partitions': {'Requests': 154,
                                                              'Fails': 0,
                                                              'RPS': 0.01,
                                                              'fail_s': 0.0,
                                                              'RT_max': 6493.62,
                                                              'RT_avg': 826.1,
                                                              'TP50': 170.0,
                                                              'TP99': 6300.0},
                                       'scene_insert_partition': {'Requests': 152,
                                                                  'Fails': 0,
                                                                  'RPS': 0.01,
                                                                  'fail_s': 0.0,
                                                                  'RT_max': 163410.61,
                                                                  'RT_avg': 38668.6,
                                                                  'TP50': 29000.0,
                                                                  'TP99': 148000.0},
                                       'scene_test_partition': {'Requests': 154,
                                                                'Fails': 2,
                                                                'RPS': 0.01,
                                                                'fail_s': 0.01,
                                                                'RT_max': 901133.62,
                                                                'RT_avg': 353780.9,
                                                                'TP50': 341000.0,
                                                                'TP99': 876000.0},
                                       'scene_test_partition_hybrid_search': {'Requests': 132,
                                                                              'Fails': 0,
                                                                              'RPS': 0.01,
                                                                              'fail_s': 0.0,
                                                                              'RT_max': 944646.39,
                                                                              'RT_avg': 352065.85,
                                                                              'TP50': 328000.0,
                                                                              'TP99': 938000.0},
                                       'upsert': {'Requests': 157,
                                                  'Fails': 0,
                                                  'RPS': 0.01,
                                                  'fail_s': 0.0,
                                                  'RT_max': 1125.13,
                                                  'RT_avg': 80.68,
                                                  'TP50': 7,
                                                  'TP99': 1000.0}}}}}
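The headline numbers in the locust output can be cross-checked with simple arithmetic: the per-task rows sum to the aggregated row, the failure rates follow from the request counts, and the RPS values match the concurrency window. The sketch below copies the counts from the report; the one assumption, not stated in the report, is that the measured locust window equals the configured `during_time` of 3h.

```python
# Figures copied from the locust table and the report's 'Locust' section.
STATS = {  # task name -> (requests, fails)
    "release_partitions": (154, 0),
    "scene_insert_partition": (152, 0),
    "scene_test_partition": (154, 2),
    "scene_test_partition_hybrid_search": (132, 0),
    "upsert": (157, 0),
}
# Assumption: the measured locust window equals the configured during_time '3h'.
DURING_TIME_S = 3 * 60 * 60  # 10800 seconds


def fail_pct(reqs: int, fails: int) -> float:
    """Failure percentage rounded to two decimals."""
    return round(100 * fails / reqs, 2)


def rps(reqs: int, seconds: int = DURING_TIME_S) -> float:
    """Average requests per second over the window, rounded to two decimals."""
    return round(reqs / seconds, 2)


total_reqs = sum(r for r, _ in STATS.values())
total_fails = sum(f for _, f in STATS.values())

for name, (reqs, fails) in STATS.items():
    print(f"{name:36} {reqs:4} reqs  {fail_pct(reqs, fails):.2f}% fails  "
          f"{rps(reqs):.2f} req/s")
print(f"{'Aggregated':36} {total_reqs:4} reqs  "
      f"{fail_pct(total_reqs, total_fails):.2f}% fails  {rps(total_reqs):.2f} req/s")
```

This reproduces the 1.30% failure rate for `scene_test_partition`, the aggregated 749 requests with 0.27% failures, and RPS values of 0.01 per task and 0.07 overall.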
@wangting0128 added the kind/bug, needs-triage, and test/benchmark labels Nov 20, 2024
@wangting0128 added this to the 2.5.0 milestone Nov 20, 2024
@yanliang567
Contributor

/assign @czs007
/unassign

@sre-ci-robot assigned czs007 and unassigned yanliang567 Nov 21, 2024
@yanliang567 added the triage/accepted label and removed the needs-triage label Nov 21, 2024
@czs007
Collaborator

czs007 commented Nov 21, 2024


We currently suspect that the collection object does not exist on the QN (QueryNode) when it tries to load the partition, which makes the call to add partitions ineffective.

@czs007
Collaborator

czs007 commented Nov 21, 2024


The reference count of the collection on the QN fluctuates constantly, and during some intervals the collection object does not exist at all.
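The suspected interleaving can be modelled with a toy reference-counted registry. This is a hypothetical sketch: `CollectionManager`, `ref`, `unref` and `add_partitions` are illustrative names, not the QueryNode's actual code. It only shows how a partition load that arrives while the count is zero can be silently lost.

```python
# Hypothetical sketch of the suspected race; CollectionManager and its
# methods are illustrative names, not the QueryNode's actual code.


class CollectionManager:
    def __init__(self):
        # collection_id -> {"ref": reference count, "partitions": loaded set}
        self._collections = {}

    def ref(self, cid):
        entry = self._collections.setdefault(cid, {"ref": 0, "partitions": set()})
        entry["ref"] += 1

    def unref(self, cid):
        entry = self._collections[cid]
        entry["ref"] -= 1
        if entry["ref"] == 0:
            # The collection object vanishes while the count is zero.
            del self._collections[cid]

    def add_partitions(self, cid, partitions):
        entry = self._collections.get(cid)
        if entry is None:
            return False  # suspected bug: the request is silently dropped
        entry["partitions"].update(partitions)
        return True

    def is_partition_loaded(self, cid, partition):
        entry = self._collections.get(cid)
        return entry is not None and partition in entry["partitions"]


mgr = CollectionManager()
mgr.ref(1)
mgr.unref(1)                          # last reference released
added = mgr.add_partitions(1, {"p"})  # load lands in the gap: no-op
mgr.ref(1)                            # object re-created without "p"
print(added, mgr.is_partition_loaded(1, "p"))  # False False
```

In this model the load only goes missing when `add_partitions` runs during the zero-count window, which fits the failure appearing intermittently (2 of 154 `scene_test_partition` runs).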

@xiaofan-luan
Collaborator

The problem here is not the proxy cache.

It's querycoord adding an extra check in GetShardLeadersWithChannels.

To me, this check is completely unnecessary.

Querycoord should just return the shard delegators' locations to the proxy; the proxy should send the request to the right delegator, and the delegator itself should be responsible for holding requests until it is ready.

@weiliu1031 @congqixia @tedxu

Thinking clearly about which component is responsible for what is very important. Right now we add too much temporary logic to fix corner cases, which makes the system very hard to understand.
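The proposal moves the readiness decision into the delegator. A minimal sketch under stated assumptions: `Delegator`, `mark_loaded`, `mark_released` and `NotReadyError` are hypothetical names, and a `threading.Event` per partition stands in for whatever readiness state a real delegator would track. Requests wait on the delegator's own state rather than on a coordinator-side pre-check.

```python
import threading


class NotReadyError(Exception):
    """Stands in for a 'partition not loaded' error raised by the delegator."""


class Delegator:
    """Hypothetical shard delegator that gates requests on its own state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = {}  # partition name -> threading.Event

    def _event(self, partition):
        with self._lock:
            return self._ready.setdefault(partition, threading.Event())

    def mark_loaded(self, partition):
        self._event(partition).set()

    def mark_released(self, partition):
        self._event(partition).clear()

    def search(self, partition, timeout):
        # Hold the request until this delegator has the partition ready,
        # instead of relying on a coordinator-side pre-check.
        if not self._event(partition).wait(timeout):
            raise NotReadyError(f"partition not loaded: {partition}")
        return f"searched {partition}"


d = Delegator()
threading.Timer(0.1, d.mark_loaded, args=("p1",)).start()
print(d.search("p1", timeout=2.0))  # waits about 0.1 s, then prints "searched p1"
```

A real design would also need to tell a partition that is still loading (worth waiting for) apart from one that has been released (which should fail fast with "partition not loaded"); this sketch simply waits up to the timeout in both cases.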

@xiaofan-luan
Collaborator


> It is currently suspected that the collection object does not exist when QN try to load the partition, rendering the call to add partitions ineffective.

Judging from the error message, this does not seem related to the querynode; it looks more like an issue with the querycoord GetShardLeaders function.

@wangting0128
Contributor Author

Same root cause, different error:

argo task: multi-vector-corn-1-1732197600
test case name: test_hybrid_search_locust_dql_dml_partition_cluster
image: master-20241121-1dc1a97e-amd64

server:

NAME                                                              READY   STATUS             RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1732197600-1-etcd-0                           1/1     Running            0               3h11m   10.104.19.53    4am-node28   <none>           <none>
multi-vector-corn-1-1732197600-1-etcd-1                           1/1     Running            0               3h11m   10.104.16.10    4am-node21   <none>           <none>
multi-vector-corn-1-1732197600-1-etcd-2                           1/1     Running            0               3h11m   10.104.23.143   4am-node27   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-datanode-67b4d4c94qj682   1/1     Running            3 (3h6m ago)    3h11m   10.104.16.243   4am-node21   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-765794b4bw9bn   1/1     Running            3 (3h10m ago)   3h11m   10.104.21.26    4am-node24   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-765794b4dvrtq   1/1     Running            3 (3h10m ago)   3h11m   10.104.20.59    4am-node22   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-765794b4vmrbh   1/1     Running            2 (3h10m ago)   3h11m   10.104.17.95    4am-node23   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-765794b4x5ddf   1/1     Running            2 (3h10m ago)   3h11m   10.104.34.31    4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-mixcoord-7f44b49b9ddwpm   1/1     Running            3 (3h6m ago)    3h11m   10.104.24.75    4am-node29   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-proxy-7f4fd9c8c6-ck9dr    1/1     Running            3 (3h5m ago)    3h11m   10.104.34.30    4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-querynode-64c678d8272k7   1/1     Running            2 (3h10m ago)   3h11m   10.104.32.214   4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-0                          1/1     Running            0               3h11m   10.104.20.67    4am-node22   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-1                          1/1     Running            0               3h11m   10.104.19.56    4am-node28   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-2                          1/1     Running            0               3h11m   10.104.23.142   4am-node27   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-3                          1/1     Running            0               3h11m   10.104.16.16    4am-node21   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-0                  1/1     Running            0               3h11m   10.104.19.54    4am-node28   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-1                  1/1     Running            0               3h11m   10.104.30.13    4am-node38   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-2                  1/1     Running            0               3h11m   10.104.15.92    4am-node20   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-init-2qtr7         0/1     Completed          0               3h11m   10.104.24.73    4am-node29   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-broker-0                  1/1     Running            0               3h11m   10.104.24.74    4am-node29   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-proxy-0                   1/1     Running            0               3h11m   10.104.5.233    4am-node12   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-pulsar-init-78v9w         0/1     Completed          0               3h11m   10.104.20.58    4am-node22   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-recovery-0                1/1     Running            0               3h11m   10.104.30.249   4am-node38   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-zookeeper-0               1/1     Running            0               3h11m   10.104.16.13    4am-node21   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-zookeeper-1               1/1     Running            0               3h9m    10.104.23.154   4am-node27   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-zookeeper-2               1/1     Running            0               3h8m    10.104.15.105   4am-node20   <none>           <none>

client log:

[2024-11-21 14:27:10,063 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 98317464-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 14:27:10,063 - ERROR - fouram]: (api_response) : [Partition.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: af766562-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 14:27:10,141 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: af4ec624-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 14:27:10,142 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 9831473c-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 14:27:10,143 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: a84a26c0-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 14:27:10,143 - ERROR - fouram]: (api_response) : [Partition.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: a59673ac-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 14:27:10,144 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: a58438cc-a814-11ef-87d2-ce2f82da1cbe] (api_request.py:57)


...



[2024-11-21 16:55:02,536 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 55a6ae2e-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,537 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 501c8686-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,703 - ERROR - fouram]: (api_response) : [Partition.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 5966baea-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,704 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 501d7712-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,705 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 5893aace-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,705 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 501caaf8-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,706 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 501c30c8-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:02,706 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 501c6246-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 16:55:03,329 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454085355925406400v0])>, [requestId: 59cc9450-a829-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
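Across the comments in this thread, the failures cluster on one QueryNode and one channel per burst, so the excerpts are easier to compare when grouped that way. Below is a minimal stdlib sketch. It assumes the log lines keep exactly the format shown. The function names are hypothetical, and reading the channel name as `<pchannel>_<collectionID>v<shard index>` is an assumption based on the names in these logs.

```python
import re
from collections import Counter

# Matches one "channel not available" error in either client-log format in this
# thread: "[Collection.search] <MilvusException: ..." or "RPC error: [search], <MilvusException: ...".
ERROR_RE = re.compile(
    r"\[(?P<api>[\w.]+)\],? <MilvusException: \(code=(?P<code>\d+), "
    r"message=fail to (?P<op>\w+) on QueryNode (?P<node>\d+):.*?"
    r"\[channel=(?P<channel>[^\]]+)\]"
)

# Assumed virtual-channel layout: <pchannel>_<collectionID>v<shard index>.
CHANNEL_RE = re.compile(r"^(?P<pchannel>.+)_(?P<collection>\d+)v(?P<shard>\d+)$")


def summarize(lines):
    """Count 'channel not available' errors per (QueryNode, channel, API)."""
    counts = Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            counts[(int(m["node"]), m["channel"], m["api"])] += 1
    return counts


def split_channel(channel):
    """Split a channel name into (pchannel, collection ID, shard index), or None."""
    m = CHANNEL_RE.match(channel)
    if m is None:
        return None
    return m["pchannel"], int(m["collection"]), int(m["shard"])
```

For example, `summarize(open("client.log"))` returns a `Counter` keyed by `(node, channel, api)`. The file name is hypothetical. Lines without a QueryNode channel error, such as the `partition not found` insert failures, are ignored.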

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DQL & DML(partition)`
            verify concurrent DQL & DML(partition) scenario,
            which has 4 vector fields (IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million rows into 10 partitions
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_test_partition
                    (partition: create->insert->flush->index again->load->search->release->search failed->drop)
                - search
                - hybrid_search
                - query
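According to the issue title, the observed failures contradict what this scenario expects. A search on a just-loaded partition returned "partition not loaded". A search after releasing one partition returned "collection not loaded", even though the other partitions were still loaded.

The expected load-state transitions can be written down as a small in-memory model. This is a hypothetical sketch of the test's expectations, not Milvus code. `LoadStateModel`, `run_scene`, and the exception classes are illustrative names. The insert, flush, and index steps are omitted because they do not change load state in this model.

```python
class CollectionNotLoaded(Exception):
    """Raised by the model when no partition of the collection is loaded."""


class PartitionNotLoaded(Exception):
    """Raised by the model when a requested partition is not loaded."""


class LoadStateModel:
    """Hypothetical model of the load state the test expects; not Milvus code."""

    def __init__(self, partitions):
        self.partitions = set(partitions)
        self.loaded = set()

    def create_partition(self, name):
        self.partitions.add(name)

    def load(self, names):
        self.loaded.update(names)

    def release(self, names):
        self.loaded.difference_update(names)

    def drop_partition(self, name):
        # Assumption: a partition must be released before it can be dropped.
        if name in self.loaded:
            raise ValueError(f"partition {name} is still loaded")
        self.partitions.discard(name)

    def search(self, names=None):
        """Return the partitions a search would cover, or raise like the test expects."""
        if not self.loaded:
            raise CollectionNotLoaded()
        targets = set(self.loaded) if names is None else set(names)
        missing = targets - self.loaded
        if missing:
            raise PartitionNotLoaded(sorted(missing))
        return sorted(targets)


def run_scene(model, name):
    """Replay the load-related steps of scene_test_partition; return step outcomes."""
    model.create_partition(name)
    model.load([name])
    outcomes = [("search", model.search([name]))]
    model.release([name])
    try:
        model.search([name])
        outcomes.append(("search after release", "unexpected success"))
    except PartitionNotLoaded:
        outcomes.append(("search after release", "partition not loaded"))
    except CollectionNotLoaded:
        outcomes.append(("search after release", "collection not loaded"))
    model.drop_partition(name)
    return outcomes
```

Run against a model whose 10 original partitions are loaded, `run_scene` yields a successful search followed by "partition not loaded". The errors reported in this issue diverge from that expectation at exactly those two steps.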

@wangting0128
Contributor Author

Same root cause, different error.

argo task: multi-vector-corn-1-1732197600
test case name: test_hybrid_search_locust_dql_dml_partition_hybrid_search_cluster
image: master-20241121-19572f5b-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1732197600-1-etcd-0                           1/1     Running     0               3h9m    10.104.34.128   4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-etcd-1                           1/1     Running     0               3h9m    10.104.18.46    4am-node25   <none>           <none>
multi-vector-corn-1-1732197600-1-etcd-2                           1/1     Running     0               3h9m    10.104.21.109   4am-node24   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-datanode-bfc55cfb8ltprk   1/1     Running     2 (3h5m ago)    3h9m    10.104.32.29    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-7d8c894cdl6sl   1/1     Running     2 (3h9m ago)    3h9m    10.104.23.131   4am-node27   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-7d8c894cmzghd   1/1     Running     1 (3h9m ago)    3h9m    10.104.20.103   4am-node22   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-7d8c894cqw224   1/1     Running     2 (3h9m ago)    3h9m    10.104.6.238    4am-node13   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-indexnode-7d8c894crm79j   1/1     Running     1 (3h9m ago)    3h9m    10.104.34.122   4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-mixcoord-ff7fdb786s5td6   1/1     Running     2 (3h5m ago)    3h9m    10.104.34.121   4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-proxy-6d65ddb7d-5pfmc     1/1     Running     2 (3h5m ago)    3h9m    10.104.32.28    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-querynode-547d5d5d9ld78   1/1     Running     1 (3h9m ago)    3h9m    10.104.32.30    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-milvus-querynode-547d5d5dx84sd   1/1     Running     1 (3h9m ago)    3h9m    10.104.17.114   4am-node23   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-0                          1/1     Running     0               3h9m    10.104.20.105   4am-node22   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-1                          1/1     Running     0               3h9m    10.104.34.127   4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-2                          1/1     Running     0               3h9m    10.104.18.45    4am-node25   <none>           <none>
multi-vector-corn-1-1732197600-1-minio-3                          1/1     Running     0               3h9m    10.104.21.108   4am-node24   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-0                  1/1     Running     0               3h9m    10.104.34.129   4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-1                  1/1     Running     0               3h9m    10.104.18.49    4am-node25   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-2                  1/1     Running     0               3h9m    10.104.21.112   4am-node24   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-bookie-init-4l2h5         0/1     Completed   0               3h9m    10.104.32.23    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-broker-0                  1/1     Running     0               3h9m    10.104.32.25    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-proxy-0                   1/1     Running     0               3h9m    10.104.32.24    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-pulsar-init-79zmt         0/1     Completed   0               3h9m    10.104.32.26    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-recovery-0                1/1     Running     0               3h9m    10.104.32.27    4am-node39   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-zookeeper-0               1/1     Running     0               3h9m    10.104.17.116   4am-node23   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-zookeeper-1               1/1     Running     0               3h8m    10.104.34.131   4am-node37   <none>           <none>
multi-vector-corn-1-1732197600-1-pulsar-zookeeper-2               1/1     Running     0               3h8m    10.104.18.51    4am-node25   <none>           <none>

client log:

[2024-11-21 17:31:00,747 - ERROR - fouram]: (api_response) : [Collection.insert] <MilvusException: (code=200, message=partition not found[partition=scene_test_partition_hybrid_search_c0wfDHD6])>, [requestId: 54a39bcc-a82e-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 17:42:08,624 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to search on QueryNode 2: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_1_454088378512573055v1])>, [requestId: ea678e24-a82f-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 17:42:08,628 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to search on QueryNode 2: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_1_454088378512573055v1])>, [requestId: eab2a99a-a82f-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 17:42:08,629 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to Query on QueryNode 2: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_1_454088378512573055v1])>, [requestId: e06ebd8e-a82f-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 17:42:08,629 - ERROR - fouram]: (api_response) : [Collection.search

....

[2024-11-21 20:24:31,004 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to search on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454088378512573055v0])>, [requestId: 9cc2c884-a846-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 20:24:31,005 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454088378512573055v0])>, [requestId: 9cb8c3de-a846-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 20:24:31,098 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454088378512573055v0])>, [requestId: 9cc46e3c-a846-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 20:24:31,100 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=fail to search on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454088378512573055v0])>, [requestId: 9cceb45a-a846-11ef-87d2-ce2f82da1cbe] (api_request.py:57)
[2024-11-21 20:25:01,170 - ERROR - fouram]: (api_response) : [Collection.query] <MilvusException: (code=503, message=fail to Query on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_0_454088378512573055v0])>, [requestId: 9c39ce4e-a846-11ef-87d2-ce2f82da1cbe] (api_request.py:57)

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DQL & DML(partition)`
            verify concurrent DQL & DML(partition) scenario,
            which has 4 vector fields (IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million rows into 10 partitions
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_test_partition_hybrid_search
                    (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
                - search
                - hybrid_search
                - query
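Step 3 inserts 1 million rows into 10 partitions. The test steps do not say how the rows are spread across partitions. Assuming an even split, each partition receives 100,000 rows.

The sketch below computes per-partition row ranges under that assumption. The function name is hypothetical, and any remainder goes to the first partitions.

```python
def split_rows(total_rows, num_partitions):
    """Return (start, end) row ranges, one per partition, as evenly as possible."""
    base, extra = divmod(total_rows, num_partitions)
    ranges = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions absorb one leftover row each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```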

@wangting0128
Contributor Author

@czs007 @weiliu1031 please help check.

@wangting0128
Contributor Author

This may share the same root cause.

argo task: fouramf-bitmap-scenes-5xlvn
test case name: test_bitmap_locust_hybrid_index_cluster
image: master-20241122-cfa1f1f1-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-bitmap-scenes-5xlvn-10-etcd-0                             1/1     Running     0               4h15m   10.104.15.43    4am-node20   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-etcd-1                             1/1     Running     0               4h15m   10.104.18.176   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-etcd-2                             1/1     Running     0               4h15m   10.104.24.76    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-milvus-datanode-54978bc8-lwr5m     1/1     Running     3 (4h10m ago)   4h15m   10.104.33.198   4am-node36   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-milvus-indexnode-557b77dcdf64pxg   1/1     Running     3 (4h14m ago)   4h15m   10.104.20.203   4am-node22   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-milvus-indexnode-557b77dcdfj2wfw   1/1     Running     3 (4h15m ago)   4h15m   10.104.6.103    4am-node13   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-milvus-mixcoord-584c995b78-9vxnw   1/1     Running     4 (4h10m ago)   4h15m   10.104.6.102    4am-node13   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-milvus-proxy-f7b589886-nvx9q       1/1     Running     4 (4h10m ago)   4h15m   10.104.20.202   4am-node22   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-milvus-querynode-84b56dd4f794m2f   1/1     Running     3 (4h15m ago)   4h15m   10.104.19.169   4am-node28   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-minio-0                            1/1     Running     0               4h15m   10.104.34.59    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-minio-1                            1/1     Running     0               4h15m   10.104.15.42    4am-node20   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-minio-2                            1/1     Running     0               4h15m   10.104.18.179   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-minio-3                            1/1     Running     0               4h15m   10.104.24.75    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-bookie-0                    1/1     Running     0               4h15m   10.104.24.72    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-bookie-1                    1/1     Running     0               4h15m   10.104.34.60    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-bookie-2                    1/1     Running     0               4h15m   10.104.18.180   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-bookie-init-mnvzw           0/1     Completed   0               4h15m   10.104.24.64    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-broker-0                    1/1     Running     0               4h15m   10.104.14.150   4am-node18   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-proxy-0                     1/1     Running     0               4h15m   10.104.5.79     4am-node12   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-pulsar-init-4vrtk           0/1     Completed   0               4h15m   10.104.34.52    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-recovery-0                  1/1     Running     0               4h15m   10.104.24.62    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-zookeeper-0                 1/1     Running     0               4h15m   10.104.34.58    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-zookeeper-1                 1/1     Running     0               4h15m   10.104.18.200   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-10-pulsar-zookeeper-2                 1/1     Running     0               4h13m   10.104.30.222   4am-node38   <none>           <none>

client log:

[2024-11-22 04:18:24,630 - ERROR - fouram]: RPC error: [hybrid_search], <MilvusException: (code=503, message=fail to search on QueryNode 2: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_1_454097603667689713v1])>, <Time:{'RPC start': '2024-11-22 04:18:23.980651', 'RPC error': '2024-11-22 04:18:24.630272'}> (decorators.py:140)
[2024-11-22 04:18:24,631 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=503, message=fail to search on QueryNode 2: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_1_454097603667689713v1])>, <Time:{'RPC start': '2024-11-22 04:18:23.979177', 'RPC error': '2024-11-22 04:18:24.631638'}> (decorators.py:140)

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `primary key: INT64`
            1. build the default index on all 16 supported scalar fields
            2. load and search a subset of partitions, alongside DQL requests

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim
                'id': primary key type is INT64

                all scalar fields: varchar max_length=100, array max_capacity=9
            2. build indexes:
                IVF_SQ8: 'float_vector'

                default scalar index: all scalar fields
            3. insert 6 million rows
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - scene_test_partition
                    (partition: create->insert->flush->index again->load->search->release->search failed->drop)
                - scene_test_partition_hybrid_search
                    (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
                - search
                - query
                - hybrid_search

@wangting0128
Contributor Author

Reproduced.

argo task: fouramf-bitmap-scenes-5xlvn
test case name: test_bitmap_locust_dql_dml_partitions_cluster
image: master-20241122-cfa1f1f1-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-bitmap-scenes-5xlvn-5-etcd-0                              1/1     Running     0               3h55m   10.104.34.70    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-etcd-1                              1/1     Running     0               3h55m   10.104.24.87    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-etcd-2                              1/1     Running     0               3h55m   10.104.18.195   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-datanode-6d5cf794cf-hqd26    1/1     Running     5 (3h48m ago)   3h55m   10.104.15.44    4am-node20   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-indexnode-5b5b8657bb-6cxv8   1/1     Running     4 (3h53m ago)   3h55m   10.104.14.152   4am-node18   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-indexnode-5b5b8657bb-6rrzf   1/1     Running     4 (3h53m ago)   3h55m   10.104.9.76     4am-node14   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-mixcoord-58dfbf854f-mvlbk    1/1     Running     5 (3h48m ago)   3h55m   10.104.17.139   4am-node23   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-proxy-5654677875-xpzzf       1/1     Running     4 (3h49m ago)   3h55m   10.104.15.45    4am-node20   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-querynode-7fd644bd6d-4xlzg   1/1     Running     4 (3h53m ago)   3h55m   10.104.17.140   4am-node23   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-milvus-querynode-7fd644bd6d-b6wlb   1/1     Running     3 (3h53m ago)   3h55m   10.104.5.82     4am-node12   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-minio-0                             1/1     Running     0               3h55m   10.104.34.68    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-minio-1                             1/1     Running     0               3h55m   10.104.24.80    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-minio-2                             1/1     Running     0               3h55m   10.104.18.197   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-minio-3                             1/1     Running     0               3h55m   10.104.30.193   4am-node38   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-bookie-0                     1/1     Running     0               3h55m   10.104.34.67    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-bookie-1                     1/1     Running     0               3h55m   10.104.24.86    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-bookie-2                     1/1     Running     0               3h55m   10.104.21.86    4am-node24   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-bookie-init-xbw9p            0/1     Completed   0               3h55m   10.104.14.153   4am-node18   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-broker-0                     1/1     Running     0               3h55m   10.104.13.191   4am-node16   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-proxy-0                      1/1     Running     0               3h55m   10.104.24.73    4am-node29   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-pulsar-init-ct5jj            0/1     Completed   0               3h55m   10.104.14.151   4am-node18   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-recovery-0                   1/1     Running     0               3h55m   10.104.18.178   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-zookeeper-0                  1/1     Running     0               3h55m   10.104.34.66    4am-node37   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-zookeeper-1                  1/1     Running     0               3h53m   10.104.18.231   4am-node25   <none>           <none>
fouramf-bitmap-scenes-5xlvn-5-pulsar-zookeeper-2                  1/1     Running     0               3h50m   10.104.30.227   4am-node38   <none>           <none>

client log:

[2024-11-22 04:00:05,623 - ERROR - fouram]: RPC error: [hybrid_search], <MilvusException: (code=503, message=fail to search on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_8_454097627099693408v8])>, <Time:{'RPC start': '2024-11-22 04:00:04.957104', 'RPC error': '2024-11-22 04:00:05.623364'}> (decorators.py:140)
[2024-11-22 04:00:05,790 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=503, message=fail to search on QueryNode 1: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_8_454097627099693408v8])>, <Time:{'RPC start': '2024-11-22 04:00:05.139241', 'RPC error': '2024-11-22 04:00:05.790036'}> (decorators.py:140)
[2024-11-22 04:36:43,922 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_15_454097627099693408v15])>, <Time:{'RPC start': '2024-11-22 04:36:43.256321', 'RPC error': '2024-11-22 04:36:43.922639'}> (decorators.py:140)
[2024-11-22 04:36:44,823 - ERROR - fouram]: RPC error: [hybrid_search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_15_454097627099693408v15])>, <Time:{'RPC start': '2024-11-22 04:36:43.292489', 'RPC error': '2024-11-22 04:36:44.822995'}> (decorators.py:140)
[2024-11-22 05:29:46,318 - ERROR - fouram]: RPC error: [hybrid_search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_14_454097627099693408v14])>, <Time:{'RPC start': '2024-11-22 05:29:45.680949', 'RPC error': '2024-11-22 05:29:46.318295'}> (decorators.py:140)
[2024-11-22 05:29:46,566 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_14_454097627099693408v14])>, <Time:{'RPC start': '2024-11-22 05:29:45.413446', 'RPC error': '2024-11-22 05:29:46.566535'}> (decorators.py:140)
[2024-11-22 05:29:46,572 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_14_454097627099693408v14])>, <Time:{'RPC start': '2024-11-22 05:29:45.415872', 'RPC error': '2024-11-22 05:29:46.572215'}> (decorators.py:140)
[2024-11-22 05:29:46,583 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=200, message=partition not found[partition=scene_insert_partition_WpY6FOi7])>, <Time:{'RPC start': '2024-11-22 05:29:46.410808', 'RPC error': '2024-11-22 05:29:46.583313'}> (decorators.py:140)
[2024-11-22 05:46:08,515 - ERROR - fouram]: RPC error: [hybrid_search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_11_454097627099693408v11])>, <Time:{'RPC start': '2024-11-22 05:46:07.798253', 'RPC error': '2024-11-22 05:46:08.515813'}> (decorators.py:140)
[2024-11-22 05:46:08,517 - ERROR - fouram]: RPC error: [hybrid_search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_11_454097627099693408v11])>, <Time:{'RPC start': '2024-11-22 05:46:07.801657', 'RPC error': '2024-11-22 05:46:08.517403'}> (decorators.py:140)
[2024-11-22 05:46:09,036 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_11_454097627099693408v11])>, <Time:{'RPC start': '2024-11-22 05:46:07.981093', 'RPC error': '2024-11-22 05:46:09.036426'}> (decorators.py:140)
[2024-11-22 05:46:09,038 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=503, message=fail to search on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_11_454097627099693408v11])>, <Time:{'RPC start': '2024-11-22 05:46:08.174129', 'RPC error': '2024-11-22 05:46:09.038195'}> (decorators.py:140)
[2024-11-22 05:46:09,072 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_11_454097627099693408v11])>, <Time:{'RPC start': '2024-11-22 05:46:07.222275', 'RPC error': '2024-11-22 05:46:09.072187'}> (decorators.py:140)
[2024-11-22 05:46:09,077 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=503, message=fail to Query on QueryNode 3: distribution is not servcieable: channel not available[channel=by-dev-rootcoord-dml_11_454097627099693408v11])>, <Time:{'RPC start': '2024-11-22 05:46:07.799273', 'RPC error': '2024-11-22 05:46:09.077279'}> (decorators.py:140)

test steps:

        Concurrent test that measures response time (RT) and QPS

        :purpose:  `primary key: INT64`, divided into 10 partitions
            1. build a `BITMAP` index on all 12 supported scalar fields
            2. 2 fields of different vector types
            3. load and search a subset of partitions, alongside DQL requests

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim
                'float_vector_1': 768dim
                'id': primary key type is INT64

                all scalar fields: varchar max_length=100, array max_capacity=13
            2. build indexes:
                IVF_SQ8: 'float_vector'
                HNSW: 'float_vector_1'

                BITMAP: all scalar fields
            3. insert 5 million entities
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - scene_insert_partition
                    (partition: create->insert->flush->release->drop)
                - scene_test_partition
                    (partition: create->insert->flush->index again->load->search->release->search failed->drop)
                - scene_test_partition_hybrid_search
                    (partition: create->insert->flush->index again->load->hybrid_search->release->hybrid_search failed->drop)
                - search
                - query
                - hybrid_search
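The outcomes that `scene_test_partition` expects from step 7 can be written down as a small, self-contained Python model. This is a sketch under stated assumptions, not the benchmark's code or the pymilvus API: `FakeCollection`, `NotLoadedError` and `run_scene_test_partition` are hypothetical names. The model also assumes the 5 million base entities live in the `_default` partition, which stays loaded. It records what a correct server should return at each step: a search right after `load` succeeds, a search on the released partition fails, and a collection-level search keeps working while other partitions remain loaded. The reported errors violate exactly these expectations ("partition not loaded" after load, "collection not loaded" after a single partition release).

```python
class NotLoadedError(Exception):
    """Hypothetical stand-in for the 'partition/collection not loaded' errors."""


class FakeCollection:
    """Hypothetical in-memory model of per-partition load state (NOT pymilvus)."""

    def __init__(self):
        # partition name -> {"rows": int, "loaded": bool}
        self.partitions = {"_default": {"rows": 0, "loaded": False}}

    def create_partition(self, name):
        self.partitions[name] = {"rows": 0, "loaded": False}

    def insert(self, name, n):
        if name not in self.partitions:
            raise KeyError(f"partition not found[partition={name}]")
        self.partitions[name]["rows"] += n

    def load(self, name):
        self.partitions[name]["loaded"] = True

    def release(self, name):
        self.partitions[name]["loaded"] = False

    def drop(self, name):
        # This model refuses to drop a loaded partition, matching the release->drop order.
        if self.partitions[name]["loaded"]:
            raise RuntimeError(f"partition {name} is still loaded")
        del self.partitions[name]

    def search(self, partition=None):
        """Search one partition, or every loaded partition when partition is None."""
        if partition is not None:
            part = self.partitions.get(partition)
            if part is None or not part["loaded"]:
                raise NotLoadedError(f"partition not loaded[partition={partition}]")
            return part["rows"]
        loaded = [p["rows"] for p in self.partitions.values() if p["loaded"]]
        if not loaded:
            raise NotLoadedError("collection not loaded")
        return sum(loaded)


def run_scene_test_partition(coll, name, rows=100):
    """Replay scene_test_partition and record the outcome each step should have."""
    outcomes = []
    coll.create_partition(name)
    coll.insert(name, rows)
    # flush and "index again" change no load state in this model, so they are skipped
    coll.load(name)
    outcomes.append(("search after load", coll.search(name)))  # must succeed
    coll.release(name)
    try:
        coll.search(name)
        outcomes.append(("search after release", "unexpected success"))
    except NotLoadedError:
        outcomes.append(("search after release", "failed as expected"))
    # A collection-level search must keep working while other partitions stay loaded.
    outcomes.append(("collection search", coll.search()))
    coll.drop(name)
    return outcomes


coll = FakeCollection()
coll.insert("_default", 5_000_000)  # steps 3-6 (assumed): base data in _default, loaded
coll.load("_default")
print(run_scene_test_partition(coll, "scene_test_partition_demo"))
# [('search after load', 100), ('search after release', 'failed as expected'), ('collection search', 5000000)]
```

Because the model is single-threaded, it only pins down the expected per-step results. The bug shows up when the three partition scenes run concurrently with the plain `search`/`query`/`hybrid_search` requests.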
