
Strange behavior with diagonal gates. #73

Open
sss441803 opened this issue Jul 25, 2023 · 15 comments

@sss441803

Hi,

I constructed a simple 6-qubit circuit with a brickwork pattern and generated the corresponding expression for calculating an amplitude. The alternating two-qubit gates are generally not diagonal, but some can be decomposed into local single-qubit gates followed by a diagonal gate. This diagonal decomposition, which introduces hyperedges, should reduce the computational cost. Although the cost reduction in the example below should not be very noticeable, we are seeing some very unexpected behavior: the contraction path produces intermediate tensors with a very large number of open indices, even though the graph should be almost a ring graph, which has treewidth 2.

Code:

import numpy as np
from cuquantum import contract_path

# Filler for operands, values don't matter
value = np.zeros(2, dtype=complex)
cz = np.zeros([2, 2], dtype=complex)
single_qubit_gate = np.zeros([2, 2], dtype=complex)
two_qubit_gate = np.zeros([2, 2, 2, 2], dtype=complex)

# Regular no diagonal decomposition
exp_str = 'a,b,c,d,e,f,ag,bh,ci,dj,ek,fl,ghmn,ijop,klqr,mrsx,notu,pqvw,sy,tz,uA,vB,wC,xD,y,z,A,B,C,D->'
operands = [value] * 6 + [single_qubit_gate] * 6 + [two_qubit_gate] * 6 + [single_qubit_gate] * 6 + [value] * 6
path, info = contract_path(exp_str, *operands)
cost = info.opt_cost
largest_intermediate = info.largest_intermediate
intermediate_modes = info.intermediate_modes
print(f'No diagonal gate: cost {cost}, largest_intermediate {largest_intermediate}.')
print('Intermediate modes: ', intermediate_modes)

# Diagonal decomposition
exp_str = 'a,b,c,d,e,f,ag,bh,ci,dj,ek,fl,gh,ij,kl,gm,hn,io,jp,kq,lr,mr,no,pq,ms,nt,ou,pv,qw,rx,s,t,u,v,w,x->'
operands = [value] * 6 + [single_qubit_gate] * 6 + [cz] * 3 + [single_qubit_gate] * 6 + [cz] * 3 + [single_qubit_gate] * 6 + [value] * 6
path, info = contract_path(exp_str, *operands)
cost = info.opt_cost
largest_intermediate = info.largest_intermediate
intermediate_modes = info.intermediate_modes
print(f'Diagonal gate: cost {cost}, largest_intermediate {largest_intermediate}.')
print('Intermediate modes: ', intermediate_modes)

Output:

No diagonal gate: cost 724.0, largest_intermediate 16.0.
Intermediate modes: ('jopc', 'jop', 'g', 'hmn', 'h', 'j', 'elqr', 'l', 'ymrx', 'znou', 'u', 'Bpqw', 'w', 'x', 'lqr', 'qr', 'pqw', 'pq', 'op', 'mn', 'nou', 'no', 'pn', 'mp', 'qm', 'rm', 'yx', 'x', '')
Diagonal gate: cost 5072.0, largest_intermediate 256.0.
Intermediate modes: ('i', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 'ji', 'hg', 'nh', 'pj', 'lk', 'rl', 'gm', 'on', 'io', 'qp', 'kq', 'mr', 'jimr', 'hglk', 'gmon', 'rlqp', 'nhjimr', 'pjhglk', 'kqgmon', 'iorlqp', 'nimrpglk', 'kgmnirlp', '')
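For context, the diagonal decomposition above relies on the fact that a diagonal two-qubit gate such as CZ, normally a rank-4 tensor, factors into delta tensors (the hyperedges) times a rank-2 diagonal tensor. A minimal NumPy check of that identity (a sketch, not part of the original report; the index convention (out1, out2, in1, in2) is an assumption):

```python
import numpy as np

# Full CZ as a rank-4 tensor with index order (out1, out2, in1, in2).
cz_full = np.diag([1.0, 1.0, 1.0, -1.0]).reshape(2, 2, 2, 2)

# Diagonal form: CZ only multiplies each basis state |ab> by a phase d[a, b],
# so the rank-4 tensor factors as delta(a, c) * delta(b, d) * d[a, b].
d = np.array([[1.0, 1.0], [1.0, -1.0]])
delta = np.eye(2)
reconstructed = np.einsum('ab,ac,bd->abcd', d, delta, delta)

assert np.allclose(cz_full, reconstructed)
```

In the einsum expression, this is what lets a 2-mode term like 'gh' stand in for a 4-mode two-qubit gate, with the repeated mode labels acting as hyperedges.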

@mtjrider mtjrider self-assigned this Jul 25, 2023
@haidarazzam
Collaborator

Thank you for submitting your issue.
We have identified the problem. The current workaround is to disable cuTensorNet's internal preprocessing simplification step via 'CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SIMPLIFICATION_DISABLE_DR';
with that set, you get the expected 312 flops for the diagonal decomposition.
More details will come later.

@sss441803
Author

sss441803 commented Aug 10, 2023

Hi, I am also trying to understand the quality of the contraction orders found by cuQuantum and cotengra, so I am using a cotengra contraction order in a cuQuantum contraction for a fair comparison.

from cuquantum import Network, OptimizerOptions
import cuquantum.cutensornet as cutn
import cotengra as ctg

opt = ctg.HyperOptimizer(
        slicing_reconf_opts={'target_size': 2**30}, # target size 2**20 still causes the same kind of error
        parallel=32,
        progbar=False,
        minimize='flops'
    )
# inputs, output and size_dict are built from the expression as shown further below
tree = opt.search(inputs, output, size_dict)
path = tree.get_path()
network = Network(expression, *operands)
network.optimizer_config_ptr = cutn.create_contraction_optimizer_config(network.handle)
network._set_opt_config_option('SIMPLIFICATION_DISABLE_DR', cutn.ContractionOptimizerConfigAttribute.SIMPLIFICATION_DISABLE_DR, 1)
optimizer_options = OptimizerOptions(samples=1, path=list(path))
path, info = network.contract_path(optimize=optimizer_options)

This produces the correct result when the network is not too large. However, for large networks, I get the following error:

Traceback (most recent call last):
  File "optimize.py", line 41, in <module>
    path, info = network.contract_path(optimize=optimizer_options)
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/_internal/utils.py", line 474, in inner
    result = wrapped_function(*args, **kwargs)
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/_internal/utils.py", line 443, in inner
    raise e
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/_internal/utils.py", line 435, in inner
    result = wrapped_function(*args, **kwargs)
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/tensor_network.py", line 568, in contract_path
    self._calculate_workspace_size()
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/_internal/utils.py", line 474, in inner
    result = wrapped_function(*args, **kwargs)
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/_internal/utils.py", line 474, in inner
    result = wrapped_function(*args, **kwargs)
  File "/home/minzhaoliu/.conda/envs/cuquantum/lib/python3.8/site-packages/cuquantum/cutensornet/tensor_network.py", line 384, in _calculate_workspace_size
    cutn.workspace_compute_contraction_sizes(self.handle, self.network, self.optimizer_info_ptr, self.workspace_desc)
  File "cuquantum/cutensornet/cutensornet.pyx", line 679, in cuquantum.cutensornet.cutensornet.workspace_compute_contraction_sizes
  File "cuquantum/cutensornet/cutensornet.pyx", line 696, in cuquantum.cutensornet.cutensornet.workspace_compute_contraction_sizes
  File "cuquantum/cutensornet/cutensornet.pyx", line 240, in cuquantum.cutensornet.cutensornet.check_status
cuquantum.cutensornet.cutensornet.cuTensorNetError: CUTENSORNET_STATUS_NOT_SUPPORTED

The inputs to cotengra can be converted from the expression and operands in the cuQuantum format using the following code:

inputs = []
tensors = expression.split(',')
tensors[-1] = tensors[-1][:-2]  # strip the trailing '->'
for tensor in tensors:
    inputs.append(tuple(tensor))
output = ()
# every mode has extent 2
size_dict = {char: 2 for tensor in tensors for char in tensor}
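For reuse, the conversion can be wrapped into a small self-contained helper (a sketch; the name parse_expression is mine and not part of any API), which builds size_dict directly from the mode labels that actually occur:

```python
def parse_expression(expression, extent=2):
    """Convert a cuquantum-style einsum expression with an empty output
    (ending in '->') into cotengra-style (inputs, output, size_dict)."""
    lhs = expression.split('->')[0]
    inputs = [tuple(term) for term in lhs.split(',')]
    output = ()
    # All modes in these circuits are qubit modes of extent 2.
    size_dict = {mode: extent for term in inputs for mode in term}
    return inputs, output, size_dict
```

For example, `parse_expression('a,ab,b->')` gives `([('a',), ('a', 'b'), ('b',)], (), {'a': 2, 'b': 2})`.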

The expression and operands are as follows (from Sycamore, m=20):

expression = 'ĩĪīĬ,ĭĮįİ,ıIJijĴ,ĵĶķĸ,ĹĺĻļ,ĽľĿŀ,ŁłĉĊ,ċČčĎ,ďĐđĒ,ēĔĕĖ,ėĘęĚ,ěĜĝĞ,ğĠĩġ,ĢƈƉƊ,ƋĬƌƍ,ƎƏĭƐ,Ƒ݃Ɠ,Ɣƒƕ,ıƖƗ,Ƙĵƙƚ,ƛĹƜƝ,ƞĽƟƠ,ơċƢƣ,ƤďƥƦ,ƧƨƩƪ,ƫěƬƭ,ƮĴŁƯ,ưĸēƱ,ƲļėƳ,ƴŀğƵ,ƶĊĢƷ,ƸĎƹƺ,ƻĒƋƼ,ƽĖƎƾ,ƿƨǀǁ,ǂġƑǃ,DŽƊDždž,LJƹLjlj,NJƐNjnj,ǍƗƮǎ,Ǐƚưǐ,ǑƝƲǒ,ǓƠƴǔ,ǕƯƶǖ,ǗƣƸǘ,ǙƦƻǚ,ǛƱƽǜ,ǝƳǞǟ,ǠƭǡǢ,ǣƵǂǤ,ǥƷDŽǦ,ǧƼǨǩ,ǪƾNJǫ,ǬǞĚǭ,ǮǡĞǯ,ǰǃDZDz,dzǨƍƔ,ǴDZƓǵ,ǶǷǸǹ,ǢǮǺǻ,ǩdzǼǽ,DzǴǾǿ,ȀƖǍȁ,ȂƙǏȃ,ȄƟǓȅ,ȆǎǕȇ,ȈƥǙȉ,ȊǐǛȋ,Ȍǒǝȍ,ȎƬǠȏ,Ȑǔǣȑ,Ȓǖǥȓ,ȔǘȕȖ,ȗǚǧȘ,șǜǪȚ,țǟǬȜ,ȝǤǰȞ,ȟǦȠȡ,ȢǫȣȤ,ȃȀȥȦ,ȅȂȧȨ,ȉȄȩȪ,ȏȈȫȬ,ǸȎȭȮ,ȯȰƜǑ,ȱȇȰȲ,ȳȴƢǗ,ȵȋȆȶ,ȷȍȴȸ,ȹȺƩȻ,ȼȑȊȽ,ȾȓȌȿ,ɀȖȺɁ,ɂȘȐɃ,ɄȚȒɅ,ɆȜȔɇ,ɈȻƪƿ,ɉǺȗɊ,ɋȞșɌ,ɍȡțɎ,ɏȕƺLJ,ɐǼȝɑ,ɒǾȢɓ,ɔȠdžɕ,ɖȣnjɗ,ǹɉɘə,ǻɐɚɛ,ǽɒɜɝ,ɞȨȱɟ,ɠȪȵɡ,ɢȶȷɣ,ɤȬȼɥ,ɦȽȾɧ,ɨȿɀɩ,ɪȮɂɫ,ɬɃɄɭ,ɮɅɆɯ,ɰɊɋɱ,ɲɌɍɳ,ɴɑɵɶ,ɷɵȤȟ,ȥɸɹɺ,ȧɞɻɼ,ȩɠɽɾ,ȫɤɿʀ,ȭɪʁʂ,ʃɸȦȯ,ʄɟʅʆ,ʇʅȲȳ,ʈɡɢʉ,ʊɣʋʌ,ʍʋȸȹ,ʎɥɦʏ,ʐɧɨʑ,ʒɩʓʔ,ʕʓɁʖ,ʗɫɬʘ,ʙɭɮʚ,ʛɯʜʝ,ʞʜɇɈ,ʟəɰʠ,ʡɱɲʢ,ʣɳʤʥ,ʦʤɎɏ,ʧɛɴʨ,ʩɶɷʪ,ʫɝʬʭ,ʮʬɓɔ,ɺʃʯʰ,ʆʇʱʲ,ʌʍʳʴ,ʵɼʄʶ,ʷɾʈʸ,ʹʉʊʺ,ʻʀʎʼ,ʽʏʐʾ,ʿʑʒˀ,ˁʂʗ˂,˃ʘʙ˄,˅ʚʛˆ,ˇʝʞˈ,ˉʠʡˊ,ˋʢʣˌ,ˍʨʩˎ,ˏʭʮː,ˑʯʵ˒,˓ʶʷ˔,˕ʱʹ˖,˗ʸʻ˘,˙ʺʽ˚,˛ʳʿ˜,˝ʼˁ˞,˟ʾ˃ˠ,ˡˀ˅ˢ,ˣ˂ˤ˥,˦˄ˉ˧,˨ˆˋ˩,˪ˤɘʟ,˫ˊˬ˭,ˮˌˍ˯,˰˱ʥʦ,˲ˬɚʧ,˳ˎ˴˵,˶˴ɜʫ,˷˸ǿɖ,ȁˑ˹˺,ɹ˓˻˼,ʰ˕˽˾,ɻ˗˿̀,ʲ˛́̂,ɽ˝̃̄,ʴ̅̆̇,ɿˣ̈̉,̊˒˙̋,̌˔˟̍,̎˖ˡ̏,̐˘˦̑,̒˚˨̓,̔˜̖̕,̗˞˫̘,̙ˠˮ̚,̛̜̅ˇ,̝˧˳̞,̟˩̡̠,̢̕ˈ˱,̣˯̤̥,̦̠ʪˏ,̧̤ː˸,̨˺̩̊,̪˼̫̌,̬˾̭̎,̮̯̀̐,̰̱̋̒,̲̳̂̔,̴̵̗̄,̶̷̙̍,̸̹̺̏,̻̼̉̽,̝̾̑̿,̟̀̓́,̘͂̓̈́,̣͆̚ͅ,͇̹ˢ˰,͈̼˥˲,͉̞͊͋,͌̓˭˶,͍͊˵˷,ʁ˪͎͏,͈̽͐͑,͓̈́͌͒,͍͔͕͋,͖˻̪͗,͘˿̮͙,͚̩̰͛,̴̃͜͝,̶̫͟͞,̸̭͠͡,̻̈ͣ͢,̯ͤ̾ͥ,̱ͦ̀ͧ,̳ͨͩͪ,̵ͫ͂ͬ,̷ͭͮͅ,̺͇ͯͰ,ͱ͉̿Ͳ,ͳ́ʹ͵,Ͷ͆ͷ\u0378\u0379ͺͻ,͙͖ͼͽ,͘͝;Ϳ,ͣ͜\u0380\u0381,͎͢\u0382\u0383,΄΅˽̬,Ά͛΅·,ΈΉ̲́,Ί͚͟\u038b,Ό͡Ή\u038d,ΎΏ̆ΐ,Αͥ͞Β,Γͧ͠Δ,ΕͪΏΖ,Ηʔʕ̜,ΘͬͤΙ,ΚͮͦΛ,ΜͰͨΝ,Ξΐ̛̇,Ο͐ͫΠ,ΡͲͭ\u03a2,Σ͵ͯΤ,Υ̢̖ͩ,Φ͒ͱΧ,Ψ͔ͶΩ,Ϊʹ̡̦,Ϋͷ̧̥,͏Οάέ,͑Φήί,͓Ψΰα,βͽΆγ,δͿΊε,ζ\u038bΌη,θ\u0381Αι,κΒΓλ,μΔΕν,ξ\u0383Θο,πΙΚρ,ςΛΜσ,τΠΡυ,φ\u03a2Σχ,ψΧωϊ,ϋω\u0378ͳ,όύͻ΄,ώγϏϐ,ϑϏ·Έ,ϒεζϓ,ϔηϕϖ,ϗϕ\u038dΎ,Ϙικϙ,Ϛλμϛ,ϜνϝϞ,ϟϝΖΗ,Ϡοπϡ,Ϣρςϣ,ϤσϥϦ,ϧϥΝΞ,Ϩέτϩ,Ϫυφϫ,ϬχϭϮ,ϯϭΤΥ,ϰίψϱ,ϲϊϋϳ,ϴαϵ϶,ϷϵΩΪ,Ϟϟϸ,ϹϺώϻ,ϼϽϒϾ,ϿϓϔЀ,ЁЂϘЃ,ЄϙϚЅ,ІϛϜЇ,ЈЉϠЊ,ЋϡϢЌ,ЍϣϤЎ,ЏϦϧА,БϩϪВ,ГϫϬД,ЕϱϲЖ,З϶ϷИ,\u0379˹̨Й,ͺύКЛ,КόМН,ͼβϺО,ЙМϹП,ϐϑРС,;δϽТ,ЛϻϼУ,НРϿФ,ϖϗХЦ,ЦϸЏ,\u0380θЂЧ,ОϾЁШ,ПЀЄЩ,СХІЪ,ЪАЫ,\u0382ξЉЬ,ЬЭάϨ,ТЃЈЮ,УЅЋЯ,ФЇЍа,аЫϮϯ,ЧЊЭб,бвήϰ,ШЌБг,ЩЎГд,дϳЗ,ЮВве,ежΰϴ,ЯДЕз,зИи,гЖжй,йи͕Ϋ,ƕкл,Įǯƌ,мǭlj,ĪǷĝ,нǁLj,оʖǀ,пČƧ,рĺơ,ƘIJƛ,ǵɗNjл,įɕDžк,īƉмƏ,сęнƈ,тĕсĠ,уčоĘ,фĉуĔ,хĻпł,Ƕ
đтĜ,ƫĿфĐ,Ƥķхľ,ƞijрĶ->'
operands = [np.zeros([2]*len(tensor), dtype=complex) for tensor in expression.split(',')]
operands[-1] = np.zeros([2]*4, dtype=complex)

@DmitryLyakh
Collaborator

Can you rerun the failing case with the environment variable CUTENSORNET_LOG_LEVEL=5 set and attach the log file?

@sss441803
Author

Hi Dmitry,

Thank you for the response and the file is attached.
std.log

@DmitryLyakh
Collaborator

Is it a multi-GPU node? Do you mind rerunning it with the following environment variables set and attaching the log again?
CUDA_VISIBLE_DEVICES=0
CUTENSORNET_LOG_LEVEL=5
CUTENSOR_LOG_LEVEL=5

@sss441803
Author

Hi, I have changed these variables, but note that I ran it on a single-GPU login node of a cluster whose compute nodes each have multiple GPUs.
std.log

@DmitryLyakh
Collaborator

Is this the Sycamore-53 depth-20 circuit? Did you apply any transformations to it that would introduce hyperedges? Did you enable slicing? I see very large intermediate tensors there and very high workspace size demands. What happens if you do not import the contraction path from cotengra but let cuTensorNet find its own contraction path? Does it work?

@sss441803
Author

sss441803 commented Aug 10, 2023

Thank you for the feedback!

Yes, it's Sycamore-53 depth-20.
I don't think there are hyperedges.
I enabled slicing.

When the path is not imported, it works perfectly fine, so I don't think there are hyperedges. cuQuantum also works fine with hyperedges when it finds its own path, as long as simplification is disabled.

Perhaps the path somehow no longer corresponds to the same path after being moved from cotengra to cuQuantum.

@acharara-nv

Hi Minzhao, from the log we can see that the intermediate tensors are too large to fit in memory, so the network returns CUTENSORNET_STATUS_NOT_SUPPORTED.
The path is imported properly from cotengra, but the slicing info from cotengra is not, which is why the intermediate tensors are too large.
You would want to use the OptimizerOptions slicing attribute to supply the sliced indices to cuTensorNet's optimizer. According to cotengra's docs, you would work with cotengra's ContractionTree object to extract the sliced indices.
Would you please try that and post feedback?
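A sketch of what that might look like (hedged: to_optimizer_kwargs is a hypothetical helper name, and I am assuming cotengra's ContractionTree exposes the sliced mode labels via tree.sliced_inds and the path via tree.get_path(); check the cotengra docs for the exact accessors):

```python
def to_optimizer_kwargs(tree):
    """Turn a cotengra ContractionTree into keyword arguments for
    cuquantum.OptimizerOptions, carrying over both the path and the slices."""
    return {
        'samples': 1,
        'path': [tuple(pair) for pair in tree.get_path()],
        # Iterating tree.sliced_inds yields the sliced mode labels.
        'slicing': list(tree.sliced_inds),
    }
```

The result would then be passed as `OptimizerOptions(**to_optimizer_kwargs(tree))` before calling `network.contract_path(...)`; whether `slicing` accepts a plain sequence of mode labels is an assumption based on the cuQuantum Python documentation.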

@sss441803
Author

Hi Ali! Thank you for the feedback. It works after the slicing is supplied. I was expecting cuQuantum to take the path and find the slices on its own.

@haidarazzam
Collaborator

Also note that paths from cotengra or other packages can run with cuTensorNet, but not always: either the slicing information is missing, or the path generated by the other package cannot run on the GPU for some other reason, such as the number of modes exceeding 64.

@mtjrider
Collaborator

mtjrider commented Sep 3, 2023

@sss441803 has your issue been addressed?

@sss441803
Author

sss441803 commented Sep 6, 2023

@sss441803 has your issue been addressed?

Hi, the answers are sufficient for solving my own issues with the 'CUTENSORNET_CONTRACTION_OPTIMIZER_CONFIG_SIMPLIFICATION_DISABLE_DR' workaround.

Azzam did say that a proper fix regarding diagonal gates should come.

@haidarazzam
Collaborator

@sss441803
Hyperedges have been allowed in cuTensorNet for a while.
The flops issue you observed occurred because cuTensorNet's simplification phase was trying to simplify the contraction before the path optimizer kicked in, which ended up producing a non-optimal path due to these simplifications. For small circuits we usually prefer to disable simplification and let the optimizer find the best path. Simplification cannot guarantee an optimal path; it is implemented as a preprocessing phase to decrease the network size and thus speed up the optimizer for large circuits.
Since 23.06, the performance of the path optimizer has improved a lot, in particular for large circuits (10x). The simplification phase is therefore not as advantageous as before and can be safely disabled, except when a user still wants a faster path optimizer for a large circuit where simplification doesn't mess up the path.

Bottom line: the fix was to disable simplification, which was enabled by default. (I am not sure whether you are referring to another fix with "Azzam did say that a proper fix should come regarding diagonal gates.")

@sss441803
Author

I am happy for the issue to be closed, since the current fix is sufficient for me. I was just expecting some more updates, since you mentioned in the first comment that more details would come later, but I am fine with what we have now.

Just to clarify the original issue: it's not that small diagonal-gate circuits don't benefit from simplification, it's that simplification actively breaks things. For circuits with 300~400 tensors, the default behavior completely fails to find a path for diagonal-gate circuits while working fine for non-diagonal gates.
