v0.5rc1: graph support eager lrs (#6262)

@jackalcooper jackalcooper released this 14 Sep 05:41
76e78fd

Changelog

v0.5rc1 (13/09/2021)

Highlights

  • First-class support for eager execution. The deprecated APIs have been moved to oneflow.compatible.single_client
  • Drop-in replacement of import torch for existing PyTorch projects. You can test it by interchanging import oneflow as torch and import torch as flow.
  • nn.Module for eager execution
  • nn.Graph for lazy execution
  • DDP for data parallelism
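
One way to try the drop-in import in an existing project is to pick the backend at import time. The sketch below is only an illustration: the helper name load_backend is hypothetical and not part of OneFlow or PyTorch, and it uses nothing beyond the standard library to check which package is installed.

```python
import importlib
import importlib.util


def load_backend(preferred=("oneflow", "torch")):
    """Return the first importable module named in `preferred`, or None.

    Hypothetical helper for illustration; not a OneFlow or PyTorch API.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None
```

A project could then write torch = load_backend() and keep its existing torch.* calls, falling back to PyTorch when OneFlow is not installed. Whether every call in a given project is covered by OneFlow is not something this sketch checks.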

A sneak peek of the new API

Here is a minimal example showing how to wrap an nn.Module in an nn.Graph and run it in lazy mode.

import oneflow as flow

class NeuralGraph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.model = model  # model is an nn.Module instance

    def build(self, x):
        # build() describes the computation that the graph runs lazily
        y_pred = self.model(x)
        return y_pred

graph = NeuralGraph(model)  # create an nn.Graph instance from an existing nn.Module
y_pred = graph(x)  # run the created nn.Graph on an input tensor x
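
To make the eager versus lazy distinction concrete, here is a toy, pure-Python sketch of the general "record first, run later" idea. It is not OneFlow code and says nothing about how nn.Graph is implemented; Symbol, Affine, and ToyGraph are hypothetical names invented for this illustration.

```python
class Symbol:
    """Placeholder passed to build(); records operations instead of computing."""

    def __init__(self, ops):
        self.ops = ops

    def __mul__(self, k):
        return Symbol(self.ops + [("mul", k)])

    def __add__(self, k):
        return Symbol(self.ops + [("add", k)])


class Affine:
    """Stand-in for a module: computes w * x + b on numbers or on Symbols."""

    def __init__(self, w, b):
        self.w, self.b = w, b

    def __call__(self, x):
        return x * self.w + self.b


class ToyGraph:
    """Stand-in for a graph: traces build() once, then replays the recorded ops."""

    def __init__(self, model):
        self.model = model
        self.plan = None

    def build(self, x):
        return self.model(x)

    def __call__(self, x):
        if self.plan is None:
            # First call: run build() on a Symbol to record the operations.
            self.plan = self.build(Symbol([])).ops
        # Every call: replay the recorded operations on the real input.
        for op, k in self.plan:
            x = x * k if op == "mul" else x + k
        return x


model = Affine(2, 1)
eager = model(3)        # eager: computed immediately
graph = ToyGraph(model)
lazy = graph(3)         # lazy: recorded on first call, then executed
```

Both paths give the same result here; the difference is that the graph path has the whole computation available as data (graph.plan) before it runs, which is what gives a real framework room to optimize it.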

New in Python API

  • [feature][eager][op][test][python][interface] Add test for convtranspose2d #5239
  • [enhancement][python][interface] Add GroupNorm #5175
  • [enhancement][eager][python][interface] [Add] avgpool1d avgpool3d #5165
  • [feature][eager][op][python][interface] Add deconv cpu impl #5224
  • [bug][eager][api][python][interface] Fix acosh bug #5221
  • [feature][eager][op][python][interface] Dev modules ctc loss #5168
  • [bottleneck][bug][documentation][python][interface] Fix meshgrid test bug #5208
  • [eager][documentation][python][interface] Rename CosineScheduler to CosineAnnealingLR #5112
  • [feature][eager][python][interface] Add meshgrid module #5205
  • [enhancement][feature][bug][op][python] support bias in conv2d's parameter list #5322
  • [eager][documentation][api][python][interface] add not_equal, greater_equal and less_equal module #5350
  • [enhancement][eager][python] refine pow module and its test #5319
  • [enhancement][eager][op][python] Add triu op #5329
  • [enhancement][bug][python] Fix optimizer for not supporting all kinds of iterables #5355
  • [bug][python][interface] raise IndexError in get_canonical_index to support for loop #5345
  • [bug][python][interface] tensor slice assign supports broadcasting #5344
  • [enhancement][op][python] add cpu group conv logic #5314
  • [enhancement][python] Add 'nn.Mish' module and corresponding functions #5310
  • [enhancement][build][python] Remove ONNX from setup py #5297
  • [enhancement][python][interface] [add] zeropad2d #5278
  • [feature][system][python][interface] Lazy nn.Graph FeedInputOpExpr #5458
  • [feature][python][interface] integrate nn.image.flip #5411
  • [bug][python] Fix issues in point of MultiClientSession #5469
  • [enhancement][bug][python] update HasAllMultiClientEnvVars() #5459
  • [enhancement][python] Add in_top_k function #5428
  • [enhancement][python] Dev add docstring #5449
  • [feature][api][python] MultiClientSession #5407
  • [documentation][python] remove --user #5431
  • [feature][python][interface] nn.Graph python #5309
  • [feature][python][interface] Fea/nn graph/graph name #5413
  • [bug][python][interface] rm nn.Graph.train #5424
  • [op][documentation][api][python][interface] add bernoulli module #5353
  • [enhancement][python] flow.S/B/P #5306
  • [enhancement][documentation][python] Add instruction on upgrade pip #5400
  • [enhancement][python] Rm oneflow export and experimental #5589
  • [bug][python] Fix nn.graph.utils module conflict #5598
  • [feature][ci][python] Update autotest framework #5520
  • [enhancement][python] copy of_proto_python_dir to compatible_single_client_python #5539
  • [enhancement][api][python] del default env init #5537
  • [enhancement][python] Fix single client using same glog file #5535
  • [bug][api][python] Fix Session TryClose #5531
  • [enhancement][feature][python] split vector-matrix norm #5478
  • [feature][eager][op][python][interface] Add more upsample kernel #5382
  • [enhancement][feature][test][python] add torchstyle unittest #5489
  • [feature][system][python] nn.Graph with training #5662
  • [enhancement][feature][python] Fea/nn graph/block proxy func #5727
  • [enhancement][api][python] consistent_tensor_to_api #5703
  • [feature][eager][op][python] Dev Align torch avgpool #5610
  • [enhancement][python] fix circular deps of sbp python module #5706
  • [documentation][python] [part5]Remove singleclient outdated api #5674
  • [enhancement][python] [part4]Remove singleclient outdated api #5672
  • [bug][op][python] remove outdated code in conv3d #5696
  • [enhancement][test][python] enlarge tolerance of dataloader test #5689
  • [enhancement][test][python] add autotest for some math ops #5646
  • [feature][python] nn.Graph optimizer part 2: add L2, pass job complete, refactor #5604
  • [enhancement][python] Add clip_grad_norm #5299
  • [purge][python] Remove Single-Client API in oneflow default python #5827
  • [bug][python] Fix ddp grad size #5834
  • [enhancement][feature][python] Dev RMSprop graph conf #5768
  • [enhancement][purge][eager][python] remove scale arg in optimizer #5821
  • [enhancement][feature][python] graph/block io check #5803
  • [enhancement][feature][python] Dev adam graph conf #5709
  • [purge][python] [part10]Remove singleclient outdated api #5756
  • [feature][api][python] better repr of nn.Graph for debug #5762
  • [bug][python] fix weight decay in RMSprop #5755
  • [purge][python] [part9]Remove singleclient outdated api #5752
  • [purge][python] [part8]Remove singleclient outdated api #5750
  • [documentation][python] add first batch of methods in oneflow.nn.functional namespace #5693
  • [purge][python] [part6]Remove singleclient outdated api #5704
  • [bug][python] use default_generator.seed() as random_seed in init #5721
  • [bug][system][python] ddp broadcast params and buffers #5913
  • [enhancement][test][python] Add consistent tensor requires grad test #5925
  • [bug][python] wrap flow.nn.init.* with flow.no_grad() #5932
  • [feature][api][python][interface] add clip_grad to optimizer #5817
  • [enhancement][ci][op][test][python] add randperm with test and docs #5680
  • [feature][api][python] Fea/nn graph/ lr_schedule(and cosine lr_sch) and opt_group #5846
  • [bug][python] fix bug of SyncOnMasterFn atexit #5909
  • [purge][python] Delete single client nn modules #6061
  • [enhancement][python] Move framework.distribute to env #6022
  • [bug][python] skip sync when abnormally exiting #6025
  • [feature][python] Fea/nn graph/warmup amp config #5969
  • [documentation][python] add optimizer api docs #6131
  • [documentation][python] add_tensor_api_doc #6127
  • [bug][python] Fix test_grid_sample.py and test_affine_grid.py threshold #6125
  • [documentation][api][python] add doc of graph #6093
  • [bug][python] Fix make of_format fail in ubuntu #6120
  • [feature][api][python][interface] Fea/graph helpers #6088
  • [enhancement][eager][python][interface] Use flow.randint in dataloader #6086
  • [feature][eager][api][python][interface] Import oneflow as torch #6076
  • [enhancement][test][api][python][refactor] rename OfrecordReader to OFRecordReader #6090
  • [purge][python][need-single-client-tests] Delete single client nn modules #6082
  • [enhancement][python] flow.load tolerates FileNotFound fault #6083
  • [feature][python] Fea/pipeline in graph #6105
  • [enhancement][test][python] graph activation checkpointing #6192
  • [enhancement][feature][op][python] rnn test #6165

New in Ops:

  • [enhancement][op][api][refactor] [Functional] Part2: Add partial unary and math functional apis #5218
  • [enhancement][bug][op][interface] Refine deconv kernel #5229
  • [enhancement][op][api][interface] add ReflectionPad2d #5172
  • [feature][eager][op][api][interface] crossentropyloss and nllloss support ignore_index #5195
  • [feature][eager][op][api][interface] Yejiaojiao/dev bcewithlogitsloss #5173
  • [bug][ci][op] Dev user op set default is_dynamic #5223
  • [enhancement][op] add magic method for pow #5199
  • [enhancement][op][interface] add cpu version of upsampling #5194
  • [enhancement][bug][op][api][interface] add ReplicationPad2d #5148
  • [feature][eager][op][api][interface] add kldivloss module #5155
  • [feature][eager][op][documentation][build][api][interface] Add floor module and the corresponding testcases #4964
  • [enhancement][feature][op] Dev conv1d module #5280
  • [enhancement][op] Add ctc_greedy_decoder op #5294
  • [enhancement][op][system] Dev remove default grad func #5320
  • [enhancement][op][system] Add pad grad func. #5354
  • [enhancement][op][system] Add gradient funcs. #5348
  • [feature][purge][bug][eager][op][interface] fix upsample nearest bug #5347
  • [enhancement][op][system] [Functional] Part7: Migrate pooling ops #5253
  • [enhancement][op] nvjpeg hardware acc #5240
  • [enhancement][feature][ci][eager][op][api][interface] Add bmm module #5334
  • [enhancement][eager][op] Dev image decode eager #5333
  • [enhancement][op] Optimize softmax warp impl #4977
  • [enhancement][eager][op] Dev tensor buffer eager #5317
  • [enhancement][op][api][refactor] [Functional] Part6: Migrate conv op #5252
  • [enhancement][eager][op] Dev sort eager #5284
  • [enhancement][bug][op][api] fix bceloss bug in default weight and reduction #5303
  • [bug][eager][op] remove redundant assert and check #5264
  • [enhancement][bug][ci][op] fix bceloss bug about weight #5269
  • [enhancement][op][api][refactor] [Functional] Part5: Migrate nn ops #5249
  • [enhancement][eager][op] Dev argsort eager #5273
  • [enhancement][op][api][refactor] [Functional] Part4: Migrate array ops #5247
  • [enhancement][op][api][refactor] [Functional] Part3: Migrate binary and activation ops #5246
  • [bug][ci][op][test] Dev fix rmsprop ci fail #5481
  • [enhancement][op] add inplace method: Tensor.sin_ #5471
  • [bug][op] hotfix image_batch_align #5461
  • [enhancement][eager][op][interface] Dev maxpool series op 123d #5244
  • [bug][op] fix pool gpu kernel #5446
  • [feature][eager][op][api][interface] add pixelshufflev2 module #5383
  • [enhancement][feature][ci][eager][op][documentation][api][interface] Add flow xxx and tensor xxx autotest #5386
  • [enhancement][feature][eager][op][api][interface] Modules chunk #5324
  • [enhancement][eager][op] add image normalize for eager #5402
  • [enhancement][eager][op] Dev batch align module #5401
  • [enhancement][eager][op] add coco reader module #5391
  • [enhancement][wip][op] Restruct Elementwise kernel #4130
  • [bug][op] Fix DecodeRandom reuse mem #5606
  • [enhancement][op] Align pytorch maxpool #5525
  • [enhancement][bottleneck][eager][op][api] implementation of constantpad-3d op #5529
  • [enhancement][eager][op] Add scale size for resize #5509
  • [enhancement][op][api][refactor] Dev optimize tensor setitem #5501
  • [enhancement][op] register uint8 dtype to support dataloader #5499
  • [enhancement][op] Add unique.cuh #5487
  • [enhancement][op][api][interface] Dev ofrecord auto truncating #5412
  • [feature][op][system][interface] Feat: LazyInterpret::ApplyImpl support SourceUserOpExpr and Copy #5711
  • [enhancement][op][interface] Dev logical_and/or modules #5636
  • [enhancement][op] support any number positional arguments for ones and zeros op #5698
  • [enhancement][feature][eager][op] Add conv3d Module #5327
  • [feature][eager][op][api][interface] add batchnorm3d module #5631
  • [bug][eager][op] fix reduce min max backward bug #5651
  • [enhancement][op] Debug dim scatter #5371
  • [enhancement][op][interface] Dev eye #5583
  • [enhancement][eager][op] Dev minimum maximum #5576
  • [enhancement][op] Restruct activation grad op #5669
  • [enhancement][feature][eager][op] Rewrite activation function #5465
  • [bug][op][documentation] add oneflow.cat for documentation #5621
  • [enhancement][op] Lcy logsoftmax #5746
  • [feature][op][need-simple-ci] Feat empty op #5659
  • [enhancement][eager][op] Dev split #5714
  • [enhancement][op][interface] add index_select op #5661
  • [bug][op] fix nvjpeg hw acc #5851
  • [enhancement][op] Remove move in conv_cudnn #5828
  • [enhancement][op][interface] Dev logical_xor module #5694
  • [bug][eager][op] fix squeeze #5808
  • [enhancement][op] Get parallel_id and parallel_num through rank and world size in DDP #5717
  • [bug][eager][op] delete interpolate int type #5805
  • [bug][op] Fix bug in scatter #5743
  • [enhancement][op] Refactor: remove module not required, call function directly #5754
  • [enhancement][op] Remove modules not required(tan, erfc, log1p, scatter_nd) #5791
  • [enhancement][op] Refactor scatter, clamp and pow in cpp instead of in python #5715
  • [enhancement][op] Rm useless code in gather files #5687
  • [enhancement][eager][op] change flip_code to scalar #5786
  • [enhancement][bug][op][interface] fix upsample bug #5753
  • [bug][op][interface] Quick fix Lazy nn.Graph input/output OpConf.BlobConf.is_dynamic #5767
  • [enhancement][bug][eager][op] fix argwhere 0-dim bug #5760
  • [enhancement][eager][op] delete unused code #5744
  • [feature][op] Export fused_scale_tril op #5933
  • [bug][op] Fix backward bug in 3d #5908
  • [bug][op] Fix one_hot api limit #5927
  • [enhancement][eager][op] Dev where scalar #5797
  • [bug][op] fix grad error #5914
  • [feature][bug][op] Fix inplace op circle reference bug #5910
  • [enhancement][op] Move the judgment content to c++, And add scalar fmod #5854
  • [enhancement][op] Support combined_margin_loss op in flow.nn.modules #5830
  • [enhancement][op][api][interface] functional_one_hot #5315
  • [enhancement][op] Dev scalar op #5778
  • [bug][eager][op] fix gather kernel 0 shape #5888
  • [enhancement][op] add l2_normalize for multi-client interfaces #5859
  • [feature][op] Export function softmax_cross_entropy #6056
  • [enhancement][op] Add int attr for functional adaptive average pool #6059
  • [enhancement][op][interface] dev full op #5955
  • [bug][eager][op] fix 0dim inplace add #6029
  • [feature][op][system][interface] Feat: nn.Graph image gpu decoder #6014
  • [enhancement][op][interface] dev optim_optim_lr_scheduler_multisteplr #5975
  • [enhancement][op] NopKernel #6035
  • [enhancement][eager][op][api] Dev tril op #6005
  • [enhancement][op] dev unfold and fold #5675
  • [enhancement][op] ResNet CUDA Graphs #6018
  • [enhancement][feature][op] add broadcast pow #6013
  • [enhancement][op][interface] init of op diag #5298
  • [op][documentation][api] Fix api document bug #6009
  • [enhancement][op] Dev fused functional #5954
  • [bug][op][build] Add nvcc flag -Werror cross-execution-space-call #6002
  • [bug][op] Fix Normalization grad function #5993
  • [enhancement][feature][eager][op][test][interface] Add fused self attention #5966
  • [enhancement][bug][ci][eager][op][api][interface] Try to fix var bug #5973
  • [enhancement][feature][eager][op][interface] add prod op #5867
  • [enhancement][eager][op][api] add glu op #6065
  • [enhancement][op] Align Torch.nn.functional poolXd #6184
  • [bug][eager][op] fix backward index for gamma beta #6149
  • [bug][op][system] Fix BroadcastMatmulGrad bug #6168
  • [enhancement][op][api] Add Int support for functional.avg/maxpool #6174
  • [bug][eager][op][api][interface] align dropout api name with pytorch #6170
  • [enhancement][op] support inplace operation for hardsigmoid #6137
  • [enhancement][bug][op] Fix do bias correction in Adam/AdamW #5960
  • [bug][eager][op][api][interface] fix repeat 0-dim tensor bug #6150
  • [enhancement][bug][op] Fix select_first_grad bug #6142
  • [bug][ci][eager][op][documentation][interface] Add clipgrad doc and contiguous #6130
  • [bug][op] Fix eager optim dynamic attr bug #6111
  • [enhancement][op] Support grid_sample and affine_grid operator #6038
  • [op][documentation] Export apis for documentation #6068
  • [enhancement][feature][bug][ci][eager][op][documentation][interface] transfer python function to c++ method #6114
  • [op][documentation] Dev functional batch_gather #6233
  • [enhancement][op][test] fix cross_entropy_loss and its test #5799
  • [bug][op] Use attr nd_sbp to check consistent #6222
  • [enhancement][op] Dev fused bn functional #6077
  • [enhancement][op] support default value in intlist #6201
  • [bug][op] fix sparse_softmax get_nd_sbp #6203
  • [bug][op] Fix bug in model fused update #6197
  • [enhancement][op][system][refactor] Optimize tensor getitem. #5433

New in Eager:

  • [enhancement][eager][interface] Reconstruct module files #5251
  • [bug][eager][documentation][interface] Fix conv module bug #5245
  • [bug][ci][eager][interface] Fix bce withlogitloss ci error #5237
  • [feature][eager][api][interface] module BCELoss #5144
  • [enhancement][feature][eager][api][interface] Dev norm op #5178
  • [enhancement][bug][eager] Fix stack module #5222
  • [enhancement][feature][eager] Support different dtype of equal module #5214
  • [enhancement][bug][eager][documentation][api][interface] Add nllloss backward #5210
  • [enhancement][eager][api][upload-core] Decouple FileSystem and IOConf #5162
  • [enhancement][ci][eager] Set lower precision avoid ci failing #5200
  • [eager][documentation] Add hint when apply FunctionNode second time #5369
  • [enhancement][feature][bug][ci][eager][documentation][api] Fix upsample bilinear bug #5366
  • [bug][eager] Fix not contiguous ndarray to tensor bug #5351
  • [enhancement][eager][system] Infer consistent tensor meta #5118
  • [feature][eager] Feat graph autograd engine #5296
  • [enhancement][eager][interface] Dev type as module #5349
  • [feature][eager][documentation][api][interface] Add new ones module #5342
  • [enhancement][bug][eager] Fix logical slice assign dtype #5339
  • [bug][ci][eager][documentation][api][interface] Fix where module bug #5300
  • [bug][ci][eager][documentation][api] Fix l1loss ci error #5307
  • [enhancement][bug][eager][documentation][api][interface] Qi's First Edit of deleting "print" and ".numpy" #5129
  • [feature][eager][refactor] Separate autograd meta to tensor #5267
  • [feature][eager][api][interface] add tile module #5234
  • [enhancement][eager] Release lambda function to reuse tensor memory #5266
  • [feature][bug][eager][documentation] Fix default value not set bug #5483
  • [enhancement][eager][interface] [Add] gather_nd scatter_nd #5422
  • [enhancement][bug][eager] fix param #5473
  • [bug][eager] Fix Tensor.grad setter bug #5462
  • [enhancement][eager] Rename now_grad_arg to current_grad #5466
  • [eager][test][documentation][interface] Add autotest part1 #5436
  • [enhancement][eager] Use functional copy instead of op_builder #5460
  • [bottleneck][bug][eager][interface] fix -1 index not support bug #5448
  • [bug][ci][eager][documentation][api] Fix concat backward bug #5443
  • [enhancement][bug][ci][eager] Add autograd engine warning #5444
  • [feature][eager][api][interface] Smoothl1loss #5256
  • [enhancement][bottleneck][eager] remove device dtype params #5434
  • [bug][ci][eager][documentation][interface] Delete maxpool failed test #5409
  • [enhancement][eager][api] Add tensor grad assignment #5379
  • [enhancement][bug][eager] fix-abs #5398
  • [enhancement][bug][eager][interface] Fix bn track running stats #5393
  • [enhancement][bug][eager] Support uint dtype of constant op #5396
  • [enhancement][bug][eager][documentation][interface] Delete useless code upsample #5392
  • [enhancement][ci][eager][interface] add flow.view #5301
  • [enhancement][bug][ci][eager][api][interface] Add masked select module #5356
  • [bug][eager][interface] Fix batchnorm backward bug #5602
  • [enhancement][eager] Support weight_decay (L2, actually) #5587
  • [feature][eager][documentation][api] Add new autotest #5588
  • [enhancement][eager][documentation][api] Dev fmod #5404
  • [feature][eager] Support inplace add #5432
  • [feature][eager][interface] Feat tensor stride property #5543
  • [enhancement][feature][eager][documentation][api] Add flip module #5541
  • [feature][eager] Feat module repr #5486
  • [enhancement][bottleneck][bug][eager][interface] Fix maxpool1d params #5493
  • [enhancement][feature][eager][interface] Dev flow.utils.data part1 #5406
  • [bug][eager][api] Fix tensor getitem bug #5474
  • [enhancement][eager][need-simple-ci] export datasets interface #5691
  • [enhancement][eager][system] rebase #5601
  • [enhancement][eager][test] added nn.RecordBytesDecoder with its test #5475
  • [enhancement][feature][eager][need-simple-ci] 0-dim tensor support #5552
  • [enhancement][bug][eager] rewrite slice_update backward #5677
  • [enhancement][bug][eager][interface] align view input style with torch #5676
  • [enhancement][eager][interface][need-simple-ci] add autotests for modules #5666
  • [enhancement][bottleneck][eager][interface] Dev constantpad1d op #5579
  • [enhancement][eager][api][interface] Restruct MathOps AutoTest #5654
  • [enhancement][bug][ci][eager] Fix flip bug #5657
  • [bug][eager][api][interface] Fix expand module bug #5650
  • [enhancement][bug][eager][documentation][api] Fix repeat bug #5633
  • [enhancement][eager][test][api][interface] Add new autotest #5617
  • [enhancement][eager][api][interface] Dev flow.utils.data part2 #5500
  • [enhancement][bug][eager] make setitem device match #5835
  • [bug][eager][api][interface] align reshape input param with pytorch #5804
  • [feature][bug][eager][api] Align where op with torch #5850
  • [enhancement][bug][eager][api] Restruct prelu op #5829
  • [bug][eager][need-simple-ci] fix pooling ceil_mode bug #5818
  • [enhancement][eager] stateful local kernel supports consistent #5789
  • [bug][eager][api][interface] Fix argwhere bug #5816
  • [enhancement][eager][documentation][api] dev-nonzero #5809
  • [enhancement][feature][eager][api] Add fake quantize op #5690
  • [enhancement][bug][eager][documentation][api] Add api #5663
  • [enhancement][eager] Refactor consistent infer result #5790
  • [bug][eager][need-simple-ci] skip dataloader test #5780
  • [bug][eager][need-simple-ci] fix 0-dim tensor.fill_ #5771
  • [enhancement][eager] Cpu mpi broadcast #5726
  • [feature][eager] Feat grad mode classes #5956
  • [enhancement][bug][eager] fix wrong names #5951
  • [enhancement][eager][system] Local dep object pool #5953
  • [enhancement][eager][interface] rename OpExprInterpState to AutoGradCaptureState #5918
  • [bug][eager] Fix linear bug #5945
  • [bug][eager] Fix tensor_meta update bug #5924
  • [enhancement][eager] use flow.randperm #5928
  • [enhancement][eager] consistent init/save/load #5896
  • [enhancement][bug][eager][documentation][interface] Restruct sort and argsort op #5911
  • [enhancement][bug][eager][interface] Try to fix the problem that insightface cannot converge. #5906
  • [enhancement][bug][eager][interface] Add autotest #5899
  • [enhancement][eager] The scheduler thread joins worker threads #5893
  • [enhancement][eager] Bugfix async callback #5881
  • [feature][eager] Feat tensor to bool #5836
  • [bug][eager] Remove inplace broadcast_add #5551
  • [enhancement][eager] Broadcast consistent shape and dtype #5784
  • [enhancement][eager] Fix optimizer list parameters input bug #5848
  • [enhancement][eager][interface] Dev flow.utils.data part3 #5644
  • [enhancement][eager][api] Normalize naming of modules #6066
  • [enhancement][feature][eager][api][interface] add truncnormal #6051
  • [enhancement][bug][eager] AutoMatedTest support test module.parameter.grad #6043
  • [enhancement][feature][bug][eager] add module call kwargs #6069
  • [enhancement][eager][api][interface] add tensor.item tensor.tolist #6021
  • [enhancement][eager][api][interface] Export pool ops api #6047
  • [enhancement][bug][eager][test][documentation][interface] Add more autotest sample #6039
  • [enhancement][bug][eager][system] disable cuda_h2d stream #6020
  • [feature][eager][test][api][interface] Add autotest codegen #6019
  • [feature][eager][documentation] Refactor cosine lr scheduler #6000
  • [enhancement][eager][interface] tensor.cpu/tensor.cuda #5894
  • [enhancement][eager][api] Support consistent_tensor.to(dtype) #5991
  • [bug][eager][interface] remove redundant codes in ModuleDict #5961
  • [bug][eager] Fix LayerNorm check bug #6196
  • [enhancement][eager][api] Change dropout api #6182
  • [enhancement][good for pr][eager][api][interface] add: test convert dependency #6023
  • [enhancement][bug][eager][interface] Fix autotest codegen bug #6171
  • [bug][eager] restore instr_local_dep_object_pool_size for nccl #6160
  • [enhancement][eager][api][interface] Align pooling op functional api names with torch #6163
  • [feature][bug][eager][api][interface] delete file #6162
  • [bug][eager] Fix optim load_state_dict bug #6152
  • [enhancement][eager][api] add is_training to dropout functor #6148
  • [enhancement][eager] Decompose nd sbp boxing #5800
  • [enhancement][eager] support consistent_tensor.to(copy=True) #6122
  • [feature][eager] Static grad scaler #6135
  • [bug][eager] Fix LayerNorm expr bug #6121
  • [bug][eager][api] move numpy c api init in numpy.cpp, make np array contiguous before copying #6117
  • [enhancement][eager][refactor] Remove params from ParamGroup getitem #6096
  • [enhancement][feature][eager] Support tensor and optimizer serialization #6087
  • [enhancement][bug][eager] fix bug about tensor str in nonsymmetric cast and getitem in consist… #6239
  • [enhancement][eager] Cpu all reduce #5849
  • [feature][eager] Support assign copy interface #6228
  • [enhancement][eager][api][interface] Dev reconstruct pad ops #6223
  • [enhancement][eager][api][interface] support flow.cuda.is_available #6124
  • [bug][eager] make flow._C.local_all_reduce sync launched #6175
  • [enhancement][eager] Rename flow to oneflow in user hint #6190
  • [bug][eager][tooling][test][api][interface] Autotest generate input tensor #6206
  • [enhancement][eager] consistent tensor zeros_() #6202
  • [enhancement][eager] Cpu mpi #5865

Build enhancements:

  • [bug][build] Fix GRPC compilation failure on CMake 3.20 #5255
  • [bug][build] Refine header file copy #5254
  • [bug][build] Fix older version CMake doesn't support multiple targets in CLI #5248
  • [bug][build] Turn off NCCL_STATIC/CUDNN_STATIC when CUDA_STATIC is OFF #5243
  • [feature][build] Fix support for Ninja and add Ninja build in Simple CI #5236
  • [enhancement][build] Add cmake option CUDA_STATIC #5164
  • [bug][build] Fix protobuf debug postfix #5233
  • [enhancement][ci][build] Move default third party dir into build dir #5230
  • [enhancement][build] Refine protobuf cmake #5216
  • [enhancement][ci][build] Remove transport test main #5215
  • [enhancement][ci][build] Speedup opencv build #5213
  • [enhancement][build] Support clang #5015
  • [enhancement][documentation][build] Add prefix when creating git archive #5201
  • [enhancement][build] Add cmake option NCCL_STATIC #5160
  • [enhancement][build] Refine CMake CUDA version handling #5192
  • [enhancement][build] Use clang plugin to check Maybe variables are used #5358
  • [enhancement][build] Add BUILD_BYPRODUCTS for ExternalProject_Add #5316
  • [enhancement][build] Add cmake init cache to simplify user onboarding #5311
  • [feature][bug][build] Fix macOS support and run macOS build in Simple CI #4947
  • [enhancement][build] flatbuffers use mirror #5295
  • [enhancement][build] Don't build test by default #5302
  • [enhancement][build] Prevent building from scratch when toggle flag BUILD_GIT_VERSION #5259
  • [enhancement][build] Refine gRPC, glog, gflags cmake for conda #5276
  • [feature][build] Support XLA with CPU-only #5260
  • [enhancement][ci][onnx][build] Remove ONNX from CI #5257
  • [enhancement][build] Refactor build_wheel to support oneflowinc images #5427
  • [enhancement][build] Add arg skip_audit in build wheel #5423
  • [bug][build] hwloc disable shared #5388
  • [documentation][build] Update readme for autoconf and libtool #5376
  • [enhancement][build] remove dir python and compatible_single_client_python #5609
  • [bug][build][system] Fix pyyaml version #5594
  • [enhancement][ci][build] force release flags #5574
  • [bug][build] prevent endless loop #5534
  • [enhancement][build] Support sccache #5528
  • [enhancement][build] Add definition for CMAKE_BUILD_TYPE and print cmake_build_type in oneflow doctor #5505
  • [enhancement][ci][build][need-simple-ci] Fix macOS for recent changes #5705
  • [bug][build] fix return type error on gcc 4.8.5 #5660
  • [enhancement][build] Check CMAKE_BUILD_TYPE #5656
  • [enhancement][build] add -Werror=return-type #5655
  • [enhancement][build] Clean and fix for new py dir #5618
  • [enhancement][build] cmake: disable array-bounds check & treat warnings as errors for pyextobj and oneflow_internal & fix warnings #5838
  • [bug][build] set CMAKE_BUILD_TYPE to Release if undefined #5842
  • [enhancement][build][need-simple-ci] Fix all warnings & Add option TREAT_WARING_AS_ERROR to cmake #5751
  • [enhancement][build] add CMAKE_INTERPROCEDURAL_OPTIMIZATION in fast cmake cache #5970
  • [enhancement][build] add clang tidy target #5957
  • [bug][build] cmake: fix cmake cache args in opencv #5959
  • [enhancement][build] Add cmake option USE_SYSTEM_NCCL #5897
  • [enhancement][build] cmake: include third party headers as system headers to avoid warnings #5879
  • [enhancement][build] Ignore opencv-python on machine aarch64 #5884
  • [enhancement][build] enable CMake first class cuda support #5858
  • [bug][build] Fix compile warning (strict-aliasing) #5872
  • [enhancement][bug][build][need-simple-ci] Upgrade gtest and fix some errors raised by clang #6079
  • [bug][ci][build] cmake: fix ninja build in CI #6072
  • [bug][build] fix files not actually removed when building for multiple python versions #6060
  • [bug][build][api] functional_api: fix build error in mac os #6010
  • [bug][build][need-simple-ci][need-single-client-tests] Fix recompile from scratch #6036
  • [bug][build] Turn on NVCC's warnings #6011
  • [bug][build][need-single-client-tests] fix bundle .so of other python version #6034
  • [bug][ci][build][need-single-client-tests] use copy_all_files_in_dir to replace copy_files #6033
  • [enhancement][build] check compiler version in cmake #6026
  • [enhancement][build] Add CUDA_NVCC_THREADS_NUMBER #6017
  • [enhancement][build][need-simple-ci] optimize of_include_copy #5978
  • [enhancement][ci][build][need-single-client-tests] CI: remove -DTREAT_WARNINGS_AS_ERRORS=OFF #6008
  • [enhancement][build][xla] xrt: fix all warnings #5915
  • [enhancement][build] Prevent opencv compile failure with std 17 #5997
  • [enhancement][build] Use bundled cub #5998
  • [enhancement][ci][build] update clang tidy diff warnings-as-errors option #5989
  • [enhancement][build] Update run_clang_tidy.py to set return code and add warning-as-errors #5977
  • [enhancement][build] check: fix clang-tidy-diff commands #5972
  • [bug][build] Suppress NVCC warning #177-D #6094

XLA enhancements:

  • [bug][xla] Make the blob header memory aligned. #5286

System:

  • [enhancement][system] Refactor Memory Zone #5072
  • [enhancement][system] Add interface InferContext::OutputTensorDesc #5219
  • [bug][system] Lazy construct functor to make sure that the operators has already been registered. #5225
  • [enhancement][system] Refactor infer ctx output isdynamic #5220
  • [enhancement][system] Refactor infer ctx input isdynamic #5211
  • [enhancement][system] Wake up the heartbeat thread immediately #5081
  • [enhancement][system] Fix xla test case fail #5203
  • [enhancement][system] Add interface InferContext::InputDType #5153
  • [purge][system] delete const_cast in Output #5196
  • [feature][system] Add hwloc for topology detection #5291
  • [enhancement][system] fix registry may segment #5336
  • [enhancement][system] Use functional api instead of op_expr_helper::XXXOp. #5364
  • [enhancement][system] move btob to op #5274
  • [documentation][system] Add Latest News section in README #5361
  • [enhancement][bug][system] fix dropout module: return directly if not training #5346
  • [bug][system] add missing JUST #5357
  • [documentation][system] Add more communication outlets on README #5359
  • [enhancement][feature][system] CommNet dynamic register memory #5281
  • [enhancement][system] Use symbol device #5341
  • [enhancement][system] fix multithread bug in env #5283
  • [bug][system][api] fix bug in cfg_replacement #5335
  • [bug][system] Fix create log directory thread-unsafe #5326
  • [bug][system] fix_bug_in_make_parallel #5328
  • [enhancement][system][cfg] replace train_conf, job_conf using cfg::xx #5263
  • [enhancement][system][quantization] support tensorrt in qat #5287
  • [enhancement][system][api] Export functional apis for oneflow.experimental. #5313
  • [enhancement][system] fix bug check between cfg enum and proto enum #5285
  • [enhancement][system] replace CHECK_EQ using CHECK_EQ_OR_RETURN #5279
  • [enhancement][system] Refactor SbpXXX to cfg::SbpXXX #5120
  • [enhancement][system][api] add detach for LazyMirroredtensorImpl #5270
  • [enhancement][system] shorten XXIsDynamic4ArgNameAndIndex to be xxIsDynamic #5265
  • [enhancement][system][cfg] job_config to cfg #5235
  • [feature][system] Multi-Client LogicalRun degenerate to PhysicalRun #5479
  • [enhancement][system] fix ConstructOp without JUST #5480
  • [enhancement][system] Output arg modifier return maybe part 1 #5451
  • [feature][system][interface] Fea/nn graph/graph build ctx #5420
  • [enhancement][system] Throw exception if check failed #5457
  • [feature][system] multi client launch #5372
  • [enhancement][system][api] Optimize reduce mean #5452
  • [enhancement][system] export Tensor only to python #5440
  • [enhancement][system] Output arg modifier return maybe part_0 #5447
  • [enhancement][system] ThreadMgr support AddPlan #5450
  • [enhancement][system] Refactor infer ctx input tensordesc #5226
  • [enhancement][system][api] instruction builder return maybe #5442
  • [feature][system][interface] MultiClientSessionContext #5421
  • [enhancement][feature][system] add launcher, update multi client launch and exit #5414
  • [purge][system][refactor] Remove IOConf #5419
  • [enhancement][system] Dev refine generator #5426
  • [enhancement][system] Support inplace operations #5204
  • [enhancement][system][refactor] Dev refactor generator #5397
  • [enhancement][system] Add new placement init func #5408
  • [enhancement][system] NNGraphIf #5387
  • [enhancement][system][refactor] Cast explicitly in unpack call to avoid conflict with Optional. #5380
  • [enhancement][system][interface] [Random Generator] Part2: Migrate functional dropout #5378
  • [enhancement][system] replace ForeignJobInstance using JobInstance #5374
  • [enhancement][system][refactor] Speedup reshape module by 5x. #5381
  • [feature][system][interface] [Random Generator] Part1: Dev random generator #5360
  • [enhancement][system] Add ONEFLOW_STREAM_CUDA_EVENT_FLAG_BLOCKING_SYNC #5612
  • [enhancement][system] [part2]Remove singleclient outdated api #5568
  • [feature][system][interface] nn.Graph call and launch impl #5580
  • [enhancement][system] remove outdated doctest api and "@experimental_api" #5564
  • [feature][system][interface] Register ForeignCallback and Watcher in Multi-Client #5591
  • [enhancement][system] [Part-1]remove outdated api and files of multi-client on master branch #5556
  • [feature][system][interface] LazyInterpret build LocalTensor if input is local #5582
  • [enhancement][system] add job_pass MultiClientAutoSourceAndSinkTick #5507
  • [feature][system] Fea/nn graph/optimizer #5533
  • [feature][system][interface] New/CloseRuntimeBuffers and RunLazyJob impl #5571
  • [feature][system][refactor][interface] NNGraph interface and implement for CompileAndRuntime #5558
  • [feature][system] Fea/nn graph/forward graph #5516
  • [enhancement][system] Lazy job stream type #5389
  • [enhancement][system] Refactor single client autotick #5506
  • [enhancement][system] replace underline using dot in single client #5547
  • [bug][system] fix return type #5548
  • [feature][system][interface] LazyInterpret for UserOpExpr #5544
  • [enhancement][system] Add ProfilerStart/ProfilerStop API #5542
  • [feature][system][interface] LazyInterpreter for FetchOutputOpExpr and set op parallel_distribution #5527
  • [enhancement][system] Multi client push pull #5492
  • [enhancement][system] registry_callback_fn return maybe #5456
  • [enhancement][system] bw_gen_fn return maybe #5455
  • [enhancement][system] gen_bw_fn return maybe #5454
  • [enhancement][system] Compatible single client #5417
  • [feature][system][interface] GlobalMultiClientEnv and refine EagerExecution #5523
  • [enhancement][system] Job pass maybe system #5503
  • [enhancement][system] Remove Plan::net_topo #5502
  • [feature][system][interface] LazyInterpret for FeedVariableOpExpr #5490
  • [enhancement][system] Input arg modifier return maybe #5453
  • [feature][system][interface] Fea/nn graph/block scope #5498
  • [feature][system] jit_fuse_cast_scale #5332
  • [enhancement][system] Remove obsolete Profiler #5747
  • [enhancement][system][api] Dev fix batch norm not stats #5733
  • [enhancement][system] rename rpc_token to TransportToken #5735
  • [enhancement][system][api] Refactor maximum minimum py2cpp #5724
  • [enhancement][system] Replace piece_id with comm_net_sequence_number #5731
  • [enhancement][system] beautify stack frame #5686
  • [enhancement][system] Add env ONEFLOW_KERNEL_DISABLE_BLOB_ACCESS_CHECKER #5728
  • [enhancement][system] Add env ONEFLOW_THREAD_ENABLE_LOCAL_MESSAGE_QUEUE #5720
  • [enhancement][system][api][refactor] Refactor functional sub, mul and div apis #5713
  • [feature][system] ddp #5008
  • [enhancement][system][api][refactor] Refactor functional matmul and add apis. #5697
  • [bug][system] Fix ClearKV("plan") #5710
  • [enhancement][system] Rename cpu to async cpu #5712
  • [enhancement][system] Support tensor.to()/to_local() #5271
  • [feature][system][refactor][interface] Multi-Runtime for multi nn.Graph #5683
  • [bug][system][refactor] Add tag for Optional inplace constructor #5619
  • [enhancement][system] Move Global to env scope #5670
  • [enhancement][system] add JUST wrapper #5681
  • [enhancement][system] New sync consistent meta info #5634
  • [enhancement][system][refactor][interface] Refactor RuntimeCtx for multi-runtime #5664
  • [feature][system][interface] Feat: memory shared between EagerTensor with VariableRegst #5649
  • [enhancement][system] Use functional call directly instead of construct a module and then call-Add #5613
  • [enhancement][system] disable eager_op consistent mode #5647
  • [enhancement][system] add msg_penddin_list in ibverbs_qp to optimize qp_init_attr.cap.max_send_wr #5485
  • [enhancement][system] IBVerbsCommNet add knobs #5626
  • [enhancement][system] Prune python tensor #5596
  • [feature][system][interface] Feat: LazyInterpret infer op / tensor ParallelDescScope #5625
  • [enhancement][system] Replace src tick with wait and send ids #5603
  • [enhancement][system] Support symbol placement type in functional. #5627
  • [enhancement][system][api][refactor][interface] Dev advanced indexing #5559
  • [enhancement][system] Optimize maybe. #5839
  • [enhancement][system] Decorator 4 disable recursive boxing call #5796
  • [enhancement][system] add_eager_boxing_and_op_interpreter_dispatch_error_info #5819
  • [enhancement][system] Kernel CUDA Graphs Support #5725
  • [bug][system] Fix placement print bug #5853
  • [bug][system] when error msg formatting fails, return error->DebugString #5844
  • [enhancement][system][refactor] Rename variables named *parallel_distribution* to *nd_sbp* (1) #5815
  • [feature][system][interface] Support Free EagerTensor caught in nn.Graph build #5777
  • [enhancement][system] Reuse CUDA event / Refine BnInOp2Blob / Refine channel #5837
  • [enhancement][system][serving] fix bug in AddInputOutputOpsPass: check existence of key in HashMap(inferface_lbi2scope_sym_id) #5653
  • [enhancement][system][api] unpack_call: impl new unpack_call_dispatcher for better performance #5820
  • [feature][system] Feat consistent tensor python constructor #5812
  • [feature][system] Support 0shape tensor #5620
  • [documentation][system] fix launcher description #5770
  • [feature][system][interface] Multi-nn.Graph memory reuse by Chunk manager #5658
  • [bug][system] Fix naive b2p error #5806
  • [enhancement][system] set created generator with default rng seed #5801
  • [enhancement][system] enhance_local_to_consistent #5761
  • [feature][system] add flow.randn #5736
  • [enhancement][system] Refactor hierarchical parallel cast autograd #5764
  • [enhancement][system] Collective boxing executor add_plan delete_plan #5495
  • [enhancement][system] Fix throw abort #5795
  • [enhancement][system] DECORATE #5794
  • [enhancement][system] Interface eager boxing #5682
  • [enhancement][system] extract_consistent_to_consistent_op_expr #5870
  • [enhancement][system] disable backward pass consistent tensor meta check. #5871
  • [enhancement][system] Add CudaStreamIndexGenerator::GenerateNamedStreamIndex #5940
  • [bug][system] Only query PCI bus id when CUDA version >= 11 #5937
  • [enhancement][system] maybe: add JUST_MSG and CHECK_JUST_MSG #5904
  • [bug][system] Fix bug scalar #5950
  • [enhancement][system] framework: fix rvalue reference warnings #5948
  • [purge][system] Remove CudaWorkType #5942
  • [enhancement][system] refactor_symbol #5941
  • [bug][system] consistent_tensor_infer_cache: fix memory leak #5938
  • [feature][system] support to print gpu #5936
  • [enhancement][system] Bugfix static check #5935
  • [bug][system] fix nccl_version log #5934
  • [bug][system] Fix bug of multi-GPU train nn.Graph extra mem cost in rank 0 #5930
  • [enhancement][system] Only gradient acc be scheduled in parallel. #5926
  • [enhancement][bug][system] fix_ddp_bug_on_8_process #5929
  • [enhancement][system] Fix bug error msg format #5866
  • [feature][system] print consistent tensor data #5902
  • [bug][system] Move parse env to the constructor #5922
  • [enhancement][system] Remove GlobalWorkStreamId/GlobalThrdId #5917
  • [bug][system] shared_or_scalar: fix alias warnings #5916
  • [purge][system] Remove CompActor #5919
  • [enhancement][system] Use symbol dtype #5641
  • [enhancement][feature][system] Control Graph / Session / Env's python c++ object destruction #5845
  • [enhancement][bug][system] Sync access and assign indexing tensor. #5907
  • [enhancement][system][api][refactor] Dev consistent arange #5883
  • [enhancement][system] Lazy interpreter for new ConsistentToConsistentOpExpr #5903
  • [bug][system] Fix BUG of LazyInterpret FreeEagerTensor memory shared with regst #5891
  • [bug][system] fix typo in raise RuntimeError #5890
  • [enhancement][system][refactor] Rename the ParallelDistribution class to NdSbp #5814
  • [feature][system] add flow.rand #5722
  • [feature][system] Lazy Interpret support infer default device cpu #5880
  • [enhancement][system] Tensor str #5783
  • [feature][system][interface] Lazy to_consistent #5774
  • [enhancement][system] wait vm empty before exiting #5860
  • [enhancement][system] Eager boxing n to 1 #5949
  • [enhancement][system] add kernel observer #6052
  • [enhancement][ci][system] Optimize ddp broadcast and add speed/memory test in ci #6044
  • [enhancement][system] add var to control only print warning once when blocked #6045
  • [enhancement][system][refactor] Rewrite pow and logical functional apis #6032
  • [enhancement][system] Token seq id #5964
  • [enhancement][documentation][system] Remove python function wrapper. #6012
  • [feature][system] Add timeout and loc for blocking calls #6007
  • [enhancement][system] Eager boxing 1 to n #5943
  • [enhancement][system] Boxing expr #6015
  • [enhancement][system] new_X_to_B #5987
  • [enhancement][system] Add unimplemented return information #5952
  • [enhancement][system] Revert "Faster decorator" #6006
  • [enhancement][system] Throw exception if using advanced indexing for tensor setitem #6001
  • [enhancement][system] Support eager boxing sm 2 sn #5869
  • [enhancement][system] Move framework/local_dep_object.* to the eager directory #5988
  • [enhancement][system] Fix builtin op arg tuple. #5464
  • [feature][system][refactor] Dev functional multiple signatures #5982
  • [enhancement][system] Faster decorator #5996
  • [enhancement][system] Placed nd sbp #5995
  • [feature][system] Support asymmetric input/output/variable tensors in nn.Graph #5983
  • [enhancement][system] LightActor #5868
  • [bug][system] Prevent running oneflow in forked subprocess #5976
  • [bug][system] common/error: fix build error in mac os #5971
  • [bug][system] fix_bug_test_tensor_str #5958
  • [enhancement][system] Refine StreamContext #6191
  • [enhancement][system] container_util: fix VectorAt, remove useless MutMapAt #6172
  • [enhancement][system] Typesafe KernelState #6198
  • [enhancement][system] Primitive based copy task node #6195
  • [feature][system][interface] Lazy support Scalar #6181
  • [enhancement][system] Disable implicit boxing when parallel num eq one #6188
  • [enhancement][system] Primitive #6183
  • [enhancement][system] Remove IDMgr::GetGpuPhyIdFromThrdId/IDMgr::GetDeviceTypeFromThrdId #6169
  • [enhancement][system] remove op_expr_helper inside gradient_funcs #6057
  • [feature][system][api] Add tensor yaml, support export tensor functional api. #6099
  • [feature][system] Plan memory log #6151
  • [feature][system] Add dtype bfloat16 #5304
  • [enhancement][system] StreamContext #6129
  • [bug][system] Fix wrong inplace acc grad #6146
  • [enhancement][system] UserKernel remove job_desc #6144
  • [enhancement][system][api] Fea/graph/add outputs buffer to enable pipeline #6126
  • [enhancement][system] not fuse request for nccl 2.10.3 #6136
  • [bug][system] NewUniqueId thread safe #6141
  • [enhancement][system] XRT remove job_desc #6139
  • [enhancement][system] SystemOpFillJobNamePass #6138
  • [enhancement][system] mv_boxing_folder_to_core #6140
  • [enhancement][system] Refactor boxing interpreter to boxing expr #6134
  • [enhancement][system] Eager boxing one to one #6048
  • [enhancement][system] Vm cpu efficiency #6110
  • [enhancement][system] Naive generic boxing #6116
  • [feature][system] send/recv #5992
  • [enhancement][system] disable_print_stack_in_tensor_numpy #6123
  • [feature][system] add all_reduce by to_consistent #5963
  • [enhancement][system] KernelContext #6084
  • [enhancement][bug][system] Fix sync nccl and async nccl deadlock #6071
  • [bug][system][refactor] Refactor to local #6098
  • [enhancement][system] Replace xor with hash combine (part 1) #6078
  • [enhancement][system] Optimize error message #6073
  • [enhancement][system] Rename Error::xx to Error::xxError #6049
  • [enhancement][system] send formatted msg to glog #5999
  • [feature][bottleneck][bug][system][interface] [Feat.] NNGraph new eager tensor for new variable created in JobPass #6091
  • [bug][system] Fix bug of multi-GPU eager copy D2H extra mem cost in rank 0 #6092
  • [enhancement][system][api] Rename module flow.F to flow._C #6053
  • [feature][system][interface] [Feat.] Eager consistent OFRecordReader #6089
  • [enhancement][system][api] Dev fix and align interface #6075
  • [feature][bottleneck][bug][system][interface] NNGraph input/output valid by register tensors #6240
  • [bug][system][interface] Fix bug of Multi-Client src tick output order #6221
  • [enhancement][bug][system] Add cast primitive #6234
  • [feature][bottleneck][system][interface] Auto FixPipelineStageIdPass #6204
  • [enhancement][system] move scalar to oneflow namespace. #6235
  • [enhancement][system] UserKernel init CUDA Graphs with state #6230
  • [feature][system] Comm broadcast #6213
  • [enhancement][system][refactor] Rename opname to optype_name in AutogradEngine #6154
  • [enhancement][system] Add memset primitive #6218
  • [enhancement][system] Add StreamContext::device_type()/DeviceCtx::device_type() #6217
  • [feature][system] add all_gather and fix bug of multi rank doctest #6189
  • [feature][system][interface] [Feat.] Lazy interpreter skip hierarchical_parallel_cast #6208
  • [purge][system] Cleanup KernelUtil #6212
  • [enhancement][system] StreamContextAdapter #6205
  • [enhancement][system] Dev eliminate gcc warnings #6199
  • [feature][bottleneck][system][interface] [Feat.] nn.Graph support grad acc with input/output tensor #6155
  • [enhancement][system] Cpu symmetric s to s #6153
  • [enhancement][system][upload-core] Op expr infer tensor meta #5064
  • [enhancement][system] Infer consistent tensor meta #5362

CI enhancements:

  • [bug][ci][api][interface] Refine module test #5232
  • [enhancement][ci] Add Simple CI, runs CPU-only on GitHub hosted servers #5207
  • [enhancement][ci] Run exe test in CPU-only #5202
  • [enhancement][ci] Cancel all workflow runs but the latest #5206
  • [enhancement][ci] Fix master not running Simple CI #5368
  • [enhancement][ci] Refine Simple CI and Clang analysis #5367
  • [enhancement][feature][bug][ci][documentation][interface] Fix upsample bilinear bug #5363
  • [enhancement][ci] Build nightly for py39 #5318
  • [enhancement][ci] Try distributed run for 3 times to prevent failure #5305
  • [enhancement][ci] Upload Simple CI logs to cloud #5268
  • [enhancement][ci] Remove cpu_op_eager and cuda_op_eager #5470
  • [bug][ci] fix segfault in clang plugin #5437
  • [enhancement][ci] Refine Simple CI error output #5435
  • [enhancement][ci] Add conda env to Simple CI #5385
  • [enhancement][ci] Fix clang plugin core file not found #5390
  • [bug][ci] upload core when build with clang plugin #5384
  • [bug][ci] clang plugin skip more files #5373
  • [enhancement][ci] Use gh-action-scheduler-v2 #5370
  • [enhancement][ci] relax speed threshold #5569
  • [bug][ci] Fix wrong test path under compatible #5567
  • [enhancement][ci][need-simple-ci] Prevent upload logs automatically #5560
  • [enhancement][ci][interface] Add nn.AdaptiveAvgPool1d and nn.AdaptiveAvgPool3d #5445
  • [feature][ci] add speed test in ci #5496
  • [enhancement][ci] Reduce usage of Simple CI #5546
  • [feature][bug][ci][api] Restruct upsample module #5524
  • [feature][ci] multi client launcher test #5488
  • [enhancement][ci] Remove automerge if cuda_new_interface failed #5519
  • [enhancement][ci] Prevent adding subdir in python/test #5514
  • [enhancement][ci] piprepo->pipindex #5517
  • [enhancement][ci] add dynamic_loss_scale in ci tests #5337
  • [enhancement][ci] Add timeout for wait_gpu_slot #5497
  • [enhancement][feature][ci] new static check based on clang-tidy #5476
  • [enhancement][ci] Fix url not downloadable in some browsers #5701
  • [feature][ci] multi client multi machine test #5685
  • [enhancement][ci] Add cpu new interface CI #5639
  • [enhancement][ci][need-simple-ci] Mv clangtidy to simple ci #5667
  • [enhancement][ci][need-simple-ci] use clang tidy appimage in ci #5841
  • [enhancement][ci] Use gcc 7 in release to prevent error #5840
  • [enhancement][ci] bn tol 1e-4 => 1e-3 #5811
  • [enhancement][ci] fix distributed run on built dir #5810
  • [enhancement][ci] fix third party mirror check_sum #5802
  • [ci][documentation] find more accurately which files need to be doctested #5782
  • [enhancement][ci] Print stack unconditionally #5779
  • [enhancement][ci][need-simple-ci] Enable more checkers for clang-tidy in CI #5738
  • [enhancement][ci] CI: add clang-tidy check to test.yaml #5920
  • [ci][documentation] fix docstring in oneflow.nn.functional namespace #5807
  • [enhancement][ci] disable TREAT_WARNINGS_AS_ERRORS in Release CI #5886
  • [enhancement][ci] Skip ci jobs by git diff #5863
  • [bug][ci] quick fix #5978 #6030
  • [enhancement][bug][ci] fix clang tidy diff options and file format #5990
  • [enhancement][ci] add flow.relu #5847
  • [enhancement][ci] equal => allclose #6164
  • [bug][ci][need-simple-ci] CI: fix clang tidy checks in simple ci #6161
  • [enhancement][bug][ci][documentation][api] add interpolate and layer_norm docs #6157
  • [bug][ci] update speed test #6113
  • [enhancement][bug][ci][documentation][api] speed import oneflow #6107
  • [bug][ci] Also try install dev deps for CODEGEN_PYTHON_EXECUTABLE #6115
  • [bug][ci][need-simple-ci] set gtest_CMAKE_DEBUG_POSTFIX "d" #6085
  • [enhancement][ci] add cache init file for clang and CI build with clang #6062
  • [enhancement][ci] add emoji in speed test output, make it continue-on-error #6214

Test enhancements:

  • [bug][test][interface] Fix acos ci bug #5217
  • [feature][test] implement automated test #5321
  • [enhancement][test] move generator test into ops folder to accelerate tests #5472
  • [feature][test][api] Add autotest part2 #5467
  • [enhancement][test][api][interface] Add some tests with the new framework for auto testing #5561
  • [bug][test] fix test error when do multi case test on graph #5590
  • [enhancement][test] Refine module test using auto test by yaochi #5484
  • [enhancement][test] Add autotest for BatchNorm2d #5734
  • [enhancement][test] RTH_update_op_test #5823
  • [enhancement][test] dev adamw graph config #5745
  • [feature][test][api][interface] Add new autotest #5562
  • [bug][test] restore test of alexnet graph #5798
  • [enhancement][test][interface] add zhangshen op-test #5600
  • [feature][bug][tooling][test][interface] Record autotest wrong code #5923
  • [enhancement][feature][test][api] add randint #5718
  • [bug][test] fix multi machine test #5984
  • [enhancement][test][interface] some op test #6095

Tooling enhancements:

  • [bug][tooling] user/summary: fix memory leak in FillImageInSummary #5742
  • [enhancement][tooling][cfg] cfg: add move assignment operator for performance #5962
  • [enhancement][tooling][api][refactor] refactor_all_device_placement_api #6080