Move core nn functions to RETURNN #252

Open · albertz opened this issue Feb 13, 2023 · 8 comments

albertz commented Feb 13, 2023

As a result of rwth-i6/returnn#1120 and rwth-i6/returnn#1264, we want to move core functions over to RETURNN, as this should become the common compatibility layer for different backends such as TensorFlow and PyTorch.

These are mostly the functions in nn/_generated_layers.py, but this in turn also means moving over some of the other code logic, such as the naming logic, etc.


albertz commented Feb 25, 2023

On the RETURNN side, the new frontend API is supposed to cover exactly those core nn functions. Its development has already started, although it is quite minimal so far and mostly targets low-level TF and low-level PyTorch.

Moving the nn core functions to RETURNN means using the high-level layer-based TF frontend for Tensor and implementing those frontend API functions in returnn/tf/frontend_layers.

The compare function is already there, with a small inconsistency; it does not break too much code though, so this should be fine.

This also means rewriting nn.Tensor to now use the RETURNN Tensor class.

The RETURNN Tensor class would also be extended with all the math ops like __add__, which just call the frontend API, as we currently have it in nn.Tensor.
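
For illustration, a minimal sketch (not the actual RETURNN code) of this dispatching pattern; the stubbed combine function and its kind argument are assumptions here:

    # Minimal sketch: math ops on a Tensor-like class just dispatch to frontend API
    # functions (stubbed out here), as nn.Tensor currently does. Names are illustrative.

    class _FrontendStub:
        @staticmethod
        def combine(a, b, kind: str):
            print(f"frontend.combine({a!r}, {b!r}, kind={kind!r})")
            return a  # placeholder result

    class TensorSketch:
        def __init__(self, name: str):
            self.name = name

        def __repr__(self):
            return f"TensorSketch({self.name!r})"

        def __add__(self, other):
            # delegate to the backend-agnostic frontend function
            return _FrontendStub.combine(self, other, kind="add")

        def __mul__(self, other):
            return _FrontendStub.combine(self, other, kind="mul")

    a, b = TensorSketch("a"), TensorSketch("b")
    c = a + b  # prints the dispatched frontend call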

albertz added commits that referenced this issue (Feb 26, 2023)

albertz commented Feb 26, 2023

About replacing nn.Tensor: We need to go through all the current attribs and methods, check their usages, and make sure the same thing works with the RETURNN Tensor. The most obvious question: what would be the raw tensor type? nn.NameCtx? It seems NameCtx lacks a few things which are currently in nn.Tensor, such as:

  • Does it handle the subclass PrevTensorRef correctly?

Can we move those things over to NameCtx? Let's just try to do that step by step and see.
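
To make the question concrete, a tiny hypothetical sketch (with stub classes only, no real RETURNN/RC imports) of the intended relationship, i.e. the generic Tensor wrapping an nn.NameCtx as its raw tensor:

    # Hypothetical sketch only: the generic Tensor wraps a backend-specific raw tensor,
    # which for the layers-based (net dict) backend would be an nn.NameCtx.

    class NameCtxStub:
        """stand-in for nn.NameCtx / the layer reference"""
        def __init__(self, layer_name: str):
            self.layer_name = layer_name

    class TensorStub:
        """stand-in for the RETURNN Tensor class"""
        def __init__(self, name: str, raw_tensor=None):
            self.name = name
            self.raw_tensor = raw_tensor  # would be a NameCtx in the net-dict backend

    x = TensorStub("x", raw_tensor=NameCtxStub("linear_out"))
    assert isinstance(x.raw_tensor, NameCtxStub)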

We need to be careful about cases where tensor.name_ctx is reassigned. This can happen via Tensor._replace_by and NameCtx.move_layer_ref_here.

The only usage of Tensor._replace_by is in prepare_for_config_serialization, in this code:

                if root_mod_call.layer_ref is not None:
                    assert not self.layer_ref  # not sure. maybe just reset?
                    assert root_mod_call.layer.layer_dict["class"] == "subnetwork"
                    sub_out = root_mod_call.children.pop("output")
                    assert sub_out.layer.layer_dict["class"] == "copy"
                    sub_real_out = sub_out.layer.layer_dict["from"]
                    assert isinstance(sub_real_out, nn.Tensor)
                    # noinspection PyProtectedMember
                    sub_out.layer._replace_by(sub_real_out)
                    # noinspection PyProtectedMember
                    root_mod_call.layer._replace_by(sub_real_out)

Tensor._assign_parent_name_ctx is called in NameCtx._remove_unused_and_assign_parents_and_handle_subnets. _assign_parent_name_ctx can probably be moved to NameCtx.

NameCtx.move_layer_ref_here is only in PrevTensorRef.assign_new_cur_layer_name_ctx:

    def assign_new_cur_layer_name_ctx(self, cur_layer_name_ctx: nn.NameCtx):
        """
        Changes self.name_ctx to new name_ctx.
        """
        prev_layer_name = f"prev:{cur_layer_name_ctx.name}"
        assert prev_layer_name not in cur_layer_name_ctx.parent.children
        prev_layer_name_ctx = cur_layer_name_ctx.parent.get_child(prev_layer_name)
        prev_layer_name_ctx.move_layer_ref_here(self)
        assert self.name_ctx is prev_layer_name_ctx
        self.cur_layer_name_ctx = cur_layer_name_ctx

And this is only called via Loop.assign()._map_ref_to_name_ctx():

                    prev_ref.assign_new_cur_layer_name_ctx(layer_ref.name_ctx)

albertz added several commits that referenced this issue (Feb 27-28, 2023): "This is for preparation of RETURNN Tensor usage. #252"

albertz commented Mar 8, 2023

feature_dim / batch_dim: remove them if possible? Why do we need them?

feature_dim is used quite often. The code might become too verbose if we don't have it? There are also not really any ambiguous cases about this, I think, but I'm unsure. Edit: For now, we might just introduce feature_dim as an alias and delay this; however, this also means we need feature_dim_axis (a rough sketch of the alias idea follows below). Edit: See rwth-i6/returnn#1273 for a more in-depth discussion on feature_dim.
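
A possible sketch of that alias, purely illustrative and not the actual RETURNN implementation (the attribute names dims and feature_dim_axis are assumptions here):

    # Sketch: feature_dim as a read-only alias that resolves feature_dim_axis
    # to the corresponding dim tag.

    class TensorSketch:
        def __init__(self, dims, feature_dim_axis=None):
            self.dims = tuple(dims)
            self.feature_dim_axis = feature_dim_axis

        @property
        def feature_dim(self):
            """the dim tag at feature_dim_axis, or None if unset"""
            if self.feature_dim_axis is None:
                return None
            return self.dims[self.feature_dim_axis]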

batch_dim only has a single use in stochastic_depth and this probably should be refactored. So we can remove batch_dim. (Done)

There is also nn.batch_dim, which might be removed as well. Currently there is some usage, but not too much; mostly it is for defining extern data, e.g. via nn.get_extern_data (see the sketch below).
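
For reference, a typical place where nn.batch_dim shows up, sketched from the current RC extern-data usage (exact signatures may differ):

    from returnn_common import nn

    time_dim = nn.SpatialDim("time")
    in_dim = nn.FeatureDim("input", 40)

    # nn.batch_dim is mostly needed for declaring the extern data template
    data = nn.get_extern_data(nn.Data("data", dim_tags=[nn.batch_dim, time_dim, in_dim]))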

Then there is nn.single_step_dim. It might have some valid use cases.

albertz added several commits (including to rwth-i6/returnn) that referenced this issue (Mar 10, 2023): "We want to remove Tensor.batch_dim. #252"

albertz commented Mar 10, 2023

__copy__ / __deepcopy__, no-op for Tensor

This is somewhat unclear. There is a comment pointing to: #215 (comment)

In there, the argument was that Tensor (for returnn-common) is immutable, thus __copy__/__deepcopy__ can be a no-op. However, I'm not sure whether this is a strict requirement.

For graph-based frameworks (RETURNN, TensorFlow graph mode), it's true that a tensor is immutable; copy(x) on a TF tensor will also be a no-op and return the same object. But this is not necessarily true for eager-mode frameworks (PyTorch, NumPy): there, copy(x) will create a new instance with really separate memory allocation (at least as presented to the user; maybe it uses copy-on-write internally). TF eager mode is an exception: tensors are also immutable there, and copy(x) remains a no-op.

Wrt. our frontend API, tensors are also immutable: there is no function that would operate in place on them. Parameter will be an exception, though. And immutable is meant only conceptually, referring to raw_tensor. In practice, the Tensor object is mutable and allows reassigning attribs, etc. As it is used widely, this probably cannot be changed anymore.

Also, for any developer writing code with Tensor and the frontend API, it might be unexpected if copy(x) does not return a copy. If it were really truly immutable, this would not matter at all. But as this is not the case, it might be dangerous.

Maybe a Tensor could also have a flag marking it immutable, and if this flag is enabled, it would really disallow any modifications? (Btw, I wonder a bit how TF does this internally in tf.Tensor. Usually I would expect that you can still get access to internal objects and modify them. However, I tried a bit, and I'm really not able to modify any internal data of a tf.Tensor, e.g. x._id.)
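
A small sketch of how such an immutability flag could be enforced, purely illustrative and not a concrete proposal:

    # Sketch: an explicit immutability flag enforced in __setattr__.

    class TensorSketch:
        def __init__(self, name: str):
            self.name = name
            self._frozen = False

        def freeze(self):
            """mark as immutable; any further attribute assignment raises"""
            self._frozen = True

        def __setattr__(self, key, value):
            if getattr(self, "_frozen", False):
                raise AttributeError(f"Tensor is immutable, cannot set {key!r}")
            super().__setattr__(key, value)

    t = TensorSketch("x")
    t.freeze()
    # t.name = "y"  # would raise AttributeError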

Or should copy() return a copy of our Tensor object? But what about the raw_tensor then? Also copy that (just using copy(raw_tensor), whatever that returns)? Again, the copy logic of Parameter can be different.
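
Both options sketched side by side, with stub classes; neither is claimed to be the chosen design:

    import copy

    class NoOpCopyTensor:
        """graph-style view: conceptually immutable, so copy returns the same object"""
        def __copy__(self):
            return self

        def __deepcopy__(self, memo):
            return self

    class ShallowCopyTensor:
        """alternative: return a new Tensor object; what to do with raw_tensor stays open"""
        def __init__(self, raw_tensor=None):
            self.raw_tensor = raw_tensor

        def __copy__(self):
            # or also copy(self.raw_tensor)? Parameter would need different logic.
            return type(self)(raw_tensor=self.raw_tensor)

    a = NoOpCopyTensor()
    assert copy.copy(a) is a  # no-op, same object
    b = ShallowCopyTensor(raw_tensor=object())
    assert copy.copy(b) is not b and copy.copy(b).raw_tensor is b.raw_tensor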


albertz commented Mar 16, 2023

Now rwth-i6/returnn#1277 was merged, which simplifies the code move.


albertz commented Mar 23, 2023

A bit on the planned process here: We would copy over bit by bit to RETURNN and integrate it into the RETURNN net dict backend of the RETURNN frontend (rwth-i6/returnn#1120). We would not copy all at once. We also would not touch existing code here in RC for now.

I'm not sure if there is a good way here for RC to gradually use the moved code.

There are also some changes:

  • Module does not have parents and _calls anymore (Frontend API and PyTorch backend returnn#1120 (comment)). It should be possible to infer layer names without that as well, but this needs some new code. This might result in different net dicts, but that should not matter anyway; there should be no visible differences for the user.

In general, as we have not finalized the RC nn API yet, changes are OK, even to the public API.


albertz commented Mar 27, 2023

I will probably rename NameCtx to Layer, at least for the code on RETURNN side.

albertz added a commit to rwth-i6/returnn that referenced this issue Apr 18, 2023
#1120

Copied from RETURNN-common.
rwth-i6/returnn_common#252

On namespace (rf.encoder.conformer.Conformer), see:
#1120 (comment)

This is currently untested.

albertz commented Apr 19, 2023

A bit on the process here: this is work in progress now. We have already moved a lot over, step by step, function by function, module by module. Even ConformerEncoder has been moved over now (rwth-i6/returnn#1120 (comment)). Next things are the Transformer, beam search (which needs a new API), AED/Transducer models, and SpecAugment. I'm not sure what is really missing after that; maybe then this can be considered finished.
