File start/end offset issue for VB file #647

D3v3sh5ingh · 2023-11-30T05:29:08Z

Issue : 643

The file_start_offset and file_end_offset options are not working for VB files and throw the same error as reported in issue 643.
I have a file with both RDW and BDW (record format VB). The file also has a header and a footer.
I want to skip the first few bytes (the header) and the last few bytes (the footer).
I am using the file_start_offset and file_end_offset options for that, but I get an error similar to the one in issue 643.

yruslan · 2023-11-30T07:37:01Z

Hi @D3v3sh5ingh, what's your high-level offset layout?

For example:
0 - 19 Headers (to be ignored)
20 - 23 BDW
24 - 27 RDW
28 - 99 Payload
100 - 103 RDW
...
32000 Payload
32093 Footer (to be ignored)

D3v3sh5ingh · 2023-11-30T09:05:36Z

Hi @yruslan,
My high-level layout looks like this:
BDW { RDW 45 bytes , RDW 1000 bytes, RDW 1000 bytes , RDW 1000 bytes ....}
BDW { RDW 1000 bytes .....}
......
BDW { RDW 1000 bytes...., RDW 45 bytes}

The 45-byte header and trailer records are inside the BDW blocks, as shown above.
We want to remove these 45 bytes of header and trailer from the file.
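
To make the layout above concrete, here is a minimal Python sketch that builds such a file in memory and walks through its blocks and records. It is an illustration only, not how Cobrix parses files. It assumes 4-byte BDW and RDW descriptors whose first two bytes hold a big-endian length that includes the descriptor itself; real files may use a different byte order or length adjustment. The helper names and the placeholder payloads are hypothetical.

```python
import struct
from typing import List

def rdw_record(payload: bytes) -> bytes:
    # Assumption: RDW = 2-byte big-endian length (including the RDW) + 2 zero bytes.
    return struct.pack(">HH", len(payload) + 4, 0) + payload

def bdw_block(payloads: List[bytes]) -> bytes:
    # Assumption: BDW = 2-byte big-endian length (including the BDW) + 2 zero bytes.
    body = b"".join(rdw_record(p) for p in payloads)
    return struct.pack(">HH", len(body) + 4, 0) + body

def parse_vb(data: bytes) -> List[bytes]:
    # Walk every block, then every record inside the block.
    records = []
    pos = 0
    while pos < len(data):
        block_len = struct.unpack_from(">H", data, pos)[0]
        block_end = pos + block_len
        pos += 4  # skip the BDW
        while pos < block_end:
            rec_len = struct.unpack_from(">H", data, pos)[0]
            records.append(data[pos + 4 : pos + rec_len])
            pos += rec_len
    return records

header = b"H" * 45     # placeholder for the 45-byte header record
trailer = b"T" * 45    # placeholder for the 45-byte trailer record
data_rec = b"D" * 1000 # placeholder for a 1000-byte data record

file_bytes = bdw_block([header, data_rec, data_rec]) + bdw_block([data_rec, trailer])
records = parse_vb(file_bytes)
print([len(r) for r in records])  # [45, 1000, 1000, 1000, 45]
```

In this layout the header and trailer are ordinary records with their own RDW, so they show up as the first and last parsed records.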

yruslan · 2023-11-30T09:39:02Z

file_start_offset and file_end_offset work at the level of the file, e.g. in cases like:
HEADER {45 bytes} BDW { RDW 1000 bytes, RDW 1000 bytes, RDW 1000 bytes , RDW 1000 bytes ....}

Since your 45-byte header is part of the record payload, you can't remove it using these options. What you can do is add the header as a redefined segment in your copybook, and then filter it out after you get the dataframe.

The copybook will look like this:

01   RECORD.
   05  HEADER.
        10 CONTENT PIC X(45).
   05 PAYLOAD REDEFINES HEADER.
   ... your payload goes at level 10 here

D3v3sh5ingh · 2023-11-30T18:59:56Z

Hi,
This is a sample output for my file. The 45 bytes that I want to skip are only at the start and at the end of the file, not in each record.
If I don't use file_start_offset and file_end_offset, I get the above dataframe as output, but with two extra records (header and trailer).
But if I use these options with 45 bytes, I get an error (length of BDW block is too big).

yruslan · 2023-12-01T12:03:28Z

Options 'file_start_offset' and 'file_end_offset' only drop bytes from the beginning or at the end of files, not from the payload. This is the expected behavior.
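
A plausible reading of the "length of BDW block is too big" error is that dropping 45 bytes from the start of the file makes the parser begin in the middle of the header record, so payload bytes are interpreted as a BDW. The Python sketch below illustrates that misalignment. It does not reproduce Cobrix's actual checks, and it assumes a big-endian 2-byte length in the first two bytes of each 4-byte BDW and RDW, with lengths that include the descriptor. The helper names and ASCII placeholder payloads are hypothetical.

```python
import struct

def rdw_record(payload: bytes) -> bytes:
    # Assumption: RDW = 2-byte big-endian length (including the RDW) + 2 zero bytes.
    return struct.pack(">HH", len(payload) + 4, 0) + payload

def bdw_block(payloads) -> bytes:
    # Assumption: BDW = 2-byte big-endian length (including the BDW) + 2 zero bytes.
    body = b"".join(rdw_record(p) for p in payloads)
    return struct.pack(">HH", len(body) + 4, 0) + body

# One block: 45-byte header record, one data record, 45-byte trailer record.
file_bytes = bdw_block([b"H" * 45, b"D" * 1000, b"T" * 45])

# Reading at offset 0 finds the real BDW.
true_bdw = struct.unpack_from(">H", file_bytes, 0)[0]

# Skipping 45 bytes lands inside the header payload, not on a BDW.
skip = 45
bogus_bdw = struct.unpack_from(">H", file_bytes, skip)[0]
remaining = len(file_bytes) - skip

print(true_bdw)   # 1106
print(bogus_bdw)  # 18504 (the bytes b"HH" read as a length)
print(bogus_bdw > remaining)  # True: the "block" is longer than what is left
```

The offset options would only line up with a BDW if the 45 bytes sat in front of the first BDW, outside any block.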

There are no options that allow dropping bytes from inside records, so possible solutions are:

1. If you need to keep these special 45-byte records, you can use the modified copybook solution above.
2. (Probably your case) If you want to ignore these special 45-byte records, just remove them in post-processing, e.g. df.filter(col("COL1").isNotNull)
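
The post-processing idea in the second option can be sketched in plain Python without Spark. The rows below are made-up stand-ins for a parsed dataframe, COL1 is the same placeholder column name used above, and the assumption is that the header and trailer records decode to a null COL1 while real data records do not. Whether that holds depends on the copybook and the data.

```python
# Hypothetical parsed rows: the header and trailer decode to a null COL1.
rows = [
    {"COL1": None, "note": "header"},
    {"COL1": "A001", "note": "data"},
    {"COL1": "A002", "note": "data"},
    {"COL1": None, "note": "trailer"},
]

def drop_null_rows(rows, column):
    # Keep only the rows where the given column has a value.
    return [row for row in rows if row[column] is not None]

data_rows = drop_null_rows(rows, "COL1")
print([row["COL1"] for row in data_rows])  # ['A001', 'A002']
```

In PySpark the equivalent of the expression quoted above would be df.filter(col("COL1").isNotNull()). If no column is reliably null for the header and trailer, a different condition is needed to tell those records apart.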

D3v3sh5ingh added the "bug" (Something isn't working) label on Nov 30, 2023

yruslan added the "question" (Further information is requested) label and removed the "bug" (Something isn't working) label on Dec 1, 2023
