-
Notifications
You must be signed in to change notification settings - Fork 7
/
README
143 lines (93 loc) · 4.51 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
==============================================================================
================== File-tests - File regressions detection ===================
==============================================================================
Author: Jan Kaluza <[email protected]>
1) WHY IS IT USEFUL?
====================
It's not possible to check the regressions between File releases manually and
unfortunatelly File can't work 100% and even a small change in one magic pattern
can cause a big regression.
This project aims to make comparing outputs of different File versions easier
and provides database of files in different formats.
2) HOW TO USE IT?
=================
This howto presumes that CWD is the root of the git repository (the same
directory as this README file is in).
2.1: Prepare the Magdir:
------------------------
You have to provide the Magdir directory with magic patterns from the File
sources of the exactly the same File version as you want to test. This is
needed to identify particular magic pattern responsible for the particular
file detection.
You can just copy Magdir into the CWD:
$ cp -pr /path/to/file/magic/Magdir ./
2.2: Update the database with the current/old File version:
-----------------------------------------------------------
Note that this whole procedure needs 3.5GB free space, because
patterns are splitted and compiled by file individually. See
HOW DOES IT WORK section for more details.
$ python update-db.py <unique-name> [path_to_magdir]
So in our case for example:
$ python update-db.py file-5.07 Magdir
2.3: Install new File version you want to test regressions against:
-------------------------------------------------------------------
$ sudo rpm -U my-file.prm my-file-libs.rpm ...
2.4: Run fast-regression-test:
------------------------------
Fast regression test tests the output of the currently installed File and
compares it to the one stored in db/*/*.json files. If there's a significant
difference (see HOW DOES IT WORK) between the output, it's showed as FAIL.
$ python fast-regression-test.py
If you run this test with "exact" argument, particular test will FAIL even
if there's small change in the File output.
2.5: Identify patterns causing the regressions:
-----------------------------------------------
With compare-db.py script, you can get the patterns causing the regression.
At first you have to prepare the Magdir from the current File version as
mentioned in step "2.1: Prepare the Magdir".
Note that this whole procedure needs 3.5GB free space, because
patterns are splitted and compiled by file individually. See
HOW DOES IT WORK section for more details.
$ python compare-db.py <unique-name> [path_to_magdir]
So in our case for example:
$ python compare-db.py file-5.10 Magdir
If you run this test with "exact" argument, particular test will FAIL even
if there's small change in the File output:
$ python compare-db.py file-5.10 Magdir exact
or
$ python compare-db.py file-5.10 exact
3) HOW DOES IT WORK?
====================
There is database of files in "db" directory.
3.1: update.db script:
----------------------
At first, this script splits all the patterns from Magdir directory into
separated files, so there is one magic pattern per file. Those files are
stored into ./.mgc_temp/<file_name_hash>/output.
After that, patterns are compiled using this algorithm:
N = 0
pattern_to_compile = ""
for every pattern:
pattern_to_compile += pattern
compile pattern_to_compile
save it as ./.mgc_temp/<file_name_hash>/.find-magic.tmp.N
N++
Then all files in the db directory are detected. During the detection,
particular pattern responsible for detection is found using the bisection
method.
All metadata are stored into ./db/*/*.json file.
3.2: fast-regression-test.py script:
------------------------------------
This script simply gets metadata stored in the ./db/*/*.json files and
compares them with the output of the current File version.
In non-exact mode, regressions are detected using the Ratcliff/Obershelp
pattern recognition implemented by difflib.SequenceMatcher Python module.
The regression is detected if the ratio returned by the SequenceMatcher is
lower than 0.7.
In exact mode, the old and new output have to match exactly.
3.3: compare-db.py script:
--------------------------
This script does the same work as update.db script, but instead of storing
metadata into ./db directory, it compares them with the metadata already
stored in ./db directory by previously run update-db.py script.
Comparison is done the same way as in fast-regression-test.py script.