2015-04-14 Josiah Seaman <[email protected]>
* lib/{hashbits.cc}: changed: adding doxygen comments
2015-04-14 Sarah Guermond <[email protected]>
* doc/dev/coding-guidelines-and-review.rst: added copyright question
to commit checklist.
2015-04-14 Andreas Härpfer <[email protected]>
* */*.py: Make docstrings PEP 257 compliant.
2015-04-13 Thomas Fenzl <[email protected]>
* lib/{khmer_exception.hh,{counting,hashbits,hashtable,subset}.cc}: changed
khmer_exception to use std::string to fix memory management.
2015-04-14 Michael R. Crusoe <[email protected]>
* khmer/_khmermodule.cc: catch more exceptions
* tests/test_{sandbox_scripts,subset_graph}.py: make tests more resilient
2015-04-14 Michael R. Crusoe <[email protected]>
* lib/count.cc: Make CountingHash::abundance_distribution threadsafe
* khmer/_khmermodule.cc: remove newly unnecessary check for exception
* tests/test_scripts.py: added test to confirm the above
2015-04-14 Michael R. Crusoe <[email protected]>
* khmer/{__init__.py,_khmermodule.cc},lib/{counting,hashbits,hashtable,
subset}.cc: catch IO errors and report them.
* tests/test_hashbits.py: remove write to fixed path in /tmp
* tests/test_scripts.py: added test for empty counting table file
2015-04-13 Elmar Bucher <[email protected]>
* scripts/normalize-by-median.py (main): introduced warning for when at least
two input files are named the same.
2015-04-13 Andreas Härpfer <[email protected]>
* doc/dev/getting-started.rst: clarify Conda usage
2015-04-13 Daniel Standage <[email protected]>
* scripts/normalize-by-median.py: Added support to the diginorm script for
sending output to terminal (stdout) when using the conventional - as the output
filename. Also removed --append option.
* tests/test_scripts.py: Added functional test for diginorm stdout, removed
test of --append option.
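A minimal sketch of the "-" means stdout convention described in the entry above. This is not the code from normalize-by-median.py; the helper name `open_output` is hypothetical and only illustrates the idea.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_output(filename):
    """Yield a writable handle; '-' selects stdout (hypothetical helper)."""
    if filename == "-":
        # Do not close stdout when the block exits.
        yield sys.stdout
    else:
        # Opening with "w" overwrites any existing file.
        with open(filename, "w") as handle:
            yield handle
```

A caller would write records inside `with open_output(args_output) as out:` and never needs to know which destination was chosen.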
2015-04-13 Scott Fay <[email protected]>
* scripts/filter-abund.py: added checking of input_table by
`check_file_status()`
2015-04-13 David Lin
* scripts/abundance-dist.py: disambiguate documentation for force and
squash options
2015-04-13 Michael R. Crusoe <[email protected]>
* README.rst,doc/index.rst: added link to gitter.im chat room
* doc/README.rst: removed ancient, outdated, and unused file
2015-04-13 Thomas Fenzl <[email protected]>
* khmer/_khmermodule.cc: removed unused find_all_tags_truncate_on_abundance
from python api
2015-04-10 Will Trimble
* tests/test_script_arguments.py: added a test to check for the empty file
warning when checking if a file exists
2015-04-10 Jacob Fenton <[email protected]>
* scripts/test-{scripts.py}: added test for check_file_writable using
load_into_counting
2015-04-10 Phillip Garland <[email protected]>
* khmer/file.py (check_file_writable): new function to check writability
* scripts/load-into-counting.py (main): early check to see if output is
writable
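An early writability check of the kind this entry describes could look roughly like the sketch below. It is an assumption-laden illustration, not the body of khmer's check_file_writable; the name `can_write_output` is hypothetical.

```python
import os


def can_write_output(path):
    """Return True if `path` looks writable (hypothetical helper)."""
    if os.path.exists(path):
        # An existing target must be a regular file we may write to.
        return os.path.isfile(path) and os.access(path, os.W_OK)
    # Otherwise the parent directory must exist and be writable.
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK)
```

Running such a check before a long table-loading step lets a script fail fast instead of discovering an unwritable output path at the end.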
2015-04-07 Michael R. Crusoe <[email protected]>
* README.rst: add a ReadTheDocs badge
2015-04-06 Michael R. Crusoe <[email protected]>
* jenkins-build.sh: updated OS X warning flag to quiet the build a bit
2015-04-06 Michael R. Crusoe <[email protected]>
* Makefile: added 'convert-release-notes' target for MD->RST conversion
* doc/{,release-notes}/index.rst: include release notes in documentation
* doc/release-notes/*.rst: added pandoc converted versions of release notes
* jenkins-build.sh: use the Sphinx method to install doc dependencies
2015-04-05 Michael R. Crusoe <[email protected]>
* setup.py: use the release version of screed 0.8
2015-04-05 Michael R. Crusoe <[email protected]>
* doc/*/*.txt: all documentation sources have been renamed to use the rst
extension to indicate that they are reStructuredText files. This enables
use of rich text editors on GitHub and elsewhere.
* doc/conf.py: update Sphinx configuration to reflect this change
* doc/requirements.txt: added hint to install version 3.4.1 of Setuptools;
this file is used by ReadTheDocs only.
2015-04-05 Michael R. Crusoe <[email protected]>
* ChangeLog, lib/read_aligner.cc, sandbox/sweep-reads.py: fixed spelling
errors.
2015-04-05 Kevin Murray <[email protected]>
* lib/read_parsers.{cc,hh}: Work around an issue (#884) in SeqAn 1.4.x
handling of truncated sequence files. Also revamp exceptions
* khmer/_khmermodule.cc: Use new/updated exceptions handling malformed
FASTA/Q files.
* tests/test_read_parsers.py: add a test of parsing of truncated fastq
files
2015-04-03 Luiz Irber <[email protected]>
* lib/hllcounter.cc: Use for loop instead of transform on merge method,
now works on C++11.
2015-04-01 Luiz Irber <[email protected]>
* third-party/smhasher/MurmurHash3.{cc,h}: remove unused code, fix warnings.
2015-04-01 Michael R. Crusoe <[email protected]>
* Doxyfile.in: make documentation generation reproducible, removed timestamp
2015-04-01 Alex Hyer <[email protected]>
* scripts/find-knots.py: added force argument to check_file_status()
call in main().
2015-03-31 Kevin Murray <[email protected]>
* lib/read_parsers.{cc,hh}: add read counting to IParser and subclasses
* khmer/_khmermodule.cc,tests/test_read_parsers.py: add 'num_reads'
attribute to khmer.ReadParser objects in python land, and test it.
2015-03-28 Kevin Murray <[email protected]>
* lib/hashbits.hh: Add Hashbits::n_tables() accessor
2015-03-27 Michael R. Crusoe <[email protected]>
* lib/read_parsers.{cc,hh}: Obfuscate SeqAn SequenceStream objects with a
wrapper struct, to avoid #include-ing the SeqAn headers.
* lib/Makefile: Don't install the SeqAn headers.
2015-03-27 Kevin Murray <[email protected]>
* lib/Makefile: Add libkhmer targets, clean up
* lib/get_version.py: Rewrite to use versioneer.py
* lib/.gitignore,third-party/.gitignore: Add more compiled outputs
* lib/.check_openmp.cc: add source that checks compiler for openmp support.
* lib/khmer.pc.in: add pkg-config file for khmer
2015-03-23 Kevin Murray <[email protected]>
* lib/counting.hh: Add CountingHash::n_tables() accessor
2015-03-16 Jessica Mizzi <[email protected]>
* khmer/kfile.py: Added file not existing error for system exit
* tests/{test_scripts,test_functions}.py: Added tests for
check_file_status for file existence and force option
2015-03-15 Kevin Murray <[email protected]> & Titus Brown <[email protected]>
* tests/test_counting_hash.py: Skip get_raw_tables test if python doesn't
have the memoryview type/function.
2015-03-11 Erich Schwarz <[email protected]>
* Added URLs and brief descriptions for khmer-relevant documentation in
doc/introduction.txt, pointing to http://khmer-protocols.readthedocs.org and
khmer-recipes.readthedocs.org, with brief descriptions of their content.
2015-03-10 Camille Scott <[email protected]>
* lib/counting.hh, khmer/_khmermodule.cc: Expose the raw tables of
count-min sketches to the world of python using a buffer interface.
* tests/test_counting_hash.py: Tests of the above functionality.
2015-03-08 Michael R. Crusoe <[email protected]>
* Makefile: make 'pep8' target be more verbose
* jenkins-build.sh: specify setuptools version
* scripts/{abundance-dist,annotate-partitions,count-median,do-partition,
extract-paired-reads,extract-partitions,filter-stoptags,find-knots,
interleave-reads,merge-partitions,partition-graph,sample-reads-randomly,
split-paired-reads}.py,setup.py: fix new PEP8 errors
* setup.py: specify that this is a Python 2 only project (for now)
* tests/test_{counting_single,subset_graph}.py: make explicit the use of
floor division behavior.
2015-03-06 Titus Brown <[email protected]>
* sandbox/{collect-reads.py,saturate-by-median.py}: update for 'force'
argument in khmer.kfile functions, so that khmer-recipes compile.
2015-03-02 Titus Brown <[email protected]>
* sandbox/{combine-pe.py,compare-partitions.py,count-within-radius.py,
degree-by-position.py,dn-identify-errors.py,ec.py,error-correct-pass2.py,
find-unpart.py,normalize-by-align.py,read-aligner.py,shuffle-fasta.py,
to-casava-1.8-fastq.py,uniqify-sequences.py}: removed from sandbox/ as
obsolete/unmaintained.
* sandbox/README.rst: updated to reflect readstats.py and trim-low-abund.py
promotion to sandbox/.
* doc/dev/scripts-and-sandbox.txt: updated to reflect sandbox/ script name
preferences, and note to remove from README.rst when moved over to scripts/.
2015-02-27 Kevin Murray <[email protected]>
* scripts/load-into-counting.py: Be verbose in the help text, to clarify
what the -b flag does.
2015-02-25 Hussien Alameldin <[email protected]>
* sandbox/bloom_count.py: renamed to bloom-count.py
* sandbox/bloom_count_intersection.py: renamed to
bloom-count-intersection.py
* sandbox/read_aligner.py: renamed to read-aligner.py
2015-02-26 Tamer A. Mansour <[email protected]>
* scripts/count-overlap.py: Use CSV format for the curve file output.
Includes column headers.
* scripts/abundance-dist-single.py: Use CSV format for the histogram.
Includes column headers.
* tests/test_scripts.py: add test functions for the --csv option in
abundance-dist-single.py and count-overlap.py
2015-02-26 Jacob Fenton <[email protected]>
* doc/introduction.txt, doc/user/choosing-table-sizes.txt: Updated docs to
ref correct links and names
2015-02-25 Aditi Gupta <[email protected]>
* sandbox/{collect-reads.py, correct-errors.py,
normalize-by-median-pct.py, slice-reads-by-coverage.py,
sweep-files.py, sweep-reads3.py, to-casava-1.8-fastq.py}:
Replaced 'accuracy' with 'quality'. Fixes #787.
2015-02-25 Tamer A. Mansour <[email protected]>
* scripts/normalize-by-median.py: change to the default behavior to
overwrite the sequences output file. Also add a new argument --append to
append new reads to the output file.
* tests/test_scripts.py: add a test for the --append option in
normalize-by-median.py
2015-02-25 Hussien Alameldin <[email protected]>
* khmer/khmer_args.py: add 'hll' citation entry "Irber and Brown,
unpublished." to _alg. dict.
* sandbox/unique-kmers.py: add call to 'info' with 'hll' in the
algorithms list.
2015-02-24 Luiz Irber <[email protected]>
* khmer/_khmermodule.cc: expose HLL internals as read-only attributes.
* lib/hllcounter.{cc,hh}: simplify error checking, add getters for HLL.
* tests/test_hll.py: add test cases for increasing coverage, also fix
some of the previous ones using the new HLL read-only attributes.
2015-02-24 Luiz Irber <[email protected]>
* khmer/_khmermodule.cc: Fix coding style violations.
2015-02-24 Luiz Irber <[email protected]>
* khmer/_khmermodule.cc: Update extension to use recommended practices,
PyLong instead of PyInt, Type initialization, PyBytes instead of PyString.
Replace common initialization with explicit type structs, and all types
conform to the CPython checklist.
2015-02-24 Tamer A. Mansour <[email protected]>
* scripts/abundance-dist.py: Use CSV format for the histogram. Includes
column headers.
* tests/test_scripts.py: add coverage for the new --csv option in
abundance-dist.py
2015-02-24 Michael R. Crusoe <[email protected]>
* jenkins-build.sh: remove examples/stamps/do.sh testing for now; takes too
long to run on every build. Related to #836
2015-02-24 Kevin Murray <[email protected]>
* scripts/interleave-reads.py: Make the output file name print nicely.
2015-02-23 Titus Brown <[email protected]>
* khmer/utils.py: added 'check_is_left' and 'check_is_right' functions;
fixed bug in check_is_pair.
* tests/test_functions.py: added tests for now-fixed bug in check_is_pair,
as well as 'check_is_left' and 'check_is_right'.
* scripts/interleave-reads.py: updated to handle Casava 1.8 formatting.
* scripts/split-paired-reads.py: fixed bug where sequences with bad names
got dropped; updated to properly handle Casava 1.8 names in FASTQ files.
* scripts/count-median.py: added '--csv' output format; updated to properly
handle Casava 1.8 FASTQ format when '--csv' is specified.
* scripts/normalize-by-median.py: replaced pair checking with
utils.check_is_pair(), which properly handles Casava 1.8 FASTQ format.
* tests/test_scripts.py: updated script tests to check Casava 1.8
formatting; fixed extract-long-sequences.py test.
* scripts/{extract-long-sequences.py,extract-paired-reads.py,
fastq-to-fasta.py,readstats.py,sample-reads-randomly.py,trim-low-abund.py},
khmer/thread_utils.py: updated to handle Casava 1.8 FASTQ format by
setting parse_description=False in screed.open(...).
* tests/test-data/{paired-mixed.fq,paired-mixed.fq.pe,random-20-a.fq,
test-abund-read-2.fq,test-abund-read-2.paired2.fq,test-abund-read-paired.fa,
test-abund-read-paired.fq}: switched some sequences over to Casava 1.8
format, to test format handling.
* tests/test-data/{casava_18-pe.fq,test-reads.fq.gz}: new test file for
Casava 1.8 format handling.
* tests/test-data/{overlap.curve,paired-mixed.fq.1,paired-mixed.fq.2,
simple_1.fa,simple_2.fa,simple_3.fa,test-colors.fa,test-est.fa,
test-graph3.fa,test-graph4.fa,test-graph6.fa}: removed no-longer used
test files.
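For readers unfamiliar with the two read-naming styles mentioned in this entry, the sketch below shows one way to recognise a pair under both the older "/1" and "/2" suffix style and the Casava 1.8 style, where the mate number leads the description field. It is not the implementation of khmer.utils.check_is_pair; `split_read_name` and `names_are_paired` are hypothetical names.

```python
def split_read_name(name):
    """Return (base, mate), where mate is '1', '2', or None."""
    # Casava 1.8 style: "<id> <mate>:<filtered>:<control>:<index>"
    if " " in name:
        base, _, description = name.partition(" ")
        mate = description.split(":", 1)[0]
        if mate in ("1", "2"):
            return base, mate
    # Older style: "<id>/1" or "<id>/2"
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2], name[-1]
    return name, None


def names_are_paired(left, right):
    """True if left is mate 1 and right is mate 2 of the same fragment."""
    left_base, left_mate = split_read_name(left)
    right_base, right_mate = split_read_name(right)
    return left_base == right_base and left_mate == "1" and right_mate == "2"
```

The Casava 1.8 case depends on the description being kept as part of the name, which is consistent with the entry's note about setting parse_description=False in screed.open(...).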
2015-02-23 Titus Brown <[email protected]>
* setup.cfg: set !linux flag by default, to avoid running tests that
request too much memory when 'nosetests' is run. (This is an OS difference
where Mac OS X attempts to allocate as much memory as requested, while
on Linux it just crashes).
2015-02-23 Michael R. Crusoe <[email protected]>
* khmer/{__init__.py,_khmermodule.cc},lib/{hashbits.cc,hashbits.hh,
hashtable,tests/test_{c_wrapper,read_parsers}.py: remove unused callback
functionality
2015-02-23 Michael R. Crusoe <[email protected]>
* setup.py: point to the latest screed release candidate to work around
versioneer bug.
2015-02-23 Tamer A. Mansour <[email protected]>
* examples/stamps/do.sh: the argument --savehash was changed to --savetable
and change mode to u+x
* jenkins-build.sh: add a test to check for the do.sh file
2015-02-23 Kevin Murray <[email protected]>
* khmer/load_pe.py: Remove unused/undocumented module. See #784
2015-02-21 Hussien Alameldin <[email protected]>
* sandbox/normalize-by-align.py: "copyright header 2013-2015 was added"
* sandbox/read_aligner.py: "copyright header 2013-2015 was added"
* sandbox/slice-reads-by-coverage.py: "copyright header 2014 was added"
2015-02-21 Hussien Alameldin <[email protected]>
* sandbox/{calc-best-assembly,collect-variants,graph-size}.py: Set executable
bits using "chmod +x"
2015-02-21 Michael R. Crusoe <[email protected]>
* khmer/_khmermodule.cc,lib/read_parsers.cc: Rename the 'accuracy' attribute
of ReadParser Reads to 'quality'
* tests/test_read_parsers.py: update test to match
2015-02-21 Rhys Kidd <[email protected]>
* sandbox/{calc-best-assembly,calc-error-profile,normalize-by-align,
read_aligner,slice-reads-by-coverage}.py: reference /usr/bin/env python2
in the #! line.
2015-02-21 Rhys Kidd <[email protected]>
* sandbox/sweep-paired-reads.py: remove empty script
2015-02-20 Titus Brown <[email protected]>
* doc/dev/scripts-and-sandbox.txt: policies for sandbox/ and scripts/
content, and a process for adding new command line scripts into scripts/.
* doc/dev/index.txt: added scripts-and-sandbox to developer doc index.
2015-02-20 Michael R. Crusoe <[email protected]>
* khmer/_khmermodule.cc: convert C++ out of memory exceptions to Python
out of memory exception.
* tests/test_{counting_hash,counting_single,hashbits_obj,labelhash,
scripts}.py: partial tests for the above
2015-02-20 Aditi Gupta <[email protected]>
* doc/dev/coding-guidelines-and-review.txt: fixed spelling errors.
2015-02-19 Michael R. Crusoe <[email protected]>
* doc/dev/coding-guidelines-and-review.txt: added checklist for new CPython
types
* khmer/_khmermodule.cc: Update ReadAligner to follow the new guidelines
2015-02-19 Daniel Standage <[email protected]>
* Makefile: add a new Makefile target `help` to list and describe all
common targets.
* khmer/utils.py, tests/test_functions.py: minor style fixes.
2015-02-16 Titus Brown <[email protected]>
* khmer/utils.py: added 'check_is_pair', 'broken_paired_reader', and
'write_record_pair' functions.
* khmer/khmer_args.py: added streaming reference for future algorithms
citation.
* tests/test_functions.py: added unit tests for 'check_is_pair' and
'broken_paired_reader'.
* scripts/trim-low-abund.py: upgraded to track pairs properly; added
proper get_parser information; moved to scripts/ from sandbox/.
* tests/test_scripts.py: added paired-read tests for
trim-low-abund.py.
* tests/test-data/test-abund-read-2.paired.fq: data for paired-read tests.
* scripts/extract-paired-reads.py: removed 'is_pair' in favor of
'check_is_pair'; switched to using 'broken_paired_reader'; fixed use
of sys.argv.
* scripts/sample-reads-randomly.py: removed unused 'output_single' function.
* doc/user/scripts.txt: added trim-low-abund.py.
2015-02-13 Qingpeng Zhang <[email protected]>
* scripts/sample-reads-randomly.py: fix a glitch about string formatting.
2015-02-11 Titus Brown <[email protected]>
* khmer/_khmermodule.cc: fixed k-mer size checking; updated some error
messages.
* tests/test_graph.py: added test for k-mer size checking in find_all_tags.
2015-02-09 Titus Brown <[email protected]>
* scripts/split-paired-reads.py: added -1 and -2 options to allow fine-
grain specification of output locations; switch to using write_record
instead of script-specific output functionality.
* tests/test_scripts.py: added accompanying tests.
2015-02-09 Bede Constantinides <[email protected]>
* scripts/split-paired-reads.py: added -o option to allow specification
of an output directory
* tests/test_scripts.py: added accompanying test for split-paired-reads.py
2015-02-01 Titus Brown <[email protected]>
* khmer/_khmermodule.cc: added functions hash_find_all_tags_list and
hash_get_tags_and_positions to CountingHash objects.
* tests/test_counting_hash.py: added tests for new functionality.
2015-01-25 Titus Brown <[email protected]>
* sandbox/correct-errors.py: fixed sequence output so that quality
scores length always matches the sequence length; fixed argparse
setup to make use of default parameter.
2015-01-25 Titus Brown <[email protected]>
* sandbox/readstats.py: fixed non-functional string interpolation at end;
added -o to send output to a file; moved to scripts/.
* doc/user/scripts.txt: added readstats description.
* tests/test_scripts.py: added tests for readstats.py
2015-01-23 Jessica Mizzi <[email protected]>
* khmer/utils.py: Added single write_record function to write FASTA/Q
* scripts/{abundance-dist,extract-long-sequences,extract-partitions,
interleave-reads,normalize-by-median,sample-reads-randomly}.py:
Replaced FASTA/Q writing method with write_record
2015-01-23 Michael R. Crusoe <[email protected]>
* Makefile: remove the user installs for the `install-dependencies` target
2015-01-23 Michael R. Crusoe <[email protected]>
* README.rst,doc/user/install.txt: clarify that we support Python 2.7.x
and not Python 3.
2015-01-21 Luiz Irber <[email protected]>
* lib/hllcounter.{cc,hh}: Implemented a HyperLogLog counter.
* khmer/{_khmermodule.cc, __init__.py}: added HLLCounter class
initialization and wrapper.
* tests/test_hll.py: added test functions for the new
HyperLogLog counter.
* sandbox/unique-kmers.py: implemented a CLI script for
approximate cardinality estimation using a HyperLogLog counter.
* setup.cfg, Makefile, third-party/smhasher/MurmurHash3.{cc,h},
lib/kmer_hash.{cc,hh}, setup.py: added MurmurHash3 hash function
and configuration.
* setup.py: added a function to check if compiler supports OpenMP.
2015-01-14 Reed Cartwright <[email protected]>
* doc/dev/getting-started.txt: Added install information for
Arch Linux
2015-01-14 Michael R. Crusoe <[email protected]>
* doc/user/{blog-posts,guide}.txt,examples/stamps/do.sh,sandbox/{
collect-reads,error-correct-pass2,filter-median-and-pct,filter-median,
read_aligner,split-sequences-by-length}.py,scripts/{filter-abund,
load-into-counting}.py,tests/test_{counting_hash,hashbits,scripts}.py:
remove references to ".kh" files, replacing them with ".pt" or ".ct" as
appropriate
* tests/test-data/{bad-versionk12,normC20k20}.kh: renamed to "*.ct"
2015-01-13 Daniel Standage <[email protected]>
* tests/khmer_tst_utils.py, tests/test_sandbox_scripts.py: removed
unused module imports
* .gitignore: added pylint_report.txt so that it is not accidentally
committed after running make diff_pylint_report
* khmer/file.py -> khmer/kfile.py: renamed internal file handling
class to avoid collisions with builtin Python file module
* sandbox/collect-reads.py, sandbox/saturate-by-median.py,
sandbox/sweep-files.py, sandbox/sweep-reads.py,
scripts/abundance-dist-single.py, scripts/abundance-dist.py,
scripts/annotate-partitions.py, scripts/count-median.py,
scripts/count-overlap.py, scripts/do-partition.py,
scripts/extract-long-sequences.py, scripts/extract-paired-reads.py,
scripts/extract-partitions.py, scripts/filter-abund-single.py,
scripts/filter-abund.py, scripts/filter-stoptags.py,
scripts/find-knots.py, scripts/interleave-reads.py,
scripts/load-graph.py, scripts/load-into-counting.py,
scripts/make-initial-stoptags.py, scripts/merge-partitions.py,
scripts/normalize-by-median.py, scripts/partition-graph.py,
scripts/sample-reads-randomly.py, scripts/split-paired-reads.py,
tests/test_script_arguments.py, tests/test_scripts.py: changed all
occurrences of `file` to `kfile`
2015-01-09 Rhys Kidd <[email protected]>
* lib/khmer.hh: implement generic NONCOPYABLE() macro guard
* lib/hashtable.hh: apply NONCOPYABLE macro guard in case of future
modifications to Hashtable that might expose potential memory corruption
with default copy constructor
2014-12-30 Michael Wright <[email protected]>
* tests/test_scripts.py: Attained complete testing coverage for
scripts/filter-abund.py
2014-12-30 Brian Wyss <[email protected]>
* tests/test_scripts.py: added four new tests:
load_into_counting_multifile(), test_abundance_dist_single_nosquash(),
test_abundance_dist_single_savehash, test_filter_abund_2_singlefile
2014-12-29 Michael R. Crusoe <[email protected]>
* CITATION,khmer/khmer_args.py,scripts/{abundance-dist-single,
filter-abund-single,load-graph,load-into-counting}.py: Give credit to the
SeqAn project for their FASTQ/FASTA reader that we use.
2014-12-26 Titus Brown <[email protected]>
* tests/test_sandbox_scripts.py: added import and execfile test for all
sandbox/ scripts.
* sandbox/abundance-hist-by-position.py,
sandbox/assembly-diff-2.py, sandbox/assembly-diff.py,
sandbox/bloom_count.py, sandbox/bloom_count_intersection.py,
sandbox/build-sparse-graph.py, sandbox/combine-pe.py,
sandbox/compare-partitions.py, sandbox/count-within-radius.py,
sandbox/degree-by-position.py, sandbox/ec.py,
sandbox/error-correct-pass2.py, sandbox/extract-single-partition.py,
sandbox/fasta-to-abundance-hist.py, sandbox/filter-median-and-pct.py,
sandbox/filter-median.py, sandbox/find-high-abund-kmers.py,
sandbox/find-unpart.py, sandbox/graph-size.py,
sandbox/hi-lo-abundance-by-position.py, sandbox/multi-rename.py,
sandbox/normalize-by-median-pct.py, sandbox/print-stoptags.py,
sandbox/print-tagset.py, sandbox/readstats.py,
sandbox/renumber-partitions.py, sandbox/shuffle-fasta.py,
sandbox/shuffle-reverse-rotary.py, sandbox/split-fasta.py,
sandbox/split-sequences-by-length.py, sandbox/stoptag-abundance-hist.py,
sandbox/stoptags-by-position.py, sandbox/strip-partition.py,
sandbox/subset-report.py, sandbox/sweep-out-reads-with-contigs.py,
sandbox/sweep-reads2.py, sandbox/sweep-reads3.py,
sandbox/uniqify-sequences.py, sandbox/write-interleave.py: cleaned up
to make 'import'-able and 'execfile'-able.
2014-12-26 Michael R. Crusoe <[email protected]>
* tests/test_functions.py: Generate a temporary filename instead of
writing to the current directory
* Makefile: always run the `test` target if specified
2014-12-20 Titus Brown <[email protected]>
* sandbox/slice-reads-by-coverage.py: fixed 'N' behavior to match other
scripts ('N's are now replaced by 'A', not 'G').
* sandbox/trim-low-abund.py: corrected reporting bug (bp written);
simplified second-pass logic a bit; expanded reporting.
2014-12-17 Jessica Mizzi <[email protected]>
* khmer/file.py,sandbox/sweep-reads.py,scripts/{abundance-dist-single,
abundance-dist,annotate-partitions,count-median,count-overlap,do-partition,
extract-paired-reads,extract-partitions,filter-abund-single,filter-abund,
filter-stoptags,interleave-reads,load-graph,load-into-counting,
make-initial-stoptags,merge-partitions,normalize-by-median,partition-graph,
sample-reads-randomly,split-paired-reads}.py,setup.cfg,
tests/{test_script_arguments,test_scripts}.py: Added force option to all
scripts to script IO sanity checks and updated tests to match.
2014-12-17 Michael R. Crusoe <[email protected]>
* setup.cfg,tests/test_{counting_hash,counting_single,filter,graph,
hashbits,hashbits_obj,labelhash,lump,read_parsers,scripts,subset_graph}.py:
reduce memory usage of tests to about 100 megabytes max.
2014-12-17 Michael R. Crusoe <[email protected]>
* scripts/load-graph.py,khmer/_khmermodule.cc: restore threading to
load-graph.py
2014-12-16 Titus Brown <[email protected]>
* sandbox/{calc-error-profile.py,collect-variants.py,correct-errors.py,
trim-low-abund.py}: Support for k-mer spectral error analysis, sublinear
error profile calculations from shotgun data sets, adaptive variant
collection based on graphalign, streaming error correction, and streaming
error trimming.
* tests/test_sandbox_scripts.py: added tests for sandbox/trim-low-abund.py.
* tests/test_counting_hash.py: added tests for new
CountingHash::find_spectral_error_positions function.
2014-12-16 Michael R. Crusoe <[email protected]> & Camille Scott
* khmer/_khmermodule.cc: fixed memory leak in the ReadParser paired
iterator (not used by any scripts).
* lib/read_parsers.cc,khmer/_khmermodule.cc: Improved exception handling.
* tests/test_read_parsers.py,
tests/test-data/100-reads.fq.truncated.{bz2,gz}: Added tests for truncated
compressed files accessed via ReadParser paired and unpaired iterators.
2014-12-09 Michael R. Crusoe <[email protected]>
New FAST[AQ] parser (from the SeqAn project). Fixes known issue and a
newly found read dropping issue
https://github.com/ged-lab/khmer/issues/249
https://github.com/ged-lab/khmer/pull/641
Supports reading from non-seekable plain and gzipped FAST[AQ] files (a.k.a.
pipe or streaming support)
* khmer/{__init__.py,_khmermodule.cc}: removed the Config object, the
threads argument to new_counting_hash, and adapted to other changes in API.
Dropped the unused _dump_report_fn method. Enhanced error reporting.
* lib/{bittest,consume_prof,error,khmer_config,scoringmatrix,thread_id_map}
.{cc,hh},tests/test_khmer_config.py: deleted unused files
* sandbox/collect-reads.py,scripts/{abundance-dist-single,do-partition,
filter-abund-single,load-into-counting}.py: adapted to Python API changes:
no threads argument to ReadParser, no more config
* tests/test_{counting_hash,counting_single,hashbits,hashbits_obj,
read_parsers}.py: updated tests to new error pattern (upon object
creation, not first access) and the same API change as above. Thanks to
Camille for her enhanced multi-thread test.
* lib/{counting,hashtable,ht-diff}.cc,khmer.hh: renamed MAX_COUNT define to
MAX_KCOUNT; avoids naming conflict with SeqAn
* khmer/file.py: check_file_status(): ignored input files named '-'
* khmer/khmer_tst_utils.py: added method to pipe input files to a target
script
* tests/test_scripts.py: enhanced streaming tests now that four of them
work.
* Makefile: refreshed cppcheck{,-result.xml} targets, added develop
setuptools command prior to testing
2014-12-08 Michael R. Crusoe <[email protected]>
* doc/user/known_issues.txt: Document that multithreading leads to dropped
reads.
2014-12-07 Michael R. Crusoe <[email protected]>
This is khmer v1.2
* Makefile: add sandbox scripts to the pylint_report.txt target
* doc/dev/coding-guidelines-and-review.txt: Add question about command
line API to the checklist
* doc/dev/release.txt: refresh release procedure
* doc/release-notes/release-1.2.md
2014-12-05 Michael R. Crusoe <[email protected]>
* CITATIONS,khmer/khmer_args.py: update citations for Qingpeng's paper
2014-12-01 Michael R. Crusoe <[email protected]>
* doc/roadmap.txt: Explain the roadmap to v2 through v4
2014-12-01 Kevin Murray <[email protected]>
* tests/test_scripts.py: Stop a test from making a temporary output file
in the current dir by explicitly specifying an output file.
2014-12-01 Kevin Murray <[email protected]>
* load-into-counting.py: Add a CLI parameter to output a machine-readable
summary of the run, including number of k-mers, FPR, input files, etc., in
JSON or TSV format.
2014-12-01 Titus Brown <[email protected]>
* Update sandbox docs: some scripts now used in recipes
2014-11-23 Phillip Garland <[email protected]>
* lib/khmer.hh (khmer): define KSIZE_MAX
* khmer/_khmermodule.cc (forward_hash, forward_hash_no_rc) (reverse_hash):
Use KSIZE_MAX to check whether the user-supplied k is larger than khmer
supports.
2014-11-19 Michael R. Crusoe <[email protected]>
* CODE_OF_CONDUCT.RST,doc/dev/{index,CODE_OF_CONDUCT}.txt: added a code of
conduct
2014-11-18 Jonathan Gluck <[email protected]>
* tests/test_counting_hash.py: Fixed copy-paste error in comments (True to
False).
2014-11-15 Jacob Fenton <[email protected]>
* tests/test_scripts.py: added screed/read_parsers stream testing
* khmer/file.py: modified file size checker to not break when fed
a fifo/block device
* tests/test-data/test-abund-read-2.fa.{bz2, gz}: new test files
2014-11-11 Jacob Fenton <[email protected]>
* do-partition.py: replaced the script's threading arguments with those
provided by khmer_args
* khmer/threading_args.py: removed as it has been deprecated
2014-11-06 Michael R. Crusoe <[email protected]>
* lib/{counting,hashbits}.{cc,hh},lib/hashtable.hh: Moved the n_kmers()
function into the parent Hashtable class as n_unique_kmers(), adding it to
CountingHash along the way. Removed the unused start and stop parameters.
* khmer/_khmermodule.cc: Added Python wrapping for CountingHash::
n_unique_kmers(); adapted to the dropped start and stop parameters.
* scripts/{load-graph,load-into-counting,normalize-by-median}.py: used the
n_unique_kmers() function instead of the n_occupied() function to get the
number of unique kmers in a table.
* tests/test_{hashbits,hashbits_obj,labelhash,scripts}.py: updated the
tests to reflect the above
2014-10-24 Camille Scott <[email protected]>
* do-partition.py: Add type=int to n_threads arg and assert to check
number of active threads
2014-10-10 Brian Wyss <[email protected]>
* khmer/scripts/{abundance-dist, abundance-dist-single,
annotate-partitions, count-median, count-overlap, do-partition,
extract-paired-reads, extract-partitions, filter-abund, filter-abund-single,
filter-stoptags, find-knots, load-graph, load-into-counting,
make-initial-stoptags, merge-partitions, normalize-by-median,
partition-graph, sample-reads-randomly}.py:
changed stdout output in scripts to go to stderr.
2014-10-06 Michael R. Crusoe <[email protected]>
* Doxyfile.in: add links to the stdc++ docs
2014-10-01 Ben Taylor <[email protected]>
* khmer/_khmermodule.cc, lib/hashtable.cc, lib/hashtable.hh,
tests/test_counting_hash.py, tests/test_labelhash.py,
tests/test_hashbits.py, tests/test_hashbits_obj.py:
Removed Hashtable::consume_high_abund_kmers,
Hashtable::count_kmers_within_depth, Hashtable::find_radius_for_volume,
Hashtable::count_kmers_on_radius
2014-09-29 Michael R. Crusoe <[email protected]>
* versioneer.py: upgrade versioneer 0.11->0.12
2014-09-29 Sherine Awad <[email protected]>
* scripts/normalize-by-median.py: catch exceptions generated by wrong
indentation for 'total'
2014-09-23 Jacob G. Fenton <[email protected]>
* scripts/{abundance-dist-single, abundance-dist, count-median,
count-overlap, extract-paired-reads, filter-abund-single,
load-graph, load-into-counting, make-initial-stoptags,
partition-graph, split-paired-reads}.py:
added output file listing at end of file
* scripts/extract-long-sequences.py: refactored to set write_out to
sys.stdout by default; added output location listing.
* scripts/{fastq-to-fasta, interleave-reads}.py:
added output file listing sensitive to optional -o argument
* tests/test_scripts.py: added test for scripts/make-initial-stoptags.py
2014-09-19 Ben Taylor <[email protected]>
* Makefile: added --inline-suppr to cppcheck, cppcheck-result.xml targets
* khmer/_khmermodule.cc: Added comments to address cppcheck false positives
* lib/hashtable.cc, lib/hashtable.hh: take args to filter_if_present by
reference, address scope in destructor
* lib/read_parsers.cc: Added comments to address cppcheck false positives
* lib/subset.cc, lib/subset.hh: Adjusted output_partitioned_file,
find_unpart to take args by reference, fix assign_partition_id to use
.empty() instead of .size()
2014-09-19 Ben Taylor <[email protected]>
* Makefile: Add astyle, format targets
* doc/dev/coding-guidelines-and-review.txt: Add reference to `make format`
target
2014-09-10 Titus Brown <[email protected]>
* sandbox/calc-median-distribution.py: catch exceptions generated by reads
shorter than k in length.
* sandbox/collect-reads.py: added script to collect reads until specific
average cutoff.
* sandbox/slice-reads-by-coverage.py: added script to extract reads with
a specific coverage slice (based on median k-mer abundance).
2014-09-09 Titus Brown <[email protected]>
* Added sandbox/README.rst to describe/reference removed files,
and document remaining sandbox files.
* Removed many obsolete sandbox files, including:
sandbox/abund-ablate-reads.py,
sandbox/annotate-with-median-count.py,
sandbox/assemble-individual-partitions.py,
sandbox/assemstats.py,
sandbox/assemstats2.py,
sandbox/bench-graphsize-orig.py,
sandbox/bench-graphsize-th.py,
sandbox/bin-reads-by-abundance.py,
sandbox/bowtie-parser.py,
sandbox/calc-degree.py,
sandbox/calc-kmer-partition-counts.py,
sandbox/calc-kmer-read-abunds.py,
sandbox/calc-kmer-read-stats.py,
sandbox/calc-kmer-to-partition-ratio.py,
sandbox/calc-sequence-entropy.py,
sandbox/choose-largest-assembly.py,
sandbox/consume-and-traverse.py,
sandbox/contig-coverage.py,
sandbox/count-circum-by-position.py,
sandbox/count-density-by-position.py,
sandbox/count-distance-to-volume.py,
sandbox/count-median-abund-by-partition.py,
sandbox/count-shared-kmers-btw-assemblies.py,
sandbox/ctb-iterative-bench-2-old.py,
sandbox/ctb-iterative-bench.py,
sandbox/discard-high-abund.py,
sandbox/discard-pre-high-abund.py,
sandbox/do-intertable-part.py,
sandbox/do-partition-2.py,
sandbox/do-partition-stop.py,
sandbox/do-partition.py,
sandbox/do-subset-merge.py,
sandbox/do-th-subset-calc.py,
sandbox/do-th-subset-load.py,
sandbox/do-th-subset-save.py,
sandbox/extract-surrender.py,
sandbox/extract-with-median-count.py,
sandbox/fasta-to-fastq.py,
sandbox/filter-above-median.py,
sandbox/filter-abund-output-by-length.py,
sandbox/filter-area.py,
sandbox/filter-degree.py,
sandbox/filter-density-explosion.py,
sandbox/filter-if-present.py,
sandbox/filter-max255.py,
sandbox/filter-min2-multi.py,
sandbox/filter-sodd.py,
sandbox/filter-subsets-by-partsize.py,
sandbox/get-occupancy.py,
sandbox/get-occupancy2.py,
sandbox/graph-partition-separate.py,
sandbox/graph-size-circum-trim.py,
sandbox/graph-size-degree-trim.py,
sandbox/graph-size-py.py,
sandbox/join_pe.py,
sandbox/keep-stoptags.py,
sandbox/label-pairs.py,
sandbox/length-dist.py,
sandbox/load-ht-and-tags.py,
sandbox/make-coverage-by-position-for-node.py,
sandbox/make-coverage-histogram.py,
sandbox/make-coverage.py,
sandbox/make-random.py,
sandbox/make-read-stats.py,
sandbox/multi-abyss.py,
sandbox/multi-stats.py,
sandbox/multi-velvet.py,
sandbox/normalize-by-min.py,
sandbox/occupy.py,
sandbox/parse-bowtie-pe.py,
sandbox/parse-stats.py,
sandbox/partition-by-contig.py,
sandbox/partition-by-contig2.py,
sandbox/partition-size-dist-running.py,
sandbox/partition-size-dist.py,
sandbox/path-compare-to-vectors.py,
sandbox/print-exact-abund-kmer.py,
sandbox/print-high-density-kmers.py,
sandbox/quality-trim-pe.py,
sandbox/quality-trim.py,
sandbox/reformat.py,
sandbox/remove-N.py,
sandbox/softmask-high-abund.py,
sandbox/split-N.py,
sandbox/split-fasta-on-circum.py,
sandbox/split-fasta-on-circum2.py,
sandbox/split-fasta-on-circum3.py,
sandbox/split-fasta-on-circum4.py,
sandbox/split-fasta-on-degree-th.py,
sandbox/split-fasta-on-degree.py,
sandbox/split-fasta-on-density.py,
sandbox/split-reads-on-median-diff.py,
sandbox/summarize.py,
sandbox/sweep_perf.py,
sandbox/test_scripts.py,
sandbox/traverse-contigs.py,
sandbox/traverse-from-reads.py,
sandbox/validate-partitioning.py -- removed as obsolete.
2014-09-01 Michael R. Crusoe <[email protected]>
* doc/dev/coding-guidelines-and-review.txt: Clarify pull request checklist
* CONTRIBUTING.md: update URL to new dev docs
2014-08-30 Rhys Kidd <[email protected]>
* khmer/_khmermodule.cc: fix core dump triggered by
table.get("wrong_length_string")
* lib/kmer_hash.cc: improve quality of exception error message
* tests/{test_counting_hash,test_counting_single,test_hashbits,
test_hashbits_obj}.py: add regression unit tests
2014-08-28 Titus Brown <[email protected]>
* scripts/normalize-by-median.py: added reporting output after main loop
exits, in case it hadn't been triggered.
* sandbox/saturate-by-median.py: added flag to change reporting frequency,
cleaned up leftover code from when it was copied from
normalize-by-median.
2014-08-24 Rhys Kidd <[email protected]>
* khmer/thread_utils.py, sandbox/filter-below-abund.py,
scripts/{extract-long-sequences,load-graph,load-into-counting,
normalize-by-median,split-paired-reads}.py,
scripts/galaxy/gedlab.py: fix minor PyLint issues
2014-08-20 Michael R. Crusoe <[email protected]>
* tests/test_version.py: add Python 2.6 compatibility.
2014-08-20 Rhys Kidd <[email protected]>
* setup.py,README.rst,doc/user/install.txt: Test requirement for a
64-bit operating system, documentation changes. Fixes #529
2014-08-19 Michael R. Crusoe <[email protected]>
* {setup,versioneer,khmer/_version}.py: upgrade versioneer from 0.10 to 0.11
2014-08-18 Michael R. Crusoe <[email protected]>
* setup.py: Use the system bz2 and/or zlib libraries if specified in
setup.cfg or overridden on the commandline