Contributing to Heritrix
Find (or create) an issue in the JIRA issue tracker, and attach your fix in the form of a 'diff' patch.
Emailing the project list or individual developers to highlight changes is also welcome. (This might let others know the patches are available to integrate on their own, or remind the core team of their availability when prioritizing things for inclusion in an upcoming release.)
You may wish to start a discussion on the project list about possible additions, or about code you've created and wish to donate, to gather feedback and find others who are interested.
You should also create a JIRA issue to describe and capture comments about your changes, and attach the code as a patch.
We don't yet have any formal contributor agreement, like the Apache Foundation Contributor Agreement. However, to ensure the project has the right to include your contributions in current and future releases under any open source license, you should include a clear statement with your contribution that you have the right to grant such permission, and do so in the broadest possible terms.
We recommend the following boilerplate, as adapted/excerpted from the Apache agreement, inserted into the JIRA issue of your first contribution under your name/login:
I hereby grant to the project and to recipients of software distributed by the project a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute my contributions and such derivative works.
I represent that I am legally entitled to grant the above license.
Unless required by applicable law or agreed to in writing, I provide my contributions on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE.
As noted in the Apache case, this grant does not change your rights to use your own contributions for any other purpose – you retain all other rights.
Thus far, code committers to our master source repository at Sourceforge have all at one point been employees of the Internet Archive or other partner institutions with an organizational commitment to the project's development. However, any contributor who demonstrates adequate skill and judgement through participation in project discussions and submitted patches is eligible to become a committer. A formal process is not yet set up; contact the IA team and project lead(s) for more info.
If you are willing to do so, please add a note about your use of Heritrix to the Users of Heritrix page.