Preserve toString()
(We are actually really bad in this regard.)
The toString() method has a special role in describing objects, especially in expert, developer, and debugging contexts. An override of toString() should describe the object better, and should avoid making the object look like an object of another type. In particular, the form of toString()'s output should be neither simplified nor extended in ways that other code depends on parsing to deliver functionality.
In places (specifically around the UURI/CrawlURI classes) we've overridden Object.toString() to return a more 'naked' representation of an object, and then relied on that toString() for functionality.
Unfortunately, this hides useful information, such as the class of any object whose toString() returns a plain URI string.
If an object needs a display string for lay users, a method for that specific purpose (e.g. 'toDisplayString') should be used. If an object needs a functionally important String representation, for example one reduced to a format with its own logic and perhaps only interpretable with extra context, another specifically named method should be used (e.g. 'toURIString' or 'toCustomString').
This retains toString() in its descriptive role, either in its default implementation or some other rich, debugging-centric rendering. This also means toString() can be extended fearlessly without risking application functionality. (In fact, that's a good test for any planned use of toString() – if the value returned changed arbitrarily, would any functionality break? If so, toString() is being misused.)
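As a rough illustration of the separation described above, here is a minimal sketch. The class ExampleUri, its hopCount field, and the 40-character display limit are all hypothetical and chosen only for this example; this is not the actual UURI or CrawlURI code. Only the method names toURIString and toDisplayString come from the suggestions above.

```java
import java.util.Objects;

// Hypothetical class for illustration only; not the real Heritrix UURI/CrawlURI API.
final class ExampleUri {
    private final String uri;
    private final int hopCount; // assumed extra state, just to give toString() something to report

    ExampleUri(String uri, int hopCount) {
        this.uri = Objects.requireNonNull(uri, "uri");
        this.hopCount = hopCount;
    }

    /** Functionally important form: other code may compare, store, or parse this value. */
    String toURIString() {
        return uri;
    }

    /** Lay-user form: shortened for display (40-character limit is an arbitrary assumption). */
    String toDisplayString() {
        return uri.length() <= 40 ? uri : uri.substring(0, 37) + "...";
    }

    /** Descriptive, debugging-centric form: names the class and may change at any time. */
    @Override
    public String toString() {
        return getClass().getSimpleName() + "[uri=" + uri + ", hopCount=" + hopCount + "]";
    }
}
```

With this split, code that needs the URI calls toURIString(), so adding or removing fields in toString() cannot break it, and a log line that prints the object still shows that it is an ExampleUri rather than a bare String.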