Commit Graph

23 Commits (master)

Author SHA1 Message Date
vi 00d2341394 Tighten some loops, change more interface.
This will save.. five comparisons.. occasionally? But it's a more
transparent design, and it seems more charitable to group interface
changes together.
2017-12-11 05:52:40 +01:00
vi c8fa18c5f4 rewrite(URL|Cookie) are now pure.
They have been pure since ed1933f, but now the types reflect that.

Breaking interface change.
2017-12-11 05:13:59 +01:00
vi ed1933f2c5 Remove the IO bottleneck.
Don't redundantly readIO/parse the rulesets; do this once, lazily
carrying out the involved operations. Rulesets are invariant over
executions.

This improves performance by a few orders of magnitude, though at some
point we should replace linear search with lookup on a generalised
suffix tree of rooted domains.

Breaking interface change, though I'll likely restore the old form
soon with IO TH.
2017-12-06 13:21:57 +01:00
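
The "parse once, then stay pure" idea in ed1933f2c5 might look roughly like the sketch below. It is not the repository's code: the `Rule` type, the one-pair-per-line input format, and the names `parseRules`, `rewriteWith`, and `loadRewriter` are all hypothetical, chosen only to show the shape of the change.

```haskell
import Data.Maybe (fromMaybe, mapMaybe)

-- Hypothetical, heavily simplified rule: an exact host and its replacement.
type Rule = (String, String)

-- Parse "host replacement" pairs, one per line; malformed lines are skipped.
-- (Assumed input format, not the real HTTPS Everywhere ruleset format.)
parseRules :: String -> [Rule]
parseRules = mapMaybe parseLine . lines
  where
    parseLine l = case words l of
      [host, replacement] -> Just (host, replacement)
      _                   -> Nothing

-- Pure rewrite by linear search; the host is returned unchanged when no
-- rule matches.
rewriteWith :: [Rule] -> String -> String
rewriteWith rules host = fromMaybe host (lookup host rules)

-- Read the file once and return a pure function that closes over the
-- parsed rules. The rules are parsed lazily and the result is shared by
-- every later call, so no call re-reads or re-parses the file.
loadRewriter :: FilePath -> IO (String -> String)
loadRewriter path = rewriteWith . parseRules <$> readFile path
```

A caller would run `loadRewriter` once at startup and pass the resulting function around; this is why the rewrite functions themselves no longer need to live in IO.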
vi 4708e7fc8c Avoid the regular expression engine in parsing rule targets.
I read in the bible (https://www.eff.org/https-everywhere/rulesets) that:

"""
To cover all of a domain's subdomains, you may want to specify a
wildcard target like *.twitter.com. Specifying this type of left-side
wildcard matches any host name with .twitter.com as a suffix, e.g.
www.twitter.com or urls.api.twitter.com. You can also specify a
right-side wildcard like www.google.*. Right-side wildcards, unlike
left-side wildcards, apply only one level deep. So if you want to
cover all countries you'll generally need to specify www.google.*,
www.google.co.*, and www.google.com.* to cover domains like
www.google.co.uk or www.google.com.au.
"""

The previous interpretation is both incorrect (because right wildcards
only apply one level deep) and potentially expensive (regular
expression matching is exponential in the worst case).
2017-12-05 20:54:40 +01:00
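
The wildcard semantics quoted in 4708e7fc8c can be checked without a regular expression engine by comparing host labels. The following is a minimal sketch of that idea, not the code from the commit: the `Target` type and the functions `splitOnDot`, `parseTarget`, and `matches` are hypothetical names, and hosts are assumed to be lowercased already.

```haskell
import Data.List (isSuffixOf)

-- A rule target, classified by where its wildcard sits (hypothetical type).
data Target
  = Exact [String]      -- e.g. www.example.com
  | LeftWild [String]   -- e.g. *.twitter.com, stored as ["twitter","com"]
  | RightWild [String]  -- e.g. www.google.*, stored as ["www","google"]
  deriving (Eq, Show)

-- Split a host name into its dot-separated labels.
splitOnDot :: String -> [String]
splitOnDot s = case break (== '.') s of
  (label, [])       -> [label]
  (label, _ : rest) -> label : splitOnDot rest

-- Classify a target by a leading or trailing "*" label.
parseTarget :: String -> Target
parseTarget s = case splitOnDot s of
  ("*" : rest) -> LeftWild rest
  labels
    | last labels == "*" -> RightWild (init labels)
    | otherwise          -> Exact labels

-- Left wildcards match any depth of extra leading labels (at least one);
-- right wildcards match exactly one extra trailing label.
matches :: Target -> String -> Bool
matches target host = case target of
  Exact ls     -> labels == ls
  LeftWild ls  -> ls `isSuffixOf` labels && length labels > length ls
  RightWild ls -> length labels == length ls + 1
                    && take (length ls) labels == ls
  where
    labels = splitOnDot host
```

Under these assumptions, `*.twitter.com` matches `urls.api.twitter.com` but not `twitter.com`, and `www.google.*` matches `www.google.com` but not `www.google.co.uk`, which needs the separate target `www.google.co.*`. Each check is linear in the number of labels.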
vi 9fcf5bc289 Refactored ICU extras. 2015-11-08 17:16:05 +08:00
vi c26afe01cf Don't admit package-wise parameterisation of rulesets.
As 'a6f28e07a1edc8f62f3dfaf7965b3a818c2f4a7f' showed, there may be
breaking changes in the structure of rulesets between releases. I
don't intend to verify that every pair in the product works (is there
reason to be interested in any other than the latest?), so let's not
accommodate any more than one.
2015-11-08 00:53:35 +08:00
vi e3c171b67e Prepare to assimilate https-everywhere-rules-raw. 2015-11-04 21:57:59 +08:00
vi 6d12745fc9 Interface change for consistency: rewriteURL is idempotent on addresses with no matching rules. 2014-08-24 23:57:55 +08:00
vi 19a9a6b40d Fixed Cookie parser -- fields and predicates were mismatched. 2014-08-24 14:00:27 +08:00
vi 5af781291f Correct pipeline semantics. 2014-08-24 12:57:35 +08:00
vi c6c5eae311 Simplified adornSuffix; no "Maybe" indirection. 2014-08-24 01:30:40 +08:00
vi 42bc20ae07 Safe implementation of adornSuffix. 2014-08-23 19:01:14 +08:00
vi 9bf2a9194c Fixed exclusion matching. 2014-08-17 08:39:56 +08:00
vi 85605fcbab Fixed target parser. 2014-08-11 05:42:03 +08:00
vi 8ec3419492 Failing tests for parseRuleSets. 2014-08-11 05:32:06 +08:00
vi 49be7aa1a0 Use more structured URI representation; targets match only hosts.
This resolves #2.
2014-08-11 03:32:51 +08:00
vi 444b5ea51d Don't strip the text surrounding a match when performing find and replace.
Meta:
  Cross-Reference: #2
2014-08-11 02:09:02 +08:00
vi 809cca61b7 A unit test for the target parser. 2014-08-11 00:38:16 +08:00
vi d86284f09e Unit tests for Data.Text.ICU.Extras. 2014-08-11 00:12:36 +08:00
vi 2c0a18a6f0 Simplified replacement parsing for less redundancy. 2014-08-10 23:35:12 +08:00
vi 5060883c5a Broken domain logic. 2014-08-10 07:16:57 +08:00
vi e154ef7404 Escape "." characters in target; this resolves #1. 2014-08-10 04:47:52 +08:00
vi f78421e09f Incomplete parser for HTTPS Everywhere rulesets. 2014-08-10 04:23:41 +08:00