Don't fail when Redis is misbehaving, e.g. OOM command not allowed when used memory > 'maxmemory' #511

Nowaker · 2020-12-20T21:14:00Z

Rack Attack is written so it doesn't fail if Redis is down on application startup. That's cool.

However, it will bring down the application if Redis server starts misbehaving at later point, for example:

Redis::CommandError: OOM command not allowed when used memory > 'maxmemory'

Preferably, we should ignore any Redis errors and pass the request to the next middleware.

Alternatively, we should allow users to define an exception handler for any errors that happen inside Rack Attack (not the next middleware), with an option to bypass the exception and continue processing. Example:

Rack::Attack.exception_handler = lambda do |exception|
  Rails.logger.error exception # do whatever you want here
  raise # Re-raises the exception, therefore failing the request. Without an explicit `raise` in the handler, the processing would continue
end

Here's the full stacktrace from when the problem happened and killed my application:

lib/redis/client.rb:121 call  
lib/redis.rb:769 block in setex 
lib/redis.rb:58 block in synchronize  
/home/ec2-user/.rbenv/versions/2.6.3/lib/ruby/2.6.0/monitor.rb:230 mon_synchronize  
lib/redis.rb:58 synchronize 
lib/redis.rb:768 setex  
redis-store (1.6.0) lib/redis/store/interface.rb:17 setex 
redis-store (1.6.0) lib/redis/store/serialization.rb:13 block in setex  
redis-store (1.6.0) lib/redis/store/serialization.rb:40 _marshal  
redis-store (1.6.0) lib/redis/store/serialization.rb:13 setex 
redis-store (1.6.0) lib/redis/store/namespace.rb:11 block in setex  
redis-store (1.6.0) lib/redis/store/namespace.rb:89 namespace 
redis-store (1.6.0) lib/redis/store/namespace.rb:11 setex 
redis-store (1.6.0) lib/redis/store/ttl.rb:6 set  
redis-store (1.6.0) lib/redis/store/serialization.rb:5 block in set 
redis-store (1.6.0) lib/redis/store/serialization.rb:40 _marshal  
redis-store (1.6.0) lib/redis/store/serialization.rb:5 set  
redis-store (1.6.0) lib/redis/store/namespace.rb:7 block in set 
redis-store (1.6.0) lib/redis/store/namespace.rb:89 namespace 
redis-store (1.6.0) lib/redis/store/namespace.rb:7 set  
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:282 block (2 levels) in write_entry 
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:270 with  
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:282 block in write_entry  
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:340 failsafe  
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:280 write_entry 
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:86 block in write 
lib/active_support/cache.rb:547 block in instrument 
lib/appsignal/hooks/active_support_notifications.rb:21 block in instrument  
lib/active_support/notifications/instrumenter.rb:20 instrument  
lib/appsignal/hooks/active_support_notifications.rb:35 instrument 
lib/appsignal/hooks/active_support_notifications.rb:20 instrument 
lib/active_support/cache.rb:547 instrument  
redis-activesupport (5.2.0) lib/active_support/cache/redis_store.rb:81 write  
/home/ec2-user/.rbenv/versions/2.6.3/lib/ruby/2.6.0/delegate.rb:83 method_missing 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/store_proxy/active_support_redis_store_proxy.rb:34 write 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/store_proxy/active_support_redis_store_proxy.rb:21 increment 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/cache.rb:74 do_count 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/cache.rb:27 count  
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/throttle.rb:31 matched_by? 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/configuration.rb:93 block in throttled?  
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/configuration.rb:92 any? 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack/configuration.rb:92 throttled? 
/var/www/apps/app/shared/bundle/ruby/2.6.0/bundler/gems/rack-attack-6d1bc9b617c5/lib/rack/attack.rb:118 call
(... other middlewares ...)

The text was updated successfully, but these errors were encountered:

Nowaker · 2020-12-20T21:47:01Z

Here's how I monkey-patched it to achieve the desired result:

# Monkey-patch of https://github.com/rack/rack-attack/blob/master/lib/rack/attack.rb#L102
# Resolves https://github.com/rack/rack-attack/issues/511
class Rack::Attack
  def safely default_return_value = false
    yield
  rescue Exception => e
    Rails.logger.warn "Rack Attack error bypassed: #{e.message}. #{e.backtrace&[0]}"
    Appsignal.send_error e
    default_return_value
  end

  def call(env)
    return @app.call(env) if !self.class.enabled || env["rack.attack.called"]

    env["rack.attack.called"] = true
    env['PATH_INFO'] = PathNormalizer.normalize_path(env['PATH_INFO'])
    request = Rack::Attack::Request.new(env)

    if safely { configuration.safelisted?(request) }
      @app.call(env)
    elsif safely { configuration.blocklisted?(request) }
      # Deprecated: Keeping blocklisted_response for backwards compatibility
      if configuration.blocklisted_response
        configuration.blocklisted_response.call(env)
      else
        configuration.blocklisted_callback.call(request)
      end
    elsif safely { configuration.throttled?(request) }
      # Deprecated: Keeping throttled_response for backwards compatibility
      if configuration.throttled_response
        configuration.throttled_response.call(env)
      else
        configuration.throttled_callback.call(request)
      end
    else
      safely { configuration.tracked?(request) }
      @app.call(env)
    end
  end
end

jeremyhaile · 2021-06-30T14:23:56Z

We recently had a problem where our servers were being overloaded. Most of our requests failed in rack-attack waiting on redis responses. The redis server itself seemed ok, so I think we were just out of network bandwidth on the servers (i.e. not enough servers)

But I'd like to think that rack attack shouldn't be the point of failure for scalability, considering if this was a DOS attack it would have been successful. I'm trying to determine if adding a more aggressive redis timeout would have helped. But if the redis calls did timeout, I think it would have still failed without a change similar to the one suggested here?

Nowaker · 2021-06-30T20:47:15Z

But if the redis calls did timeout, I think it would have still failed without a change similar to the one suggested here?

That is correct as the timeout would end up as an exception, and these abort the execution.

By the way, I didn't really consider timeouts in my fix... If it timeouts after 30 or 60 seconds, the user would have already abandoned the application. It's effectively still a dead website, even with my fix.

I wonder if it's possible to configure the Redis client with a very short timeout, e.g. 1 second, so that a timeout exception is raised very quickly, and we move to the next middleware ASAP. If that's possible, it would be great.

jeremyhaile · 2021-06-30T22:43:10Z

I wonder if it's possible to configure the Redis client with a very short timeout, e.g. 1 second, so that a timeout exception is raised very quickly, and we move to the next middleware ASAP.

Yes, we were thinking about the same thing. By default rack-attack uses Rails.cache, but perhaps you could configure it with your own redis cache that uses a driver configured with a shorter timeout than what your other redis clients use.

johnnyshields · 2022-03-18T04:38:34Z

Amen to this. We were bit by it this morning.

I propose there should be two modes in response to Redis (i.e. cache) unavailability:

Hard fail (current behavior)
Disable Rack-Attack for the request, so system continues to work.

I would propose that #2 be the default behavior, because I suspect the vast majority of users would have little use for DoS mitigation if the entire system goes down (i.e. Redis DoSes your system for you!) This will of course require a relatively low timeout setting--something like 50~100ms Redis connection timeout would be recommended.

I will look at making a PR for this. Part of the problem is that the Rails cache layer tends to suppress errors and return nil... will see what can be done here.

johnnyshields · 2022-03-18T11:49:58Z

PR raised here: #577

grzuy added area: middleware type: feature request labels Jan 24, 2021

johnnyshields linked a pull request Mar 18, 2022 that will close this issue

Add fault-tolerance for cache system errors #577

Open

santib mentioned this issue Oct 7, 2024

Cache Store Down Policy #669

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't fail when Redis is misbehaving, e.g. OOM command not allowed when used memory > 'maxmemory' #511

Don't fail when Redis is misbehaving, e.g. OOM command not allowed when used memory > 'maxmemory' #511

Nowaker commented Dec 20, 2020 •

edited

Loading

Nowaker commented Dec 20, 2020 •

edited

Loading

jeremyhaile commented Jun 30, 2021

Nowaker commented Jun 30, 2021

jeremyhaile commented Jun 30, 2021

johnnyshields commented Mar 18, 2022 •

edited

Loading

johnnyshields commented Mar 18, 2022

Don't fail when Redis is misbehaving, e.g. OOM command not allowed when used memory > 'maxmemory' #511

Don't fail when Redis is misbehaving, e.g. OOM command not allowed when used memory > 'maxmemory' #511

Comments

Nowaker commented Dec 20, 2020 • edited Loading

Nowaker commented Dec 20, 2020 • edited Loading

jeremyhaile commented Jun 30, 2021

Nowaker commented Jun 30, 2021

jeremyhaile commented Jun 30, 2021

johnnyshields commented Mar 18, 2022 • edited Loading

johnnyshields commented Mar 18, 2022

Nowaker commented Dec 20, 2020 •

edited

Loading

Nowaker commented Dec 20, 2020 •

edited

Loading

johnnyshields commented Mar 18, 2022 •

edited

Loading