Today, I’ll be talking about an interesting combination: running a Kafka consumer as a scheduled Celery task. Read on if you find this interesting 😉
First of all, there are multiple ways to get a Kafka consumer running. Some of them are:
Having a separate microservice – This is a good approach as it allows independent scaling of the consumers. However, there is an overhead of managing and monitoring this microservice, ensuring connectivity is fine, etc. Still, it is ideal for large, complex setups where we want dedicated consumers capable of scaling in/out on demand.
Running as a daemon/background task – The consumer can also run in the background or as a daemon process. This will be a forever-running process, as consumers are meant to be. However, troubleshooting/monitoring daemon processes can often be cumbersome.
We did not go with the above approaches, as ours is a lightweight application and we did not want a forever-running process alongside it just for the consumer. Enter Celery 🙂
Before I dive into the implementation specific details, below is the tech stack we use:
Django (django-2.1.7)
Celery (celery-4.3.0)
Kafka (kafka-python-2.0.2)
Python (python-3.2)
Celery is a distributed task queue that can be easily integrated with applications (in our case, our Django app) and provides task scheduling options in a simple yet reliable manner. We decided to use Celery to schedule consumer polls at regular intervals instead of running the consumer forever; however, things like graceful message consumption and offset management had to be tuned to make this work. Let’s discuss how we went about this.
We use the kafka-python module in our Django app to implement the consumer. Although kafka-python does not support transactional producers/consumers at the moment, this does not cause any issues per se. One caveat: there might be a lag in message consumption, as the control batch messages (commit/abort markers) will likely not get consumed.
Below are the steps we followed:
Define a Celery task to poll the consumer every 5 minutes.
The consumer has a poll timeout of 3 minutes (meaning it will stop polling and end the task if no messages are available for consumption within this duration).
The Celery task is force-killed once the consumer times out, to prevent the task from running indefinitely.
A new consumer will run every 5 minutes (in a new Celery task), and even if an old one is still running, it will not cause issues, as the consumers are part of the same consumer group.
The consumer has auto commit enabled which will commit the offset for consumed messages every second.
In case the consumer goes down before/during/after processing a message and the commit has not been done, the message will be re-tried once a new consumer is up. Duplicate messages are handled with some explicit checks as well (based on our use case).
In case the consumer goes down before/during/after processing a message and the commit has been done, the message will not be re-tried.
The auto_offset_reset flag is set to latest, meaning in case the offsets are corrupted (due to a Kafka crash etc.), messages will not be re-consumed from the beginning, preventing duplicate message consumption.
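The 5-minute cadence in the steps above maps directly to a Celery beat entry. As a sketch only – the task path and the `app` Celery instance are assumed names, not from this post:

```python
# Hypothetical beat schedule; 'app' is your Celery() instance and
# 'myapp.tasks.kafka_consumer' is the task wrapping the consumer function
app.conf.beat_schedule = {
    'poll-kafka-consumer': {
        'task': 'myapp.tasks.kafka_consumer',
        'schedule': 300.0,  # every 5 minutes, in seconds
    },
}
```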
Let’s have a look at the consumer implementation. I’ll not go over the Celery task setup part as that is readily available in the Celery getting-started guide.
import signal
import sys
import time
from kafka import KafkaConsumer

def kafka_consumer():
    # Signal handler to hard-exit the Celery task
    def handler(signum, frame):
        print("Hard exiting celery task as consumer poller has timed out...")
        sys.exit(0)

    consumer = None
    try:
        # Initialize Kafka consumer
        consumer = KafkaConsumer(
            'topic_name',
            bootstrap_servers=['kafka_broker:port'],
            api_version=(0, 10),
            # SSL settings required in case your broker has mTLS configured, else skip
            security_protocol='SSL',
            ssl_check_hostname=True,
            ssl_cafile='ca.pem',
            ssl_certfile='cert.pem',
            ssl_keyfile='key.pem',
            consumer_timeout_ms=180000,  # Stop iterating if no message arrives for 3 mins
            group_id='consumer_group_name',
            auto_offset_reset='latest',  # Avoid re-consuming from the beginning if offsets are lost (e.g. Kafka crash)
            enable_auto_commit=True,
            auto_commit_interval_ms=1000)
        print("Polling for topic: {topic}".format(topic='topic_name'))
        for msg in consumer:
            print("Message Key is {k} and Value is {v}".format(k=msg.key, v=msg.value))
            # TODO: Process Message
            # TODO: Handle duplicate consumption explicitly if required
        # Allow time for auto-commit before closing the consumer
        time.sleep(2)
    except Exception as e:
        print('Exception {err} while consuming message(s) from Kafka'.format(err=str(e)))
    finally:
        if consumer is not None:
            consumer.close()
        # Kill the Celery task via SIGALRM
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(1)
Some points to note in the implementation above:
The Celery task is only killed once the consumer times out and has no new messages to consume. This ensures the consumer is not killed while consuming/processing messages.
A delay of 2 seconds is introduced, with the auto-commit interval set to 1 second. This ensures a processed message is always committed.
Note that the explicit handling of duplicate message processing is important, as in case of a consumer/Kafka crash, messages may be re-consumed if the commit has not been done.
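The duplicate check itself isn’t shown above; as an illustration only (names and storage are assumed – production code would track keys in a cache or DB rather than an in-process set), an idempotent processing guard could look like:

```python
# Hypothetical duplicate guard: remember processed message keys and skip repeats
processed_keys = set()

def process_once(msg_key, msg_value, handler):
    """Invoke handler only if this message key has not been processed before."""
    if msg_key in processed_keys:
        return False  # duplicate - skip re-processing
    handler(msg_key, msg_value)
    processed_keys.add(msg_key)
    return True
```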
The above setup is live for our application, deployed in K8s and works exactly the way we want!
Response time is of utmost importance for any application. It not only makes the application more responsive but also enhances the user experience. In one of my previous posts, we utilized memcached to cache view responses and improved the response time (performance) of the site by almost 10X. That was basically server-side. However, we can further improve this with some smart client-side (browser) caching. Django’s Conditional View Processing is very apt for this scenario.
Django’s http decorators (django.views.decorators.http) provide an easy way to set up caching for conditional requests. Etags, or content-based caching, can be very useful for cases where time-based caching is a challenge. When an etag is set, the browser sends an If-None-Match request header carrying the etag value of the last requested version of the resource with all subsequent requests. If the current version has the same etag value, indicating the content matches the browser’s cached copy, an HTTP status of 304 is returned and the content is served from the browser cache, boosting response time as the content does not need to be fetched from the server/server cache.
The advantage of etags is that caching can be done based on the response content, although this involves generating the etag from the content, which can be a hash or any other string identifier. Web servers (like nginx) can also do this nowadays; however, in this post we will see how to set this from the application end in Django.
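To make the mechanics concrete, here is a framework-free sketch (function names are mine, not Django’s) of content-based etag comparison:

```python
import hashlib

def make_etag(content):
    # Content-based etag: identical bytes always produce the same tag
    return hashlib.md5(content).hexdigest()

def conditional_status(content, if_none_match):
    # 304 when the client's If-None-Match matches the current content's etag,
    # meaning the browser's cached copy is still valid; otherwise 200
    return 304 if if_none_match == make_etag(content) else 200
```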
The @etag decorator really makes things simple in Django. You can read more about this decorator in the Django official doc, but here I’ll focus on the implementation.
In my scenario, I am caching the etag as well so that the same etag can be used and also it provides me a manual way to clear the cache and reset the etag (django-clearcache) if required without making any code change/server restarts.
The first thing is to write the get_etag function which returns the etag.
from django.core.cache import cache

# Get etag (for client side caching)
def get_etag(request, **kwargs):
    etag_key = request.path.split('/')[2]
    return cache.get(etag_key, None)
The snippet above returns the etag from cache. The cache key is based on the request path, so that the etag decorator can use the same get_etag function for multiple view functions. Once this is done we just need to specify the decorator in our view function.
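Wiring it up is just Django’s @etag decorator on the view, pointing at the get_etag function above (the view name here is a placeholder, not the app’s actual view):

```python
from django.views.decorators.http import etag

@etag(get_etag)
def server_view(request, **kwargs):
    # Hypothetical view body - build/return the (cached) response as usual
    ...
```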
Applying the decorator is as simple as that. Now the etag is set along with the view’s response cache as below:
# Handle caching
def cache_response(view, url, cache_name, etag_cache_name):
    # Get response from cache
    response = cache.get(cache_name)
    # Invoke POST call to get data from DRF if cache is not set
    if not response:
        status, response = send_api_request(url, view, None, None)
        if status != 200:
            raise Exception('Error fetching data from API: ' + response.content)
        else:
            # Set cache
            cache.set(cache_name, response, None)
            cache.set(etag_cache_name, str(datetime.datetime.now()), None)
    return response
This sets the response as well as the etag in the cache. Here I am using a datetime stamp as the etag value.
Now, etag cache invalidation should follow the same process as response cache invalidation, i.e. whenever the model changes, the cache should be invalidated. It is not time- or user-based; thus, once the cache is set, all users benefit from it, and it gets invalidated only when the model has changed. Cache invalidation is a crucial factor and depends entirely on your application/requirements and how static or dynamic your content is. Be sure to spend some time understanding which approach fits best before finalizing the design.
In my case, the post-save signal of the model is where I put the invalidation logic, which is triggered whenever the model changes.
# Signal to handle cache invalidation
@receiver(post_create_historical_record)
def invalidate_cache(sender, **kwargs):
    model_name = kwargs.get('instance').__class__.__name__
    if model_name == 'Server' and cache.has_key('server_cache') and cache.has_key('server_data'):
        cache.delete('server_cache')
        cache.delete('server_data')
    if model_name == 'Uri' and cache.has_key('uri_cache') and cache.has_key('uri_data'):
        cache.delete('uri_cache')
        cache.delete('uri_data')
I used the django_simple_history post-save signal as I have history enabled for multiple models. Thus, instead of using different signals for different models, django_simple_history’s post-save signal can be used.
Note – In case you have deployed your application in Kubernetes and use the Nginx Ingress Controller, ensure gzip is set to off, else Nginx will discard the etag from the response header. This can be done via an annotation in your Ingress manifest.
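For instance, with the community ingress-nginx controller, a server-snippet annotation can switch gzip off (this is an assumed example, not the manifest from this deployment – verify the annotation key against your controller’s version):

```yaml
metadata:
  annotations:
    # Disable gzip so Nginx does not drop the ETag response header
    nginx.ingress.kubernetes.io/server-snippet: |
      gzip off;
```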
This post is kind of a continuation of my last post on the Inventory app with DRF in Django. As an enhancement, an approval workflow to moderate any update/deletion/addition of a model entry was to be put in place. Now, I did look for ready-made solutions and found a few open-source ones – Django Moderation was one of them. However, it had its limitations and was not compatible with the latest Django 3.
So, I decided to come up with a custom solution, again utilizing the DRF framework. This post outlines the approach I took, the challenges I faced and how they were resolved.
To start with, below is the flow diagram of the approval workflow that has been implemented:
The approach followed to implement the above:
Create an abstract model which needs moderation
Create 2 models inheriting the abstract model – one acts as the primary model, and the other as an approval model, where changes can be moderated and then saved to the primary model based on actions taken by the moderator
Instead of using the Django admin site for data entry, create a form (template) to add/update data – I used DRF forms for this, as all permissions and field serialization are handled by DRF – an effective application of the DRY principle!
Create a separate template for the Approval model, from where entries can be moderated.
Whenever an entry is added/updated, it is first saved in the approval model. On successful moderation, the entry is saved in the primary model. Sounds simple right? 😉 Well, let’s find that out.
Now, a lot of templating work can be avoided if you go with the out-of-the-box Django admin site; however, I would suggest spending some time designing templates for these actions, which gives the site a more professional look and allows granular customizations (especially in the UI).
Let’s have a look at the model:
# Abstract Model
class AbstractServer(models.Model):
    component = models.ForeignKey(Component, blank=False, null=False, on_delete=models.CASCADE)
    name = models.CharField(max_length=50, unique=True, blank=False, null=False)
    ip = models.GenericIPAddressField(unique=True, blank=False, null=False)
    dc = models.ForeignKey(Dc, blank=False, null=False, on_delete=models.CASCADE)
    environment = models.ForeignKey(Environment, blank=False, null=False, on_delete=models.CASCADE)
    type = models.ForeignKey(Type, blank=True, null=True, on_delete=models.CASCADE)
    state = models.CharField(
        max_length=10,
        choices=STATES,
        blank=False
    )
    tags = models.ManyToManyField(Tag, blank=True)
    group = models.ForeignKey(Group, blank=False, null=False, on_delete=models.CASCADE)
    description = models.TextField(blank=True, null=True)
    requestor = models.ForeignKey(User, blank=False, null=False, default=1, on_delete=models.CASCADE)

    class Meta:
        abstract = True

# Primary Model (inherited from abstract model)
class Server(AbstractServer):
    changed_by = models.ForeignKey(User, blank=False, null=False, on_delete=models.CASCADE, related_name='server_changed_by')
    history = HistoricalRecords()

    @property
    def _history_user(self):
        return self.changed_by

    @_history_user.setter
    def _history_user(self, value):
        self.changed_by = value

    def __str__(self):
        return self.name

# Approval Model (inherited from abstract model)
class ServerApproval(AbstractServer):
    action = models.CharField(
        max_length=10,
        choices=APPROVAL_ACTIONS,
        blank=False
    )
    approver = models.ForeignKey(User, blank=False, null=False, on_delete=models.CASCADE, related_name='server_approver')
    status = models.CharField(
        max_length=10,
        choices=APPROVAL_STATES,
        blank=False
    )
    datetime = models.DateTimeField(blank=False)
    comments = models.TextField(blank=True, null=True)

    def __str__(self):
        return self.name
Note the ‘abstract = True‘ flag in the abstract model’s Meta class. This ensures that the abstract model is not created in the DB and only the inherited models are – a perfect fit for our requirement. You can read more about model inheritance in the official Django documentation. Some fields are specific to the primary model, as it has django-simple-history enabled, while the approval model has fields such as action, status and approver to capture moderation details, as the intention is to save this information as well.
Now that our model is ready, let’s see how we can use DRF to provide a form for users to add/update changes. These changes should be read from the primary model but saved to the approval model first. The same thing could be done using Django formsets, leaving out DRF entirely; however, as we already have DRF configured, with things such as permissions and field serialization already done, I found using DRF’s TemplateHTMLRenderer a much more elegant solution.
Once DRF is set up, meaning you have the necessary serializers and URL routers in place, we need an additional approval model view to render the form and also save the details in the approval model.
# Server edit form backend
class ServerEditForm(APIView):
    permission_classes = [IsAuthenticatedOrStaffUser,]
    renderer_classes = [TemplateHTMLRenderer]
    template_name = 'server_edit_form.html'
    style = {'vertical_style': {'template_pack': 'rest_framework/vertical'},
             'horizontal_style': {'template_pack': 'rest_framework/horizontal'}}

    def get(self, request, **kwargs):
        param = kwargs.get('param')
        # Pass users
        users = User.objects.exclude(id=request.user.id)
        # Render add server form
        if param == 'add':
            serializer = ServerApprovalWriteSerializer()
            return Response({'serializer': serializer, 'action': 'add', 'users': users, 'style': self.style})
        # Render update server form
        elif 'update' in param:
            server = get_object_or_404(Server, pk=kwargs.get('param').replace('update', ''))
            serializer = ServerWriteSerializer(server)
            return Response({'serializer': serializer, 'server': server, 'users': users, 'style': self.style})

    def post(self, request, **kwargs):
        serializer = None
        param = kwargs.get('param')
        # Update status and requestor fields
        request.data._mutable = True
        request.data['status'] = 'Pending'
        request.data['approver'] = request.POST.get('approver')
        request.data['requestor'] = request.user.pk
        request.data['datetime'] = datetime.datetime.now()
        # Submit add server request
        if param == 'add':
            request.data['action'] = 'Add'
            name = request.data['name']
            ip = request.data['ip']
            # Validate if server exists in Server model
            if Server.objects.filter(name=name).exists() or Server.objects.filter(ip=ip).exists():
                return JsonResponse({'response': 'ERROR adding server: Server/IP already exists!'})
            # Validate if server exists in ServerApproval model
            if ServerApproval.objects.filter(name=name).exists() or ServerApproval.objects.filter(ip=ip).exists():
                server = ServerApproval.objects.get(Q(name=name) | Q(ip=ip))
                if (server.status == 'Pending' or server.status == 'On Hold'):
                    return JsonResponse({'response': 'ERROR adding server: Existing request found for server({0})/IP({1})! You may raise a new request once the existing request is Approved or Rejected'.format(name, ip)})
                else:
                    # Updating server approval entry
                    serializer = ServerApprovalWriteSerializer(server, data=request.data)
            else:
                # Add new entry
                serializer = ServerApprovalWriteSerializer(context={'request': request}, data=request.data)
        # Submit update server request
        else:
            try:
                request.data['action'] = request.POST.get('action').capitalize()
                request.data['comments'] = request.POST.get('comments')
                existing_server = get_object_or_404(Server, pk=kwargs.get('param'))
                server = ServerApproval.objects.get(name=existing_server.name)
                serializer = ServerApprovalWriteSerializer(server, data=request.data)
            except ServerApproval.DoesNotExist:
                # Server does not exist in approval queue! Create new entry in approval queue
                serializer = ServerApprovalWriteSerializer(context={'request': request}, data=request.data)
        # Handle serializer error
        if not serializer.is_valid():
            return JsonResponse({'response': 'ERROR submitting server details {0}'.format(str(serializer.errors))})
        # Save changes
        serializer.save()
        # Redirect to response page
        return JsonResponse({'response': 'Change successfully submitted to approval queue!'})
Now, the approval model APIView class shown above might look a little complicated; however, much of that is because of the extra fields and requirement-specific customizations I had. In simpler terms, what it does is override the GET and POST methods of the APIView. The GET method returns the form fields, which are rendered in the template and shown to the user to request entry additions/updates. If you look at the serializers used in GET, for additions it uses the ServerApproval serializer, and for updates it uses the Server serializer. This is done to get primary model fields for updates and approval model fields for new entries. For POST, which actually performs the save(), only the ServerApproval serializer is used, meaning changes will be read from the primary model but saved into the approval model only, which is what we want. Apart from this, there are some checks/validations to handle edge cases (e.g. a user tries to update an entry that has already been requested, and so on). Some fields, like requestor/datetime/status, are set explicitly and not rendered in the form, as we do not want the user to specify these.
Now, let’s have a look at the form template and URL patterns.
I have removed the CSS/JS from the template to reduce the LOC; however, you can see how the DRF form is rendered. For some fields where I want custom data (e.g. user fields with only the current user), I do not use the form renderer but simple HTML inputs. If you go back to the APIView class, you will find that data for these custom fields is fed separately to the serializer. You can find more information about DRF form rendering in the official DRF documentation. One thing I would like to highlight: to customize the UI (e.g. add validations/styles etc.), a lot of element overrides need to be done, as the out-of-the-box styling for DRF forms is pretty basic.
So the majority of the work is completed, and the only thing pending is handling the approval model template/actions. I additionally enhanced the approval template to display a comparison table for update requests, showing the original data vs. the changes made to the entry. Basically, the approval model entry id is returned from the approval template, its corresponding primary model data is matched, and the result is returned to the template to show the comparison.
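The comparison itself reduces to diffing field values between the primary entry and the pending approval entry. A simplified, framework-free sketch (the real function operates on model instances; names here are assumed):

```python
def diff_fields(original, changed, fields):
    """Return (field, old_value, new_value) for every field whose value differs."""
    return [(field, original.get(field), changed.get(field))
            for field in fields
            if original.get(field) != changed.get(field)]
```

The template can then render each returned tuple as one row of the comparison table.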
For the final save, a simple model save action is triggered when the approval status is updated from the UI template.
All invocations are done using AJAX, and the response is returned as in the functions above for display to the user.
When an approval entry is saved with status Approved, a post-save signal is used to save the entry to the primary model, as below:
# Signal to handle server approval
@receiver(post_save, sender=ServerApproval)
def create_update_approved_server(sender, instance, **kwargs):
    # Add server entry on approval
    if instance.status == 'Approved' and instance.action == 'Add':
        server_obj = Server.objects.create(name=instance.name,
                                           component=instance.component,
                                           ip=instance.ip,
                                           dc=instance.dc,
                                           environment=instance.environment,
                                           type=instance.type,
                                           state=instance.state,
                                           group=instance.group,
                                           description=instance.description,
                                           requestor=instance.requestor,
                                           changed_by=instance.approver)
        server_obj.tags.add(*instance.tags.all())
    # Update server entry on approval
    if instance.status == 'Approved' and instance.action == 'Update':
        server_obj, created = Server.objects.update_or_create(name=instance.name,
                                                              defaults={'component': instance.component,
                                                                        'ip': instance.ip,
                                                                        'dc': instance.dc,
                                                                        'environment': instance.environment,
                                                                        'type': instance.type,
                                                                        'state': instance.state,
                                                                        'group': instance.group,
                                                                        'description': instance.description,
                                                                        'requestor': instance.requestor,
                                                                        'changed_by': instance.approver})
        server_obj.tags.add(*instance.tags.all())
    # Delete server entry on approval
    if instance.status == 'Approved' and instance.action == 'Delete':
        server_obj = Server.objects.get(name=instance.name)
        server_obj.delete()
As I had a many-to-many field, I had to add those values after the model was saved, using the .add() method as shown above. The signal ensures an entry is added to the primary table if not present (for add requests) and an existing one is updated if already present (for update requests).
I am using DataTables with jQuery for the template, and the approval form is displayed in a jQuery dialog.
I hope you now have a basic understanding of how we can achieve moderation of Django models. The solution works seamlessly and also gives us control to customize things based on requirements.
I think that’s a pretty long post. Hope you didn’t get bored reading it 😉
Of late, I was creating a dashboard in Django for infra inventory tracking and was looking at how best to do it. Basically, the requirements were:
Tabular view of servers and related details
Audit and History views for any change done
RESTful services exposing APIs to get/post/patch/delete data
Fast performance with minimal lag
Scalable/Configurable
I have worked on such requirements before; however, the API part was something new. Enter the Django REST Framework (DRF). I am listing below some of the challenges I faced while developing this.
The base was pretty simple – a not-so-complex model-based view. Basically, a server model with server host, IP and other details. It had some foreign keys like DC, Environment etc.
Now, with DRF, serializing the fields the way I wanted was the first hurdle. I wanted different serializers for read and write modes, where read required no authentication, better display of API data (especially the foreign keys) and eager loading enabled. I ended up with a mixin to achieve this. Note the _SELECT_FIELDS for 1-1 mapped fields and _PREFETCH_FIELDS for 1-many in the eager loading mixin. This was done to get around the infamous N+1 queries issue.
# serializer.py
# Eager loading mixin
class EagerLoadingMixin:
    @classmethod
    def eager_loading(cls, queryset):
        if hasattr(cls, "_SELECT_FIELDS"):
            queryset = queryset.select_related(*cls._SELECT_FIELDS)
        if hasattr(cls, "_PREFETCH_FIELDS"):
            queryset = queryset.prefetch_related(*cls._PREFETCH_FIELDS)
        return queryset

class ServerReadSerializer(serializers.ModelSerializer, EagerLoadingMixin):
    component = ComponentSerializer(read_only=True)
    dc = DcSerializer(read_only=True)
    environment = EnvironmentSerializer(read_only=True)
    type = TypeSerializer(read_only=True)
    tags = TagSerializer(read_only=True, many=True)
    group = GroupSerializer(read_only=True)
    changed_by = UserSerializer(read_only=True)

    _SELECT_FIELDS = ['component', 'dc', 'environment', 'type', 'group', 'changed_by']
    _PREFETCH_FIELDS = ['tags',]

    class Meta:
        model = Server
        fields = '__all__'

class ServerWriteSerializer(serializers.ModelSerializer):
    class Meta:
        model = Server
        fields = '__all__'
        read_only_fields = ('changed_by',)

    def create(self, validated_data):
        user = self.context['request'].user
        validated_data['changed_by'] = user
        return super().create(validated_data)
Also note the ‘changed_by‘ read-only field, which was added to audit changes made by users, using the simple-history module. I will come to that in a bit; however, the read-only setting for this field is important, as we do not want users to set it while making a POST request to the API. Also see the create method override, which sets the changed_by field to the currently logged-in user.
For permissions, I had a custom permission class, plus the noteworthy django-filter, which I find so much better than the default search filter provided by DRF. Check the views.py file below:
# views.py
# Serializer mixins
class ReadWriteSerializerMixin(object):
    read_serializer_class = None
    write_serializer_class = None

    def get_serializer_class(self):
        if self.action in ["create", "update", "partial_update", "destroy"]:
            return self.get_write_serializer_class()
        return self.get_read_serializer_class()

    def get_read_serializer_class(self):
        assert self.read_serializer_class is not None, (
            "'%s' should either include a `read_serializer_class` attribute, "
            "or override the `get_read_serializer_class()` method."
            % self.__class__.__name__
        )
        return self.read_serializer_class

    def get_write_serializer_class(self):
        assert self.write_serializer_class is not None, (
            "'%s' should either include a `write_serializer_class` attribute, "
            "or override the `get_write_serializer_class()` method."
            % self.__class__.__name__
        )
        return self.write_serializer_class

# Custom permission
class IsAuthenticatedOrSuperUser(BasePermission):
    def has_permission(self, request, view):
        return bool(
            request.method in ('GET', 'HEAD', 'OPTIONS') or
            request.user and
            request.user.is_authenticated and
            request.user.is_staff and
            request.user.is_superuser
        )

# Viewsets
class ServerViewSet(ReadWriteSerializerMixin, viewsets.ModelViewSet):
    queryset = Server.objects.all()
    queryset = ServerReadSerializer.eager_loading(queryset)  # Eager loading to improve performance
    read_serializer_class = ServerReadSerializer
    write_serializer_class = ServerWriteSerializer
    permission_classes = [IsAuthenticatedOrSuperUser,]
    filter_backends = (filters.DjangoFilterBackend,)
    filterset_fields = {'component__name': ['exact', 'iexact'],
                        'name': ['exact', 'iexact'],
                        'ip': ['exact', 'iregex'],
                        'dc__name': ['exact', 'iexact', 'iregex'],
                        'environment__name': ['exact', 'iexact'],
                        'type__name': ['exact', 'iexact'],
                        'state': ['exact', 'iexact'],
                        'tags__name': ['exact', 'iexact', 'icontains', 'iregex'],
                        'group__name': ['exact', 'iexact'],
                        'description': ['exact', 'icontains', 'iregex']}
This worked perfectly, and with the eager loading setup, I could see some performance improvement as well. But it was not enough.
Enter memcached. There can be a debate between memcached vs. redis; however, I just wanted a simple caching backend that just does caching – Redis is so much more, to be frank – so I decided to carry on with memcached. I did not go with the default memcached middleware, which basically caches all pages. I had custom caching configured with a preset key, so that invalidation would be easier. Ignore the ‘URL filtering‘ logic below, as that is specific to my requirement; otherwise, the caching is quite straightforward.
# Handle caching
def cache_response(view, url, cache_name):
    # Do not serve from cache if a filter is passed
    if '&' in url:
        status, response = send_api_request(url, view, None, None)
        if status != 200:
            raise Exception('Error fetching data from API: ' + response.content)
    else:
        # Get response from cache
        response = cache.get(cache_name)
        # Invoke POST call to get data from DRF if cache is not set
        if not response:
            status, response = send_api_request(url, view, None, None)
            if status != 200:
                raise Exception('Error fetching data from API: ' + response.content)
            else:
                # Set cache
                cache.set(cache_name, response, None)
    return response
For invalidation, I utilized the simple-history post-save signal, which I also used to add audit entries whenever a model was updated.
@receiver(post_create_historical_record)
def post_create_record_callback(sender, **kwargs):
    # Define audit action
    # Get model class name
    model_name = kwargs.get('instance').__class__.__name__
    action = 'Unknown'
    history = kwargs.get('history_instance')
    history_type = history.history_type
    if history_type == '+':
        action = 'Added'
    elif history_type == '-':
        action = 'Deleted'
    elif history_type == '~':
        action = 'Updated'
    # Add audit entry
    audit = Audit(name=action,
                  logs='{0} {1}'.format(model_name, history.name),
                  datetime=datetime.datetime.now(),
                  user=kwargs.get('history_user'))
    audit.save()
    # Clear cache
    if cache.has_key('audit_cache'):
        cache.delete('audit_cache')
    if model_name == 'Server' and cache.has_key('server_cache'):
        cache.delete('server_cache')
One thing to highlight here is ‘.__class__.__name__‘. This attribute chain returns the class name of an object, thus allowing me to get the name of the modified model, which in turn is used to invalidate the appropriate cache key. I would also like to point out that there is no direct way to clear cache keys by prefix, which I tried before this approach. With this approach, I have more control over what to cache and when to invalidate selective cache keys. This signal is very specific to the django-simple-history module, so look it up if some of the kwargs are not clear. Practically, the cache was set with a None timeout, meaning it would never expire on its own and would only get invalidated on a model change, which is exactly what I wanted. This led to an almost 10X performance improvement: from ~20 seconds, the response time came down to ~2 seconds.
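As a quick illustration of that attribute chain:

```python
class Server:
    """Stand-in for the Django model, just to show the attribute chain."""
    pass

instance = Server()
# The class name of any instance, as a string
model_name = instance.__class__.__name__
print(model_name)  # prints "Server"
```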
While integrating simple-history, I found it very useful to have history enabled in the admin site with my custom model fields listed. The save_model override is again for the changed_by field. The field is in the exclude list as well, as we don’t want to show it to anyone, even in the admin site.
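That admin registration can be sketched as below (list_display columns are assumed), using simple_history’s SimpleHistoryAdmin:

```python
from django.contrib import admin
from simple_history.admin import SimpleHistoryAdmin
from .models import Server

class ServerAdmin(SimpleHistoryAdmin):
    list_display = ('name', 'ip', 'state')  # assumed columns
    exclude = ('changed_by',)  # hidden even in the admin form

    def save_model(self, request, obj, form, change):
        # Stamp the logged-in user so simple-history records who made the change
        obj.changed_by = request.user
        super().save_model(request, obj, form, change)

admin.site.register(Server, ServerAdmin)
```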
The model reference:
# models.py
class Server(models.Model):
    component = models.ForeignKey(Component, blank=False, null=False, on_delete=models.CASCADE)
    name = models.CharField(max_length=50, unique=True, blank=False, null=False)
    ip = models.GenericIPAddressField(unique=True, blank=False, null=False)
    dc = models.ForeignKey(Dc, blank=False, null=False, on_delete=models.CASCADE)
    environment = models.ForeignKey(Environment, blank=False, null=False, on_delete=models.CASCADE)
    type = models.ForeignKey(Type, blank=True, null=True, on_delete=models.CASCADE)
    state = models.CharField(
        max_length=10,
        choices=STATES,
        blank=False
    )
    tags = models.ManyToManyField(Tag, blank=True, related_name='tags')
    group = models.ForeignKey(Group, blank=False, null=False, on_delete=models.CASCADE)
    description = models.TextField(blank=True, null=True)
    changed_by = models.ForeignKey(User, blank=False, null=False, on_delete=models.CASCADE)
    history = HistoricalRecords()

    @property
    def _history_user(self):
        return self.changed_by

    @_history_user.setter
    def _history_user(self, value):
        self.changed_by = value

    def __str__(self):
        return self.name
You can see the simple-history settings configured and the changed_by field introduced.
Hope that clears things up. Let’s get to the template now. I used DataTables for the presentation layer, and the djangorestframework-datatables module to fetch the data from the API my app exposes, rather than querying the models again. While this may look like a roundabout way of doing things, it removes a lot of code and also brings in the filtering capabilities that django-filter provides for DRF. Finally, it also doubles as a kind of unit test for the APIs we write. The approach is certainly debatable, but it worked for me, and worked really well.
So I just had to use DRF’s APIRequestFactory to send a GET request to my API with an added ‘?format=datatables’ param and, woohoo, the response was well formatted and ready for DataTables to consume.
The request itself is built with APIRequestFactory. One challenge I faced was constructing the URL: I did not want to use the nasty path/host lookups and hardcode the context. After a detailed read of the DRF documentation, the answer turned out to be reverse() with the ViewSet’s basename plus the ‘-list’ suffix, which is something DRF sets for the list route. The router configuration for ViewSets is well documented on the DRF site, so I’m not going into that detail; refer to the DRF docs to know more.
So we are pretty much at the end. Some more enhancements I did were on the UI side (using jQuery, ColVis, Buttons, Menu.js and Intro.js), just to make the UI more appealing.
On the memcached management front, I installed two modules, django-memcache-status and django-clearcache, which gave me the options to see the memcached stats and clear the cache from the admin site.
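Wiring the two modules in is roughly as below — note that the app labels and the URL include are assumptions from memory of each module’s README, so verify against the respective docs:

```python
# settings.py -- sketch; app labels are assumptions
INSTALLED_APPS += [
    'memcache_status',  # renders memcached stats on the admin index page
    'clearcache',       # adds a "clear cache" page/action
]

# urls.py -- sketch; django-clearcache ships its own urlconf to include
# urlpatterns += [url(r'^clearcache/', include('clearcache.urls'))]
```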
That’s pretty much about it. The end result is something that I am really liking! 😉
Problem: I want to authenticate users from a third-party auth module (it can be REST-based) without writing custom backends or installing any module (like social-auth).
First of all, we need to understand how our third-party auth module works, or at least what it posts back on successful login.
Say, to log a user in, we redirect the user to a third-party auth URL, which in turn posts some parameters back to our Django app (for example, encrypted user attributes) on successful login.
This can easily be achieved in Django just by writing some logic in views.py, instead of writing a whole new auth backend to handle it.
We will need the below to make this work:
Third party auth url where we need to redirect our users for authentication. (Can be any SSO/MFA auth url which your org might be providing)
The exact post parameters the third party module returns on successful login.
Note – The third-party auth module should trigger a POST call to our Django app once authentication succeeds, along with the necessary parameters. This is something that needs to be configured on the auth module’s end.
Below are the code snippets:
views.py
from django.conf import settings
from django.contrib.auth import login
from django.contrib.auth.models import User
from django.shortcuts import resolve_url, render
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.cache import never_cache
from django.http.response import HttpResponseRedirect

@csrf_exempt
@never_cache
def custom_login(request, next_page=None):
    return_values = request.POST.get('ReturnValues')
    next_page = request.GET.get('next_page')
    if return_values is not None:
        # Decrypt and parse the response
        parameters = decrypt_duo_response(settings.DUO_SECRET_KEY, return_values)
        # 'user_attributes' dict will have all user parameters as key-value pairs
        user_attributes = {}
        for param in parameters:
            key, value = param.split('=', 1)
            user_attributes[key] = value
        # Get user_id and email_id from the duo response
        user_id = user_attributes['userid']
        email_id = user_attributes['mail']
        if user_id is not None:
            try:
                user = User.objects.get(username=user_id)
            except User.DoesNotExist:
                # Create the user if not present
                user, created = User.objects.update_or_create(username=user_id, password='', email=email_id)
            # Log the user in (no authentication required, as it is handled by the third-party auth module)
            login(request, user)
            # Redirect to the previous page post login if next_page is defined
            if next_page is not None:
                next_page = resolve_url(next_page)
                return HttpResponseRedirect(next_page)
    return render(request, 'index.html')
Assumptions:
The third-party auth module makes a POST call to our Django app on successful user authentication, with ReturnValues as a parameter containing the user attributes (userid, mail) in encrypted form.
decrypt_duo_response is a function that decrypts ReturnValues and returns the user attributes.
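For illustration, the decrypted payload the view parses is an iterable of ‘key=value’ strings. The attribute names and values below are made-up examples; yours come from whatever your auth provider returns:

```python
# Illustrative only: the shape of the decrypted payload; 'userid'/'mail'
# and their values are assumptions for this sketch.
parameters = ['userid=jdoe', 'mail=jdoe@example.com']

user_attributes = {}
for param in parameters:
    key, value = param.split('=', 1)  # split on the first '=' only
    user_attributes[key] = value

print(user_attributes)  # -> {'userid': 'jdoe', 'mail': 'jdoe@example.com'}
```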
urls.py
from . import views
from django.contrib.auth.views import LogoutView
from django.conf.urls import url

urlpatterns = [
    url(r'^$', views.custom_login, name='custom_login'),
    url(r'^logout/$', LogoutView.as_view(), name='auth_logout'),
]
index.html
{% if user.is_authenticated %}
  Welcome, <strong>{% firstof user.get_full_name user %}</strong><br/>
  <a href="{% url 'auth_logout' %}?next={{ request.get_full_path }}">Logout</a>
{% else %}
  <a href="<your third party auth url here>?next_page={{ request.get_full_path }}">Login</a>
{% endif %}
settings.py
AUTHENTICATION_BACKENDS = (
'django.contrib.auth.backends.ModelBackend',
)
MIDDLEWARE = [
    # 'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    # Uncomment the next line for simple clickjacking protection:
    # 'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
]
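The UpdateCacheMiddleware/FetchFromCacheMiddleware pair above turns on Django’s per-site cache; the backend itself comes from the CACHES setting. A sketch, assuming a local memcached instance reached via python-memcached (the location and TTL values are placeholders, not my production config):

```python
# settings.py -- sketch; LOCATION and the timeout are placeholder values
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
CACHE_MIDDLEWARE_ALIAS = 'default'   # which CACHES entry the middleware uses
CACHE_MIDDLEWARE_SECONDS = 600       # page-cache TTL; tune per your needs
CACHE_MIDDLEWARE_KEY_PREFIX = ''     # set when several sites share memcached
```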