Django QuerySet: A Look at the annotate, aggregate, and alias Methods (1)

This post looks at several of the QuerySet methods that Django provides.

1. annotate()

1.1 Overview

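annotate() annotates each object in the QuerySet with the provided list of query expressions. An expression can be a simple value, a reference to a field on the model (or on a related model), or an aggregate expression (Avg, Sum, Count, and so on).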

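Each argument to annotate() is added to every object in the returned QuerySet (this matters: it is the key difference from aggregate(), which returns a single summary for the whole QuerySet). An annotation passed as a keyword argument uses the keyword as its attribute name; an aggregate passed positionally gets a default name built from the field and the aggregate function.
For example: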

>>> from django.db.models import Count
>>> q = Blog.objects.annotate(Count('entry'))
# The name of the first blog
>>> q[0].name
'Blogasaurus'
# The number of entries on the first blog
>>> q[0].entry__count
42

This is how annotate() gives you the number of entries for each blog. If you do not like the default name (entry__count), you can override it by passing the annotation as a keyword argument.

>>> q = Blog.objects.annotate(number_of_entries=Count('entry'))
# The number of entries on the first blog, using the name provided
>>> q[0].number_of_entries
42
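
To make "one value per object" more concrete, here is a minimal sketch of the kind of SQL this annotation roughly corresponds to, run with Python's built-in sqlite3 module. The blog and entry tables and their rows are hypothetical stand-ins for the Blog and Entry models, and the SQL is hand-written rather than copied from Django's output.

```python
import sqlite3

# Hypothetical tables standing in for the Blog and Entry models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, blog_id INTEGER REFERENCES blog(id));
    INSERT INTO blog VALUES (1, 'Blogasaurus'), (2, 'Empty Blog');
    INSERT INTO entry (blog_id) VALUES (1), (1), (1);
""")

# One output row per blog, each carrying its own count.
# The LEFT OUTER JOIN keeps blogs that have no entries (count 0).
rows = conn.execute("""
    SELECT blog.name, COUNT(entry.id) AS number_of_entries
    FROM blog
    LEFT OUTER JOIN entry ON entry.blog_id = blog.id
    GROUP BY blog.id, blog.name
    ORDER BY blog.id
""").fetchall()

print(rows)  # [('Blogasaurus', 3), ('Empty Blog', 0)]
```

Every blog gets its own row and its own count, which is the per-object behaviour described above.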

1.2 Combining with filter() and exclude()

annotate() can also be combined with filter(), exclude(), and similar methods.

>>> from django.db.models import Avg, Count
>>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors'))

Here the filter first restricts the QuerySet to books whose name starts with "Django", and each remaining book is then annotated with its number of authors.

If you need two annotations that each apply their own filter, you can pass the filter argument to the aggregate itself. (The Q object used here will be covered in a later post.)

>>> from django.db.models import Count, Q
>>> highly_rated = Count('book', filter=Q(book__rating__gte=7))
>>> Author.objects.annotate(num_books=Count('book'), highly_rated_books=highly_rated)
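
As a rough illustration of what a filtered aggregate means, the sketch below computes both counts in a single SQL query with sqlite3. The schema is a simplifying assumption (one author per book, hypothetical author and book tables), and the conditional count is written with CASE WHEN by hand; the SQL Django actually generates depends on the database backend.

```python
import sqlite3

# Hypothetical, simplified schema: each book has exactly one author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, rating INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book (author_id, rating) VALUES (1, 8), (1, 9), (1, 5), (2, 6);
""")

# num_books counts every book; highly_rated_books counts only the rows
# where the CASE expression is non-NULL, i.e. rating >= 7.
rows = conn.execute("""
    SELECT author.name,
           COUNT(book.id) AS num_books,
           COUNT(CASE WHEN book.rating >= 7 THEN book.id END) AS highly_rated_books
    FROM author
    LEFT OUTER JOIN book ON book.author_id = author.id
    GROUP BY author.id, author.name
    ORDER BY author.id
""").fetchall()

print(rows)  # [('Ann', 3, 2), ('Bob', 1, 0)]
```

The condition only narrows what the second count sees. No author is dropped from the result, which is how this differs from calling filter() on the QuerySet.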

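The order of annotate() and filter() can change the result. An annotation is computed over the state of the query up to the point where annotate() is called, which means that filter() and annotate() are not commutative.
Consider the results of combining filter() and annotate() under the following conditions.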

Publisher A has two books with ratings 4 and 5.
Publisher B has two books with ratings 1 and 4.
Publisher C has one book with rating 1.

>>> a, b = Publisher.objects.annotate(num_books=Count('book', distinct=True)).filter(book__rating__gt=3.0)
>>> a, a.num_books
(<Publisher: A>, 2)
>>> b, b.num_books
(<Publisher: B>, 2)

>>> a, b = Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
>>> a, a.num_books
(<Publisher: A>, 2)
>>> b, b.num_books
(<Publisher: B>, 1)

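Both queries return only publishers that have at least one book rated above 3.0, so Publisher C is excluded from both results.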

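In the first query, annotate() comes before filter(), so the filter has no effect on the annotation (the filter asks for ratings above 3, yet num_books is still 2 for Publisher B). distinct=True is there to avoid a query bug: the join added by filter() can otherwise duplicate rows and inflate the count.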

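In the second query, filter() comes first, so it restricts the books that the annotation sees: only books rated above 3 are counted, which is why Publisher B ends up with 1.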
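
The difference can be reproduced outside Django with a small sqlite3 sketch. The SQL below is a hand-written approximation of the shape of the two queries, assuming that a filter() placed after annotate() adds its own join on the book table. The table names and join aliases are hypothetical, and Publisher C's single book is given rating 1 here (any rating of 3 or below behaves the same way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, publisher_id INTEGER, rating REAL);
    INSERT INTO publisher VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO book (publisher_id, rating)
        VALUES (1, 4), (1, 5), (2, 1), (2, 4), (3, 1);
""")

# annotate() then filter(): one join feeds the count ("counted"),
# a second join implements the filter ("matched").
ANNOTATE_THEN_FILTER = """
    SELECT p.name, COUNT({distinct} counted.id)
    FROM publisher p
    LEFT OUTER JOIN book counted ON counted.publisher_id = p.id
    INNER JOIN book matched ON matched.publisher_id = p.id
    WHERE matched.rating > 3.0
    GROUP BY p.id, p.name
    ORDER BY p.name
"""

# filter() then annotate(): a single join, and the WHERE clause removes
# low-rated books before they are counted.
FILTER_THEN_ANNOTATE = """
    SELECT p.name, COUNT(b.id)
    FROM publisher p
    INNER JOIN book b ON b.publisher_id = p.id
    WHERE b.rating > 3.0
    GROUP BY p.id, p.name
    ORDER BY p.name
"""

with_distinct = conn.execute(ANNOTATE_THEN_FILTER.format(distinct="DISTINCT")).fetchall()
without_distinct = conn.execute(ANNOTATE_THEN_FILTER.format(distinct="")).fetchall()
filter_first = conn.execute(FILTER_THEN_ANNOTATE).fetchall()

print(with_distinct)     # [('A', 2), ('B', 2)]
print(without_distinct)  # [('A', 4), ('B', 2)]  <- A is double-counted
print(filter_first)      # [('A', 2), ('B', 1)]
```

Without DISTINCT, Publisher A's two books are each paired with both of A's matching books, so the count becomes 4. That row duplication is the kind of problem distinct=True guards against.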

Let's look at the same thing with another aggregate function, Avg.

>>> a, b = Publisher.objects.annotate(avg_rating=Avg('book__rating')).filter(book__rating__gt=3.0)
>>> a, a.avg_rating
(<Publisher: A>, 4.5)  # (5+4)/2
>>> b, b.avg_rating
(<Publisher: B>, 2.5)  # (1+4)/2

>>> a, b = Publisher.objects.filter(book__rating__gt=3.0).annotate(avg_rating=Avg('book__rating'))
>>> a, a.avg_rating
(<Publisher: A>, 4.5)  # (5+4)/2
>>> b, b.avg_rating
(<Publisher: B>, 4.0)  # 4/1 (book with rating 1 excluded)

In short, it is hard to intuit how the ORM translates complex querysets into SQL, so when in doubt, inspect the SQL with str(queryset.query), and write plenty of tests.

1.3 order_by

order_by() can reference an alias defined in annotate(). For example, to order books by the number of authors who contributed to them, you can use the following query.

>>> Book.objects.annotate(num_authors=Count('authors')).order_by('num_authors')

1.4 values()

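This section shows how to group results. When values() comes before annotate(), the results are grouped by the unique combinations of the fields named in values(). An annotation is then provided for each group, computed over all members of that group.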

For example, consider a query that finds the average rating of the books written by each author.

>>> Author.objects.annotate(average_rating=Avg('book__rating'))

This returns one result per author in the database, annotated with that author's average book rating. With values(), however, the result changes.

>>> Author.objects.values('name').annotate(average_rating=Avg('book__rating'))

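In this example the authors are grouped by name, so you get one annotated result per unique author name. In other words, if two authors share the same name, their results are merged into a single row, and the average is computed over the books written by both of them.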
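
The merging effect is easy to see in plain SQL. The sqlite3 sketch below uses hypothetical author and book tables (simplified to one author per book) and hand-written GROUP BY clauses to contrast grouping by author with grouping by name. It approximates the idea and is not Django's generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, rating REAL);
    -- Two different authors happen to share the name 'Kim'.
    INSERT INTO author VALUES (1, 'Kim'), (2, 'Kim'), (3, 'Lee');
    INSERT INTO book (author_id, rating) VALUES (1, 5), (1, 5), (2, 2), (3, 3);
""")

# Roughly Author.objects.annotate(...): one group per author row.
per_author = conn.execute("""
    SELECT author.name, AVG(book.rating)
    FROM author LEFT OUTER JOIN book ON book.author_id = author.id
    GROUP BY author.id, author.name
    ORDER BY author.id
""").fetchall()

# Roughly Author.objects.values('name').annotate(...): one group per name.
per_name = conn.execute("""
    SELECT author.name, AVG(book.rating)
    FROM author LEFT OUTER JOIN book ON book.author_id = author.id
    GROUP BY author.name
    ORDER BY author.name
""").fetchall()

print(per_author)  # [('Kim', 5.0), ('Kim', 2.0), ('Lee', 3.0)]
print(per_name)    # [('Kim', 4.0), ('Lee', 3.0)]
```

The merged 'Kim' row is the average over all three books, (5 + 5 + 2) / 3 = 4.0, and not the average of the two per-author averages (3.5).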

As with filter(), the order of values() and annotate() matters.

Here is the example above with the order reversed.

>>> Author.objects.annotate(average_rating=Avg('book__rating')).values('name', 'average_rating')

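In this case the annotation is computed first, so values() has to name the annotation (average_rating) explicitly to include it in the output. values() also does no grouping in this position: the result contains one row per author.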

1.5 Aggregating annotations

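You can also aggregate over the result of an annotation. For example, to compute the average number of authors per book, first annotate each book with its author count, then aggregate that count by referencing the annotation.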

>>> from django.db.models import Avg, Count
>>> Book.objects.annotate(num_authors=Count('authors')).aggregate(Avg('num_authors'))
{'num_authors__avg': 1.66}
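
Conceptually this is an aggregate over a subquery: the inner query produces one author count per book, and the outer query averages those counts. Here is a minimal sqlite3 sketch of that idea, with hypothetical book and book_authors tables standing in for the Book model and its many-to-many authors relation. The SQL is hand-written for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO book VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    -- Author counts per book: 2, 1, 2
    INSERT INTO book_authors VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")

# Inner query: the annotation (one num_authors value per book).
# Outer query: the aggregate over those annotated values.
avg = conn.execute("""
    SELECT AVG(num_authors)
    FROM (
        SELECT COUNT(ba.author_id) AS num_authors
        FROM book
        LEFT OUTER JOIN book_authors ba ON ba.book_id = book.id
        GROUP BY book.id
    )
""").fetchone()[0]

# Mirror the dictionary shape that aggregate() returns.
result = {"num_authors__avg": avg}
print(result)  # (2 + 1 + 2) / 3, i.e. about 1.67
```

Unlike annotate(), the final step returns a single value for the whole set of books, not one value per book.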

Next time, we will look at aggregate().

References

1. https://docs.djangoproject.com/en/4.1/ref/models/querysets/#annotate

2. https://docs.djangoproject.com/en/4.1/topics/db/aggregation/