Replace dozens of joins with something more efficient to count individually grouped columns


By : AMV
Date : September 17 2020, 12:00 AM
There is a table (in PostgreSQL) with multiple columns (n1 to n10), each of which contains a single digit in every row (in the example below the digits are 1, 2 and 3 for simplicity). There are thousands of rows, and the goal is to count, per digit, how many times it appears in each column without one join per column. You can do this with filtered aggregation and a single join against a list of the digits, using an array:
code :
select t.number,
       -- count only the rows where that specific column holds the digit
       count(*) filter (where tt.n1 = t.number) as n1_count,
       count(*) filter (where tt.n2 = t.number) as n2_count,
       count(*) filter (where tt.n3 = t.number) as n3_count,
       count(*) filter (where tt.n4 = t.number) as n4_count
from the_table tt
  -- one row per digit of interest; the join keeps only table rows
  -- that contain that digit in at least one of the columns
  join (values (1),(2),(3)) as t(number)
    on t.number = any(array[tt.n1, tt.n2, tt.n3, tt.n4])
group by t.number;
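A quick way to see it working, with a minimal hypothetical two-row table (the table and column names are assumed to match the query above):

code :
-- hypothetical sample data
create table the_table (n1 int, n2 int, n3 int, n4 int);
insert into the_table values
  (1, 2, 3, 1),
  (2, 2, 1, 3);

-- for digit 2 the query returns n1_count = 1, n2_count = 2,
-- n3_count = 0, n4_count = 0, and similarly for digits 1 and 3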



joins on integer columns more efficient than joins on varchar in PostgreSQL?


By : user3146105
Date : March 29 2020, 07:55 AM
I'm not familiar with PostgreSQL specifically, but I would expect this to be true on any database, for the simple reason that comparing integers is far cheaper than comparing strings.
To perform a join, the database has to search the index on the key field, and searching an integer index is quicker than searching a string index: not only is there less data involved, but each comparison is a single CPU operation rather than a potentially complicated string comparison involving case-sensitivity and localisation (collation) logic.
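For intuition, here is a sketch (all table and column names invented) of two schemas that differ only in the type of the join key; comparing their plans with EXPLAIN ANALYZE in PostgreSQL will typically show the integer index being both smaller and cheaper to probe:

code :
-- integer join key
create table customers_int (id int primary key, name text);
create table orders_int (customer_id int references customers_int(id), amount numeric);

-- varchar join key
create table customers_str (code varchar(36) primary key, name text);
create table orders_str (customer_code varchar(36) references customers_str(code), amount numeric);

-- compare the two plans; the shape is the same, but the integer
-- index is smaller and each key comparison is cheaper
explain analyze
select c.name, sum(o.amount)
from orders_int o
  join customers_int c on c.id = o.customer_id
group by c.name;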

What is an efficient way to get a count of objects grouped by a field in Django?


By : user2237491
Date : March 29 2020, 07:55 AM
This can be accomplished by using objects.values() together with annotate(). Here is a sample model plus a test; the model goes in models.py, the test in a test module alongside it.
code :
# models.py
from django.db import models

class Foo(models.Model):
    action_type = models.CharField(max_length=50)

# tests.py
from django.test import TestCase
from django.db.models import Count
from foo.models import Foo

class MyTestCase(TestCase):
    def test_group_query(self):
        options = ('created', 'deleted', 'updated')
        for i in range(32):
            Foo.objects.create(action_type=options[i % 3])
        # values() defines the grouping; annotate(Count(...)) counts per group
        results = Foo.objects.values('action_type').annotate(Count('action_type'))
        print(results)
Output, one dict per distinct action_type:
{'action_type__count': 11, 'action_type': u'created'},
{'action_type__count': 11, 'action_type': u'deleted'},
{'action_type__count': 10, 'action_type': u'updated'}
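Under the hood this is a single grouped query. Roughly (a sketch assuming Django's default table name foo_foo for app foo; the exact SQL Django emits may differ):

code :
select action_type, count(action_type) as action_type__count
from foo_foo
group by action_type;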

efficient count distinct across columns of DataFrame, grouped by rows


By : dayle gallagher
Date : March 29 2020, 07:55 AM
My understanding is that nunique is optimized for large Series; here there are only 3 day columns, so comparing each column against the ones before it turns out to be faster:
code :
testDf = genSampleData(100000, 3, drinkIndex)  # helper defined in the question
days = testDf.columns[1:]

%timeit testDf.iloc[:, 1:].stack().groupby(level=0).nunique()
10 loops, best of 3: 46.8 ms per loop

%timeit pd.melt(testDf, id_vars='custId').groupby('custId').value.nunique()
10 loops, best of 3: 47.6 ms per loop

%%timeit
# count each day's value as new only if it matches none of the earlier days
testDf['nunique'] = 1
for col1, col2 in zip(days, days[1:]):
    testDf['nunique'] += ~(testDf[[col2]].values == testDf.loc[:, 'day1':col1].values).any(axis=1)
100 loops, best of 3: 3.83 ms per loop
Timings for wider frames (stack, melt, column comparison, in that order):
10 columns: 143 ms, 161 ms, 30.9 ms
50 columns: 749 ms, 968 ms, 635 ms
100 columns: 1.52 s, 2.11 s, 2.33 s
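Note the crossover: the pairwise comparison does work that grows with the square of the column count, so it wins comfortably at 3-10 columns but is overtaken by the stack and melt approaches by around 100 columns.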

MySQL: two joins on same table become double grouped count. How to fix it?


By : Catherine Li
Date : March 29 2020, 07:55 AM
The question describes a simplified structure with two joins to the same visits table. The inflated counts come from the cross product of those two joins; COUNT(DISTINCT ...) solves it:
code :
SELECT `clients`.*, `salesagents`.`name`,
       COUNT(DISTINCT `v1`.`id`) AS visits_number,
       COUNT(DISTINCT `v2`.`id`) AS visits_number_last_month
FROM `clients`
LEFT JOIN `salesagents` ON `clients`.`salesagents_id` = `salesagents`.`id`
LEFT JOIN `visits` AS `v1` ON `clients`.`id` = `v1`.`clients_id`
LEFT JOIN `visits` AS `v2` ON `clients`.`id` = `v2`.`clients_id`
       -- the FROM_UNIXTIME(UNIX_TIMESTAMP(...)) round-trip in the original is redundant
       AND `v2`.`date` > DATE_SUB(NOW(), INTERVAL 1 MONTH)
GROUP BY `clients`.`id`
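To see why DISTINCT is needed: a client with 5 visits overall, 2 of them in the last month, produces 5 × 2 = 10 joined rows, so plain COUNT(v1.id) and COUNT(v2.id) would both return 10; COUNT(DISTINCT ...) collapses each back to 5 and 2.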

Query that involves dozens of joins to the same table not returning all results


By : user3216373
Date : March 29 2020, 07:55 AM
While @Larnu is correct that you need to normalize your data, in the meantime you can find the problem rows by replacing your JOINs with LEFT JOINs: the columns that come back NULL show which related records are missing, and therefore which inner joins were silently dropping rows.
It is also possible that the "missing" results are expected: with an odd number of teams, some brackets might simply be skipped.
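As a sketch of that diagnostic (the table and column names here are invented, since the original query isn't shown):

code :
-- LEFT JOIN keeps bracket rows whose related team record is missing,
-- surfacing them as NULLs instead of silently dropping them
SELECT b.id, t1.name AS team1, t2.name AS team2
FROM brackets b
LEFT JOIN teams t1 ON t1.id = b.team1_id
LEFT JOIN teams t2 ON t2.id = b.team2_id
WHERE t1.id IS NULL OR t2.id IS NULL;  -- rows an inner join would drop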