
Track additional statistics #6390

Draft: malsadev wants to merge 17 commits into LemmyNet:main from malsadev:additional-statistics-scheduled-task


@malsadev commented Mar 8, 2026

Issue: #6288

Summary of changes:

In the local_site table:

  1. Renamed the columns posts, users, comments and communities to local_posts, local_users, local_comments and local_communities.
  2. Added the columns total_posts, total_users, etc.
  3. Added a language_usage_percent jsonb column with a NOT NULL constraint and a default value of {}.

Some tests exercise trigger logic in the database, so I also updated the DB triggers/functions to use the renamed columns.

  1. Added an update_stats method, which runs some basic queries and updates the local_site table, and added it to the daily scheduled tasks
  2. Added process_language_breakdown (runs as part of update_stats), which calculates the percentage breakdown of posts per language tag and writes it to the new language_usage_percent field in the local_site table
  3. Added the test_update_stats and test_process_language_breakdown tests
  4. Added the --features full flag to ./scripts/dump_schema.rs
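A minimal sketch of the per-language percentage breakdown described in item 2, in plain Rust without Diesel. The function name language_breakdown and its input shape (pairs of language code and post count) are hypothetical. It also assumes the total is the sum of the grouped counts, whereas the PR's code divides by local_post_count. Percentages are rounded to two decimal places, and an empty input yields an empty map instead of dividing by zero.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: given (lang_code, post_count) pairs, return each
/// language's share of all posts as a percentage rounded to two decimals.
fn language_breakdown(counts: &[(&str, i64)]) -> BTreeMap<String, f64> {
    // Assumption: the total is the sum of the grouped counts.
    let total: i64 = counts.iter().map(|(_, c)| c).sum();
    let mut out = BTreeMap::new();
    if total == 0 {
        // No posts yet: avoid a division by zero and return an empty map.
        return out;
    }
    for (lang, count) in counts {
        let percent = *count as f64 / total as f64 * 100.0;
        // Round to two decimal places (e.g. 30.0999... becomes 30.1).
        out.insert(lang.to_string(), (percent * 100.0).round() / 100.0);
    }
    out
}
```

With 301 English, 100 undetermined and 599 German posts this yields roughly the example shape discussed later in the thread, but as numbers rather than strings.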

Outstanding items:

  • Comment percentage calculation (is this needed?)
  • Ban rate tracking
  • Accepted/failed signups tracking
  • Are extra filters needed for the counts? For example, should banned/deleted users be included?
  • Manual testing
  • Failing CI pipeline

Comment thread on migrations/2026-03-08-173221-0000_additional-statistics/up.sql (outdated)
ALTER TABLE local_site
ADD COLUMN total_communities integer;

ALTER TABLE local_site
ADD COLUMN user_retention_percent integer;
Member:

All the percentages/rates should probably be floats.
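To illustrate why, a sketch comparing integer and float percentages, assuming a 0 to 100 scale. percent_int and percent_float are hypothetical helpers, not functions from the PR: integer division truncates, so 1 out of 200 becomes 0 instead of 0.5.

```rust
/// Hypothetical integer percentage: truncates toward zero.
fn percent_int(part: i32, total: i32) -> i32 {
    if total == 0 {
        return 0;
    }
    // Widen to i64 so part * 100 cannot overflow; the division truncates.
    (i64::from(part) * 100 / i64::from(total)) as i32
}

/// Hypothetical float percentage: keeps the fractional part.
fn percent_float(part: i32, total: i32) -> f64 {
    if total == 0 {
        return 0.0;
    }
    f64::from(part) * 100.0 / f64::from(total)
}
```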

Comment on lines +20 to +21
ALTER TABLE local_site
ADD COLUMN local_post_english_percent integer;
Member:

Not sure how useful this metric would be. I suppose it could be useful for multilingual servers that want to reduce English usage.

Member:

Yeah, it shouldn't be limited to a single language; there should be a usage percentage for every language, maybe in a new column language.usage_percentage.

Contributor Author:

I added a language_usage_percent jsonb column. It will contain something like:

{
  "en": "30.10",
  "und": "10.00",
  "de": "59.90"
}

Member:

I'm not a fan of putting JSON into SQL; we should really try to avoid that. Either add a new table or, better yet, just remove this.

Member:

Better to put it in the existing language table.

malsadev and others added 10 commits March 10, 2026 15:32
- Rename `users/posts/comments/communities` to `local_users/local_posts/local_comments/local_communities` to distinguish from the new total_* columns
- Add `NOT NULL DEFAULT 0` to all new statistics columns (linked_instances, total_*, rates) instead of nullable
- Update triggers.sql, nodeinfo.rs, convert.rs, and tests to use new column names
@malsadev changed the title from "Additional statistics scheduled task" to "Track additional statistics" on Mar 12, 2026
@Nutomic (Member) commented May 11, 2026

Clippy is failing; see the CI output or run ./scripts/lint.sh

ALTER TABLE local_site
ADD COLUMN linked_instances integer NOT NULL DEFAULT 0;

ALTER TABLE local_site
ADD COLUMN total_posts integer NOT NULL DEFAULT 0;
Member:

FYI, you can combine most or all of these into a single statement, though it doesn't make much difference:

ALTER TABLE local_site
    ADD COLUMN linked_instances integer NOT NULL DEFAULT 0,
    ADD COLUMN total_posts integer NOT NULL DEFAULT 0;

Comment on lines +29 to +30
ALTER TABLE local_site
ADD COLUMN language_usage_percent jsonb NOT NULL DEFAULT '{}'::jsonb;
Member:

Better to do this instead:

Suggested change, replacing
ALTER TABLE local_site
ADD COLUMN language_usage_percent jsonb NOT NULL DEFAULT '{}'::jsonb;
with
ALTER TABLE language
ADD COLUMN usage_percent float DEFAULT 0;

ALTER TABLE local_site
ADD COLUMN accepted_signups_rate integer NOT NULL DEFAULT 0;

ALTER TABLE local_site
ADD COLUMN failed_signups_rate integer NOT NULL DEFAULT 0;
Member:

The rates should be floats. Also, they are not currently written by the scheduled task.
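A sketch of how the scheduled task could compute both rates as floats. The SignupCounts struct, the signup_rates function and the definition of a rate (share of all accepted plus failed signups in some window) are assumptions; the PR does not yet define where these counts come from.

```rust
/// Hypothetical signup counts for one reporting window.
struct SignupCounts {
    accepted: i64,
    failed: i64,
}

/// Returns (accepted_signups_rate, failed_signups_rate) as percentages of
/// all decided signups, or (0.0, 0.0) when there were none.
fn signup_rates(c: &SignupCounts) -> (f64, f64) {
    let total = c.accepted + c.failed;
    if total == 0 {
        // Avoid a 0/0 division, which would produce NaN.
        return (0.0, 0.0);
    }
    let total = total as f64;
    (
        c.accepted as f64 * 100.0 / total,
        c.failed as f64 * 100.0 / total,
    )
}
```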

update(local_site::table)
.set(local_site::total_posts.eq(total_post_count))
.execute(conn)
.await?;
Member:

Make only a single update query, using LocalSiteUpdateForm. Alternatively, you can update multiple columns at once with a tuple like this:

  update(local_site::table)
    .set((
      local_site::total_posts.eq(total_post_count),
      local_site::total_comments.eq(total_comment_count),
      ...
    ))
    .execute(conn)
    .await?;

.select(count_star())
.first::<i64>(conn)
.await
.map(i32::try_from)??;
Nutomic (Member), May 11, 2026:

Like update_local_user_count, this should filter out deleted and banned users. The same applies to communities, posts and comments, where deleted and removed items should be filtered out.
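The filtering rules described above, sketched over in-memory rows. UserRow, ContentRow and the count_* helpers are hypothetical stand-ins; in the actual code these conditions would be filters on the Diesel count queries rather than Rust iterator filters.

```rust
/// Hypothetical stand-in for a user row.
struct UserRow {
    deleted: bool,
    banned: bool,
}

/// Hypothetical stand-in for a community, post or comment row.
struct ContentRow {
    deleted: bool,
    removed: bool,
}

/// Counts users that are neither deleted nor banned.
fn count_active_users(users: &[UserRow]) -> usize {
    users.iter().filter(|u| !u.deleted && !u.banned).count()
}

/// Counts items that are neither deleted nor removed.
fn count_visible(items: &[ContentRow]) -> usize {
    items.iter().filter(|i| !i.deleted && !i.removed).count()
}
```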

post_count.lang_code,
Value::Number(
serde_json::Number::from_f64(
(post_count.post_count as f64 * 10000.0 / f64::from(local_post_count)).round() / 100.0,
Member:

This is a bit confusing with all the conversions; it should become simpler after switching to language.usage_percent, as suggested for the migration. And why multiply by 10000 and then divide by 100?
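For reference, the expression multiplies by 10000 because one factor of 100 turns the ratio into a percentage and another shifts two decimal places before round(); the final division by 100 shifts them back, so the result is a percentage rounded to two decimals. A sketch with the two steps named separately; percent_as_written, round2 and percent_explicit are hypothetical helpers. Because the floating-point operations happen in a different order, the two forms could in principle differ in rare tie cases.

```rust
/// The expression under review: (count * 10000 / total).round() / 100.
fn percent_as_written(count: i64, total: i32) -> f64 {
    (count as f64 * 10000.0 / f64::from(total)).round() / 100.0
}

/// Rounds a value to two decimal places (halfway cases round away from zero).
fn round2(value: f64) -> f64 {
    (value * 100.0).round() / 100.0
}

/// Same idea with the percentage and the rounding as separate steps.
fn percent_explicit(count: i64, total: i32) -> f64 {
    let percent = count as f64 / f64::from(total) * 100.0;
    round2(percent)
}
```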

update(local_site::table)
.set(local_site::language_usage_percent.eq(Value::Object(post_counts)))
.execute(conn)
.await?;
Nutomic (Member), May 11, 2026:

This also doesn't need a separate update query. Instead, return the calculated usage percentages and write them in a single query in update_stats.
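A sketch of the suggested structure: the language calculation only returns its percentages, and update_stats gathers every value before a single write. StatsUpdate, compute_language_usage and the write callback are hypothetical; the callback stands in for the database update, and where each value is finally stored (local_site columns or a language.usage_percent column) is still under discussion in this review.

```rust
/// Hypothetical bundle of everything one update_stats run computes.
struct StatsUpdate {
    total_posts: i32,
    total_comments: i32,
    language_usage: Vec<(i32, f64)>, // (language_id, usage_percent)
}

/// Pure calculation: returns the percentages instead of writing them.
fn compute_language_usage(post_counts: &[(i32, i64)], total_posts: i64) -> Vec<(i32, f64)> {
    if total_posts == 0 {
        return Vec::new();
    }
    post_counts
        .iter()
        .map(|&(id, n)| (id, n as f64 * 100.0 / total_posts as f64))
        .collect()
}

/// Collects all values first, then calls `write` exactly once.
fn update_stats(
    total_posts: i32,
    total_comments: i32,
    post_counts: &[(i32, i64)],
    mut write: impl FnMut(&StatsUpdate),
) {
    let update = StatsUpdate {
        total_posts,
        total_comments,
        language_usage: compute_language_usage(post_counts, i64::from(total_posts)),
    };
    write(&update);
}
```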

.select(count_star())
.first::<i64>(conn)
.await
.map(i32::try_from)??;
Member:

Please add a method FederatedInstanceView::count, then you can reuse FederatedInstanceView::joins instead of writing the same thing again.

@malsadev (Contributor Author):

@Nutomic I appreciate the new wave of feedback. I'm cleaning up the PR.
