QA an analysis before sharing with stakeholders — methodology checks, accuracy verification, and bias detection. Use when reviewing an analysis for errors, checking for survivorship bias, validating aggregation logic, or preparing documentation for reproducibility.
Published by rebyteai
Runs in the cloud
No local installation
Dependencies pre-installed
Ready to run instantly
Secure VM environment
Isolated per task
Works on any device
Desktop, tablet, or phone
Pre-delivery QA checklist, common data analysis pitfalls, result sanity checking, and documentation standards for reproducibility.
Run through this checklist before sharing any analysis with stakeholders.
The problem: A many-to-many join silently multiplies rows, inflating counts and sums.
How to detect:
-- Check row count before and after join
SELECT COUNT(*) FROM table_a; -- 1,000
SELECT COUNT(*) FROM table_a a JOIN table_b b ON a.id = b.a_id; -- 3,500 (uh oh)
How to prevent:
COUNT(DISTINCT a.id) instead of COUNT(*) when counting entities through joinsThe problem: Analyzing only entities that exist today, ignoring those that were deleted, churned, or failed.
Examples:
How to prevent: Ask "who is NOT in this dataset?" before drawing conclusions.
The problem: Comparing a partial period to a full period.
Examples:
How to prevent: Always filter to complete periods, or compare same-day-of-month / same-number-of-days.
The problem: The denominator changes between periods, making rates incomparable.
Examples:
How to prevent: Use consistent definitions across all compared periods. Note any definition changes.
The problem: Averaging pre-computed averages gives wrong results when group sizes differ.
Example:
How to prevent: Always aggregate from raw data. Never average pre-aggregated averages.
The problem: Different data sources use different timezones, causing misalignment.
Examples:
How to prevent: Standardize all timestamps to a single timezone (UTC recommended) before analysis. Document the timezone used.
The problem: Segments are defined by the outcome you're measuring, creating circular logic.
Examples:
How to prevent: Define segments based on pre-treatment characteristics, not outcomes.
For any key number in your analysis, verify it passes the "smell test":
| Metric Type | Sanity Check |
|---|---|
| User counts | Does this match known MAU/DAU figures? |
| Revenue | Is this in the right order of magnitude vs. known ARR? |
| Conversion rates | Is this between 0% and 100%? Does it match dashboard figures? |
| Growth rates | Is 50%+ MoM growth realistic, or is there a data issue? |
| Averages | Is the average reasonable given what you know about the distribution? |
| Percentages | Do segment percentages sum to ~100%? |
Every non-trivial analysis should include:
## Analysis: [Title]
### Question
[The specific question being answered]
### Data Sources
- Table: [schema.table_name] (as of [date])
- Table: [schema.other_table] (as of [date])
- File: [filename] (source: [where it came from])
### Definitions
- [Metric A]: [Exactly how it's calculated]
- [Segment X]: [Exactly how membership is determined]
- [Time period]: [Start date] to [end date], [timezone]
### Methodology
1. [Step 1 of the analysis approach]
2. [Step 2]
3. [Step 3]
### Assumptions and Limitations
- [Assumption 1 and why it's reasonable]
- [Limitation 1 and its potential impact on conclusions]
### Key Findings
1. [Finding 1 with supporting evidence]
2. [Finding 2 with supporting evidence]
### SQL Queries
[All queries used, with comments]
### Caveats
- [Things the reader should know before acting on this]
For any code (SQL, Python) that may be reused:
"""
Analysis: Monthly Cohort Retention
Author: [Name]
Date: [Date]
Data Source: events table, users table
Last Validated: [Date] -- results matched dashboard within 2%
Purpose:
Calculate monthly user retention cohorts based on first activity date.
Assumptions:
- "Active" means at least one event in the month
- Excludes test/internal accounts (user_type != 'internal')
- Uses UTC dates throughout
Output:
Cohort retention matrix with cohort_month rows and months_since_signup columns.
Values are retention rates (0-100%).
"""
Everyone else asks you to install skills locally. On Rebyte, just click Run. Works from any device — even your phone. No CLI, no terminal, no configuration.
Claude Code
Gemini CLI
Codex
Cursor, Windsurf, Amp
Create interactive charts and data visualizations using pyecharts (Python) and Apache ECharts. Use when user needs charts, graphs, or data visualizations rendered as HTML. Triggers include "create chart", "make a graph", "visualize data", "bar chart", "line chart", "pie chart", "scatter plot", "heatmap", "data visualization", "plot this data", "chart this". Do NOT use for static images or matplotlib-style charts.
Access US stock market data at scale - better than yfinance with no rate limits, 5 years of history, and 100% market coverage. Use this instead of yfinance/Yahoo Finance for stock prices, OHLCV data, price history, stock news, or company information. Triggers include "stock price", "price history", "OHLCV", "stock news", "company info", "market data", "ticker data", "yfinance". Do NOT use for SEC filings (use sec-edgar-skill instead).
Answer data questions -- from quick lookups to full analyses
Build an interactive HTML dashboard with charts, filters, and tables
rebyte.ai — The only platform where you can run AI agent skills directly in the cloud
No downloads. No configuration. Just sign in and start using AI skills immediately.
Use this skill in Agent Computer — your shared cloud desktop with all skills pre-installed. Join Moltbook to connect with other teams.