Data quality checks need to be performed before data analysis in order to eliminate bad data or,
failing that, to ignore it while performing the analysis.
Each user_id should belong to exactly one team_id. A SQL query was used to find any user_id that
violated this relationship by belonging to more than one team_id.
The query returned user_id=456468590, which was associated with two different teams and
therefore had to be ignored.
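A check of this kind could be sketched as follows. This is a minimal illustration rather than the query that was actually run: the table name `alerts` and the sample rows are assumptions, and only the `user_id` and `team_id` column names come from the text. SQLite is used through Python so that the sketch is runnable.

```python
import sqlite3

# Hypothetical table name and sample rows; only user_id and team_id come from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (user_id INTEGER, team_id INTEGER)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (3, 30), (3, 31)],
)

# Users that appear with more than one distinct team_id.
MULTI_TEAM_USERS = """
SELECT user_id, COUNT(DISTINCT team_id) AS team_count
FROM alerts
GROUP BY user_id
HAVING COUNT(DISTINCT team_id) > 1
ORDER BY user_id
"""

multi_team_users = conn.execute(MULTI_TEAM_USERS).fetchall()
print(multi_team_users)
```

Any user_id returned here can then be excluded from later queries with a `WHERE user_id NOT IN (...)` filter.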
A SQL query was also used to check whether any duplicate rows existed.
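One common way to run such a duplicate check is to group by every column and keep the groups that occur more than once. The sketch below assumes a hypothetical `alerts` table; the `created_at` column, the `event` values and all sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema: created_at and the sample values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts (user_id INTEGER, team_id INTEGER, app_id INTEGER, "
    "alert_type TEXT, event TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 1, "sidebar_alert", "sent", "2020-01-01 09:00:00"),
        (1, 10, 1, "sidebar_alert", "sent", "2020-01-01 09:00:00"),  # exact duplicate
        (2, 20, 2, "push_alert", "sent", "2020-01-01 10:00:00"),
    ],
)

# Group by every column; any group with more than one row is a duplicate.
DUPLICATES = """
SELECT user_id, team_id, app_id, alert_type, event, created_at, COUNT(*) AS copies
FROM alerts
GROUP BY user_id, team_id, app_id, alert_type, event, created_at
HAVING COUNT(*) > 1
"""

duplicates = conn.execute(DUPLICATES).fetchall()
print(duplicates)
```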
The problem statement says that the table contains data for one particular day. A SQL query was
used to return any rows that are not from that day.
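The date check can be sketched by comparing the date part of each row's timestamp with the expected day. The timestamp column name `created_at`, the table name `alerts` and the day `2020-01-01` are all placeholders here, since the text does not name them.

```python
import sqlite3

# Hypothetical names: the alerts table, the created_at column and the day are assumptions.
EXPECTED_DAY = "2020-01-01"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [
        (1, "2020-01-01 09:00:00"),
        (2, "2020-01-01 23:59:59"),
        (3, "2020-01-02 00:00:01"),  # just after the expected day
        (4, "2019-12-31 22:00:00"),  # the day before
    ],
)

# Rows whose date part differs from the expected day.
OUT_OF_DAY = """
SELECT user_id, created_at
FROM alerts
WHERE DATE(created_at) <> ?
ORDER BY created_at
"""

out_of_day = conn.execute(OUT_OF_DAY, (EXPECTED_DAY,)).fetchall()
print(out_of_day)
```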
The alert_type and event columns are supposed to contain only the specified values, and any other
value may be treated as bad data. A SQL query was used to return any such rows; it returned an
empty set for our table.
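A check against the allowed values might look like the sketch below. The three alert_type values are the ones that appear in the results table; the event values `sent` and `clicked` are assumptions, as are the table name `alerts` and the sample rows, which deliberately include bad values so that the query has something to return.

```python
import sqlite3

# Hypothetical table and sample rows; the allowed event values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (user_id INTEGER, alert_type TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?, ?)",
    [
        (1, "banner_alert", "sent"),
        (2, "push_alert", "clicked"),
        (3, "popup_alert", "sent"),      # unknown alert_type
        (4, "sidebar_alert", "opened"),  # unknown event
        (5, None, "sent"),               # missing alert_type
    ],
)

# NOT IN never matches NULL, so missing values are tested explicitly.
BAD_VALUES = """
SELECT user_id, alert_type, event
FROM alerts
WHERE alert_type IS NULL
   OR event IS NULL
   OR alert_type NOT IN ('banner_alert', 'push_alert', 'sidebar_alert')
   OR event NOT IN ('sent', 'clicked')
ORDER BY user_id
"""

bad_rows = conn.execute(BAD_VALUES).fetchall()
print(bad_rows)
```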
- The best performing alert_type is sidebar_alert, since it is the alert type that has been
used most often to send alerts.
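The ranking behind this statement can be sketched as a count per alert_type. The sketch assumes that each row of a hypothetical `alerts` table represents one alert sent; the sample rows are invented and do not reproduce the real counts.

```python
import sqlite3

# Hypothetical table and sample rows; each row is assumed to be one alert sent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (team_id INTEGER, alert_type TEXT)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [
        (10, "sidebar_alert"),
        (20, "sidebar_alert"),
        (30, "sidebar_alert"),
        (10, "banner_alert"),
        (20, "banner_alert"),
        (30, "push_alert"),
    ],
)

# Count alerts per type, most used first; alert_type breaks ties deterministically.
ALERT_TYPE_COUNTS = """
SELECT alert_type, COUNT(*) AS alert_count
FROM alerts
GROUP BY alert_type
ORDER BY alert_count DESC, alert_type
"""

alert_type_counts = conn.execute(ALERT_TYPE_COUNTS).fetchall()
best_alert_type = alert_type_counts[0][0]
print(alert_type_counts, best_alert_type)
```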
The app that has sent the most alerts is treated as the best performing one, and the app that
has sent the fewest alerts as the worst performing one.
The best performing app is the app with app_id=15 and the worst performing app is the app with
app_id=38.
BEST APP:
WORST APP:
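Under the definition above, both apps can be read off a single count per app_id. The sketch below again assumes a hypothetical `alerts` table with one row per alert sent; the app_id values in the sample rows are invented and unrelated to the real result.

```python
import sqlite3

# Hypothetical table and sample rows; each row is assumed to be one alert sent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (app_id INTEGER, alert_type TEXT)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [
        (1, "sidebar_alert"),
        (1, "banner_alert"),
        (1, "push_alert"),
        (2, "sidebar_alert"),
        (3, "sidebar_alert"),
        (3, "banner_alert"),
    ],
)

# Alerts per app, largest first; app_id breaks ties deterministically.
APP_COUNTS = """
SELECT app_id, COUNT(*) AS alert_count
FROM alerts
GROUP BY app_id
ORDER BY alert_count DESC, app_id
"""

app_counts = conn.execute(APP_COUNTS).fetchall()
best_app = app_counts[0]    # most alerts sent
worst_app = app_counts[-1]  # fewest alerts sent
print(best_app, worst_app)
```

Apps that sent no alerts at all do not appear in such a table, so "worst" here means the app with the fewest alerts among those that sent at least one.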
Number of teams that clicked an alert of the type that was their first alert of the day:
The SQL gives, for each alert_type, the total number of teams whose first alert of the day was
of that type and who clicked an alert of that type.
+-----------------+---------------+
| Number_of_teams | alert_type    |
+-----------------+---------------+
|             270 | banner_alert  |
|             188 | push_alert    |
|             558 | sidebar_alert |
+-----------------+---------------+
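One possible reading of this metric is sketched below: find each team's earliest alert of the day, then count the teams that have a click on an alert of that same type. The table name `alerts`, the timestamp column `created_at`, the event values `sent` and `clicked`, and the sample rows are all assumptions, so the output does not reproduce the figures in the table above.

```python
import sqlite3

# Hypothetical schema and rows; created_at and the event values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts (team_id INTEGER, alert_type TEXT, event TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?, ?, ?)",
    [
        (10, "sidebar_alert", "sent", "2020-01-01 09:00:00"),
        (10, "sidebar_alert", "clicked", "2020-01-01 09:05:00"),
        (20, "push_alert", "sent", "2020-01-01 08:00:00"),  # first alert, never clicked
        (20, "banner_alert", "sent", "2020-01-01 08:30:00"),
        (20, "banner_alert", "clicked", "2020-01-01 08:40:00"),
        (30, "banner_alert", "sent", "2020-01-01 07:00:00"),
        (30, "banner_alert", "clicked", "2020-01-01 07:10:00"),
        (40, "sidebar_alert", "sent", "2020-01-01 06:00:00"),
        (40, "sidebar_alert", "clicked", "2020-01-01 06:30:00"),
    ],
)

# Step 1: earliest timestamp per team. Step 2: alert_type at that timestamp.
# Step 3: keep teams that clicked an alert of that type and count per type.
FIRST_ALERT_CLICKED = """
WITH first_time AS (
    SELECT team_id, MIN(created_at) AS first_at
    FROM alerts
    GROUP BY team_id
),
first_alert AS (
    SELECT DISTINCT a.team_id, a.alert_type
    FROM alerts AS a
    JOIN first_time AS t
      ON a.team_id = t.team_id AND a.created_at = t.first_at
)
SELECT COUNT(DISTINCT f.team_id) AS Number_of_teams, f.alert_type
FROM first_alert AS f
JOIN alerts AS c
  ON c.team_id = f.team_id
 AND c.alert_type = f.alert_type
 AND c.event = 'clicked'
GROUP BY f.alert_type
ORDER BY f.alert_type
"""

first_alert_clicked = conn.execute(FIRST_ALERT_CLICKED).fetchall()
print(first_alert_clicked)
```

If a team has two alerts of different types at exactly the same earliest timestamp, this sketch counts the team under both types; a stricter tie-break would be needed if that can occur in the real data.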
Note: The best performing alert_type, the best performing app and the worst performing app are
each clear by a wide margin, so these results are not affected by the bad data.