Reference Check AB Test Analysis

Author

Megan Neisler, Staff Data Scientist, Wikimedia Foundation

Published

May 3, 2024

Introduction

The Wikimedia Foundation’s Editing team is working on a set of improvements for the visual editor to help new volunteers understand and follow some of the policies necessary to make constructive changes to Wikipedia projects.

This work is guided by the Wikimedia Foundation Annual Plan, specifically by Wiki Experiences 1.2: Complete improvements to four workflows that improve the experience of editors with extended rights (admins, patrollers, functionaries, and moderators of all kinds); extend their creativity; impact at least four different wikis, and meet KRs for each improvement set collaboratively with the volunteers.

The first version of Edit Check (Reference Check) invites users who have added more than 50 new characters to an article namespace to include a reference to the edit they’re making if they have not already done so themselves at the time they indicate their intent to save.

You can find more details on the heuristic to trigger this check on this ticket and in the default config values. Additionally, the kinds of edits Reference Check thinks warrant a reference can be found by filtering Recent changes using the newly introduced editcheck-references tag.

You can find more information about features of this tool and project updates on the project page.

Methodology

The team ran an AB test from 18 February 2024 through 4 April 2024 to determine the impact of this first iteration of edit check (Task). Specifically, we want to know if the reference check feature improved the quality of new content edits newcomers and Junior Contributors make in the main namespace.

During this experiment, 50% of users were randomly assigned to the test group and were shown the reference check notice if the edit met the specified requirements during their edit, and 50% were randomly assigned to the control group and provided the default editing experience (no reference check shown when requirements were met).

The test included all mobile web and desktop contributors (both registered and unregistered) to the 11 participating wikis (see the full list of participating Wikipedias in the task description).

Figure 1: Diagram of the reference check AB test bucketing

As shown in Figure 1, not all edits bucketed in the AB test experiment met the requirements for being shown the reference check. Reference check was only shown if the contributor met the specified requirements at the time they indicated their intent to save by clicking the pre-publish button.

Reference check was shown for 8,255 published new content edits by newcomers, junior contributors, and unregistered users in the test group (as indicated by the editcheck-reference-activated tag). Of these, 34% added a new reference (as indicated by the editcheck-newreference tag). There were 9,078 published new content edits in the control group identified as eligible to be shown reference check (as indicated by the editcheck-references tag).

We defined two key performance indicators to test the hypothesis that the quality of new content edits newcomers and Junior Contributors make in the main namespace will increase because a greater percentage of these edits will include a reference or an explicit acknowledgement as to why these edits lack references: (1) proportion of new content edits that include a new reference and (2) proportion of new content edits that are reverted within 48 hours of being published. We evaluated these alongside the curiosities and guardrails detailed in the task and on the Edit Check project page.
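As a minimal sketch of how these two proportions can be computed per experiment group, assume a table with one row per published new content edit. The rows below are invented for illustration and are not experiment data; the column names follow those used in the analysis code later in this report.

```r
# Toy data: one row per published new content edit (values are made up).
edits <- data.frame(
  experiment_group       = c("Control", "Control", "Control", "Control",
                             "Test", "Test", "Test", "Test"),
  includes_new_reference = c(1, 0, 0, 0, 1, 1, 0, 1),
  was_reverted           = c(0, 1, 0, 0, 0, 0, 1, 1)
)

# KPI 1: share of new content edits that include a new reference.
kpi1 <- tapply(edits$includes_new_reference, edits$experiment_group, mean)

# KPI 2: share of new content edits reverted within 48 hours.
kpi2 <- tapply(edits$was_reverted, edits$experiment_group, mean)

kpi_summary <- data.frame(
  experiment_group    = names(kpi1),
  prop_with_reference = as.numeric(kpi1),
  prop_reverted       = as.numeric(kpi2)
)
print(kpi_summary)
```

Because both flags are coded 0/1, the group mean of each flag is the proportion of interest.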

Data Reviewed

Data on each user’s editing workflow was collected in EditAttemptStep and VisualEditorFeatureUse. This data was supplemented with published edit and revision tag data contained in the mediawiki_history and change_tag tables. As part of the Edit Check project, the Editing team introduced several different edit tags, which were used in the analysis to evaluate the types of new content edits published.

Data analysis was limited to contributors with 100 or fewer edits or unregistered users as they are the user groups identified as eligible for reference check based on the default config settings.

In the analysis, we compared the following AB test populations:

  • Control and Test: All new content edits by contributors with 100 or fewer edits or unregistered users bucketed into one of the AB experiment groups (Test or Control).

  • Control (Eligible but not shown reference check) and Test (Shown reference check): All new content edits by contributors with 100 or fewer edits or unregistered users that met the requirements (as documented by Edit Check default config settings) to be shown reference check at the time they indicated intent to save. In the test group, these users would be shown reference check and in the control group they would not.
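The two populations can be sketched as row filters. This is a minimal illustration on invented rows. The column name `user_edit_count` and the use of a missing value (`NA`) to represent unregistered users are assumptions made for this example. The two flag columns mirror names used in the analysis code.

```r
# Toy data (made up): one row per published new content edit.
edits <- data.frame(
  experiment_group        = c("test", "test", "control", "control", "test"),
  user_edit_count         = c(0, 250, 40, 100, NA),  # NA = unregistered (assumption)
  is_edit_check_activated = c(1, 1, 0, 0, 0),
  is_edit_check_eligible  = c(0, 0, 1, 0, 0)
)

# In scope: unregistered users or contributors with 100 or fewer edits.
in_scope <- is.na(edits$user_edit_count) | edits$user_edit_count <= 100

# Shown reference check (test) or eligible but not shown (control).
eligible_or_shown <-
  (edits$experiment_group == "test" & edits$is_edit_check_activated == 1) |
  (edits$experiment_group == "control" & edits$is_edit_check_eligible == 1)

population_1 <- edits[in_scope, ]                      # Control and Test
population_2 <- edits[in_scope & eligible_or_shown, ]  # eligible / shown only

nrow(population_1)
nrow(population_2)
```

The second population is always a subset of the first, which is why results for it are reported separately from the overall test and control comparison.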

For each metric, I reviewed the following dimensions: overall, by platform (desktop or mobile web), by user status (registered or unregistered), by user experience level (newcomer or junior contributor), and by partner wiki. I also reviewed data collected just for contributors from Sub-Saharan Africa as that was identified as our target audience.

Summary of results

Impacts

  • New content edits with a reference: Users are 2.2 times more likely to publish a new content edit that includes a reference and is constructive (not reverted within 48 hours) when reference check is shown to eligible edits.
    • Increases were observed across all reviewed user types, wikis, and platforms.
    • The highest observed increase was on mobile. On mobile, new content edits by contributors are 4.2 times more likely to include a reference and not be reverted when reference check is shown to eligible edits.
  • New content revert rate: New content edit revert rate decreased by 8.6% if reference check was available.
    • While some nonconstructive new content edits with a reference were introduced by this feature (5 percentage point (pp) increase), there was a higher proportion of constructive new content edits with a reference added (23.4 pp increase).
    • New content edits by editors from Sub-Saharan Africa are 53 percent less likely to be reverted when reference check is shown to eligible edits.
  • Constructive Retention Rate: Contributors that are shown reference check and successfully save an edit are 16 percent more likely to return to make an unreverted edit in their second month. This increase was primarily observed for desktop edits. There was a non-statistically significant difference observed on mobile.

Guardrails

  • Edit Completion Rate: We did not observe any drastic decreases in edit completion rate from intent to save (where reference check is shown) to save success overall or by wiki. Overall, there was a 10% decrease in edit completion rate for edits where reference check was shown compared to the control group.
    • There was a larger observed decrease in edit completion rate on mobile compared to desktop. On mobile, edit completion rate decreased by 24.3% (13.5pp) while on desktop it decreased by only 3.1% (2.3pp).
  • Block Rate: There were decreases or no changes in the rate of users blocked after being shown reference check compared to the control group.
  • False Negative Rate: There was a low false negative rate. Only 1.8% of all published new content edits in the test group did not include a new reference and were not shown edit check.
  • False Positive Rate: 6.6% of contributors dismissed adding a citation because they indicated the new content being added does not need a reference. This was the least selected decline option overall. Only 1.7% of editors from Sub-Saharan Africa selected this option.
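The edit completion figures above are quoted both as a relative change (%) and as an absolute change in percentage points (pp). The sketch below shows how the two relate. The implied rates it derives are approximations back-calculated from the rounded figures quoted above. They are not values reported by this analysis.

```r
# Reported changes in edit completion rate (from the guardrail summary).
changes <- data.frame(
  platform        = c("Mobile", "Desktop"),
  relative_change = c(-0.243, -0.031),  # relative change vs. control
  pp_change       = c(-13.5, -2.3)      # absolute change in percentage points
)

# relative_change = pp_change / control_rate
# => control_rate = pp_change / relative_change
changes$implied_control_rate <- changes$pp_change / changes$relative_change
changes$implied_test_rate    <- changes$implied_control_rate + changes$pp_change

print(changes)
```

The same absolute drop reads as a larger relative change when the baseline completion rate is lower, which is one reason to report both numbers.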

KPI 1: Proportion of new content edits that include a reference

Hypothesis: Reference check will increase the likelihood that newcomers and Junior Contributors editing from within Sub-Saharan Africa will accompany the new content they are adding with a reference.

Methodology: For the calculation of this metric, we reviewed the proportion of all published new content edits (identified by the tag editcheck-newcontent) by users with 100 or fewer edits or unregistered users that included a new reference (editcheck-newreference) and were not reverted within 48 hours.

Some clarifications about how these tags are applied:

  • The editcheck-newcontent tag is applied to all VisualEditor edits that add new content, where “new content” is defined by the conditions that were defined in T324730 and are now codified in editcheck/modules/init.js.

  • The editcheck-newreference tag is applied to all VisualEditor edits in the main namespace that add a net new reference.
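As an illustration of how these tags can be turned into per-edit flags, the sketch below assumes each revision's tags are available as a single comma-separated string. That layout, the `rev_id` values, and the helper `has_tag()` are hypothetical and are not the query used for this analysis.

```r
# Toy revisions (made up); tags stored as one comma-separated string per edit.
revisions <- data.frame(
  rev_id = 1:4,
  tags = c(
    "visualeditor,editcheck-newcontent,editcheck-newreference",
    "visualeditor,editcheck-newcontent,editcheck-references",
    "visualeditor",
    "visualeditor,editcheck-newcontent,editcheck-references,editcheck-reference-activated"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when `tag` appears as a whole tag in the string.
has_tag <- function(tag_string, tag) {
  vapply(strsplit(tag_string, ",", fixed = TRUE),
         function(tags) tag %in% tags, logical(1))
}

revisions$is_new_content          <- has_tag(revisions$tags, "editcheck-newcontent")
revisions$includes_new_reference  <- has_tag(revisions$tags, "editcheck-newreference")
revisions$is_edit_check_activated <- has_tag(revisions$tags, "editcheck-reference-activated")

print(revisions[, c("rev_id", "is_new_content", "includes_new_reference",
                    "is_edit_check_activated")])
```

Matching whole tags rather than substrings matters here because several tag names share a prefix (for example editcheck-references and editcheck-reference-activated).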

Show the code
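# Packages used throughout this analysis (assumed from the functions called below)
library(dplyr)    # %>%, mutate(), case_when(), group_by(), summarise()
library(ggplot2)  # plots
library(gt)       # tables
library(waffle)   # geom_waffle(), theme_enhance_waffle()

published_edits_reference <-
  read.csv(
    file = 'Queries/data/published_edits_reverts_updated.csv',
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  )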

Data Cleaning

Show the code
# define set of all eligible edits to review (eligible in control and activated in test)
published_edits_reference <- published_edits_reference %>%
  mutate(
    is_test_eligible = ifelse(
      (experiment_group == '2024-02-editcheck-reference-test' & is_edit_check_activated == 1) |
        (experiment_group == '2024-02-editcheck-reference-control' & is_edit_check_eligible == 1),
      'eligible', 'not eligible'
    ),
    is_test_eligible = factor(is_test_eligible, levels = c("eligible", "not eligible"))
  )

# relabel platform values as Desktop/Mobile ("Mobile" is clearer than "phone")
published_edits_reference <- published_edits_reference %>%
   mutate(platform = factor(platform,
                            levels = c('desktop', 'phone'),
                            labels = c("Desktop", "Mobile")))
## create new column to track only activated edits within the test group (one event in the control group was mislabeled as activated)
published_edits_reference <- published_edits_reference %>%
  mutate(is_edit_check_activated  = ifelse((experiment_group == '2024-02-editcheck-reference-test' & is_edit_check_activated == 1),
  'reference check shown', 'no reference check'),
           is_edit_check_activated = factor(is_edit_check_activated,
         levels = c('reference check shown', 'no reference check')
         ))

published_edits_reference$date <- as.Date(published_edits_reference$date, format = "%Y-%m-%d")

# clarify wiki names
published_edits_reference <- published_edits_reference %>%
  mutate(
    wiki = case_when(
      # clarify participating project names
      wiki == 'arwiki' ~ "Arabic Wikipedia", 
      wiki == 'afwiki' ~ "Afrikaans Wikipedia", 
      wiki == 'eswiki' ~ "Spanish Wikipedia",  
      wiki == 'frwiki' ~ "French Wikipedia", 
      wiki == 'itwiki' ~ "Italian Wikipedia", 
      wiki == 'jawiki' ~ "Japanese Wikipedia",
      wiki == 'ptwiki' ~ "Portuguese Wikipedia",
      wiki == 'swwiki' ~ "Swahili Wikipedia", 
      wiki == 'yowiki' ~ "Yoruba Wikipedia", 
      wiki == 'viwiki' ~ "Vietnamese Wikipedia",
      wiki == 'zhwiki' ~ "Chinese Wikipedia", 
    )
  ) 


  

# Set experience level group and factor levels
published_edits_reference <- published_edits_reference %>%
  mutate(
    experience_level_group = case_when(
     experience_level == 0 ~ 'Newcomer',
     experience_level > 0 & experience_level <= 100 ~ "Junior Contributor",
     experience_level >  100 ~ "Non-Junior Contributor"   
    ),
    experience_level_group = factor(experience_level_group,
         levels = c("Newcomer", "Non-Junior Contributor", "Junior Contributor")
   ))      

published_edits_reference$is_from_ssa <-
  factor(
    published_edits_reference$is_from_ssa,
    levels = c( "not_sub_saharan_africa","sub_saharan_africa"),
    labels = c( "Not from Sub-Saharan Africa", "Sub-Saharan Africa")
  )

published_edits_reference$includes_new_reference <-
  factor(
    published_edits_reference$includes_new_reference,
    levels = c(0,1),
    labels = c( "No new reference included", "New reference included")
  )

published_edits_reference$experiment_group <-
  factor(
    published_edits_reference$experiment_group,
    levels = c( "2024-02-editcheck-reference-control","2024-02-editcheck-reference-test"),
    labels = c( "Control", "Test")
  )

## create new data set limited to conditions of new content and add column to show various published new content splits for visualization purposes

published_edits_reference_new_content <- published_edits_reference %>%
   filter(is_new_content == 1,
          experience_level <= 100)  %>%
  mutate(
    new_content_edit_type = case_when(
    experiment_group == "Test" & includes_new_reference == "No new reference included"  ~ "Test (No reference included)", 
    experiment_group == "Test" & is_edit_check_activated == "reference check shown" & includes_new_reference == "New reference included"  ~ "Test (reference check shown and ref included)", 
    experiment_group == "Test" & is_edit_check_activated == "no reference check" & includes_new_reference == "New reference included"  ~ "Test (No reference check shown and ref included)",     
    experiment_group == "Control" & is_edit_check_eligible == 1 & includes_new_reference == "No new reference included"  ~ "Control (Eligible but no ref included)", 
    experiment_group == "Control" & is_edit_check_eligible == 0 & includes_new_reference == "No new reference included"  ~ "Control (Not eligible but no ref included)", 
    experiment_group == "Control"  & includes_new_reference == "New reference included"  ~ "Control (Reference included)",  
    TRUE ~ as.character("NA")))

Overall

Show the code
published_edits_reference_overall <- published_edits_reference_new_content %>%
    group_by(experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 0])) %>%  #look at unreverted references
  mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
published_edits_reference_overall_table <- published_edits_reference_overall %>%
  gt()  %>%
  tab_header(
    title = "Constructive new content edits that include a reference and are not reverted"
  )  %>%
  cols_label(
    experiment_group = "Experiment Group",
    n_content_edits = "Number of new content edits",
    n_content_edits_wref = "Number of new content edits with new reference",
    prop_edits = "Proportion of new content edits with a new reference"
  ) %>%
  tab_footnote(
    footnote = "Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours",
    locations = cells_column_labels(
      columns = "n_content_edits"
    ))
  

published_edits_reference_overall_table

Constructive new content edits that include a reference and are not reverted

Experiment Group | Number of new content edits [1] | Number of new content edits with new reference | Proportion of new content edits with a new reference
Control | 10875 | 2061 | 19%
Test | 10072 | 4275 | 42.4%

[1] Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours

Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reference_overall %>%
    ggplot(aes(x= experiment_group, y = n_content_edits_wref/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    scale_y_continuous(labels = scales::percent) +
      geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    labs (y = "Percent of new content edits ",
           x = "Experiment Group",
          title = "Constructive new content edits that include a new reference",
           caption = "Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Edit Check Activated")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=20),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 2: New content edits by newcomers and junior contributors that include a reference increased by 2.2x if reference check was shown to eligible edits. This excludes edits that were reverted within 48 hours.

Users are 2.2 times more likely to publish a new content edit that includes a reference and is constructive (not reverted within 48 hours) when reference check is shown to eligible edits. This includes all published edits identified as new content (editcheck-newcontent) by unregistered users or users with 100 or fewer cumulative edits.
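The 2.2x figure can be reproduced from the counts in the table above. The sketch below is a worked calculation only; the variable names are illustrative.

```r
# Counts taken from the overall table above.
n_edits    <- c(Control = 10875, Test = 10072)
n_with_ref <- c(Control = 2061,  Test = 4275)

# Proportion of new content edits with an unreverted new reference.
prop <- n_with_ref / n_edits

# Relative change (how many times more likely) and absolute change (pp).
ratio   <- prop["Test"] / prop["Control"]
pp_diff <- (prop["Test"] - prop["Control"]) * 100

round(prop * 100, 1)
round(ratio, 1)
round(pp_diff, 1)
```

Note that differencing the rounded percentages (42.4 - 19.0) gives the 23.4 percentage point increase quoted in this report, while the unrounded proportions differ by slightly more (about 23.5 pp).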

19% of new content edits in the control group were published with a new reference, which aligns with the baseline rate established in T332848.

However, not all users in the test group met the requirements for being shown reference check. Some users included a reference prior to attempting to save their new content edit and would not have been shown the reference check prompt.

Proportion of overall new content edits by eligibility for reference check

We further reviewed the proportion of new content edits in both the test and control group that met the requirements for being shown reference check at the time the user indicated intent to save. For these eligible edits, users in the test group would be shown reference check and users in the control group would not.

Note: This includes all new content edits (both reverted and unreverted).

Show the code
# sort by new content edit type
published_edits_newcontent_type <- published_edits_reference_new_content %>%
    group_by(experiment_group, new_content_edit_type) %>%
    summarise(n_content_edits = n_distinct(editing_session)) %>%  
    group_by(experiment_group)  %>%
mutate(prop_edits = n_content_edits/sum(n_content_edits))
Show the code
published_edits_ref_waffle_test <-  published_edits_newcontent_type  %>%
    filter(experiment_group == 'Test')  %>% #limit to tests
    mutate(new_content_edit_type = factor(
    new_content_edit_type,
    levels = c( "Test (No reference check shown and ref included)", "Test (reference check shown and ref included)","Test (No reference included)"),
    labels = c("Reference Included; No Reference Check", "Reference Included; Reference Check Shown", "No Reference Included" # set factor levels
  ))) %>%
    count(new_content_edit_type, wt = n_content_edits) %>%
    ggplot(aes(fill = new_content_edit_type, values = n)) +
  expand_limits(x=c(0,0), y=c(0,0)) +
  coord_equal() +
  labs(fill = NULL, colour = NULL) +
  theme_enhance_waffle() 
Show the code
published_edits_ref_waffle_test +
   labs (title = "New content edits in test group \n shown reference check" )  +
   scale_fill_manual(name = "Reference check shown",
                     values= c("steelblue2", "steelblue2", "#999999"))  +
   geom_waffle(
    aes(colour = new_content_edit_type),
    n_rows = 10, size = 2, flip = TRUE,
       make_proportional = TRUE)  +
  scale_colour_manual(
          name = "Reference check shown",
        values = c( "white", "#E69F00", "white")) +
  labs(color  = "Guide name", fill = "Guide name") +
      theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=14),
        legend.position= "right",
        axis.line = element_line(colour = "black"))

Figure 3: In the test group, the reference check was shown and a new reference was included at 33% of all published new content edits. This represents 65% of all unreverted new content edits with a reference.

New content edits in the test group by if reference check shown

New content edit type | Number of new content edits [1] | Proportion of new content edits
Test (No reference check shown and ref included) | 1636 | 16.2%
Test (No reference included) | 5168 | 51.3%
Test (reference check shown and ref included) | 3269 | 32.5%

[1] Includes all published new content edits by unregistered users or users with 100 edits or fewer

In the test group, the reference check was shown and a new reference was included at 33% of all published new content edits. This represents 65% of all unreverted new content edits with a reference.

16.2% of users in the test group added a new reference prior to indicating their intent to save and were not eligible to be shown the reference check. This is comparable to the proportion of users in the control group that included a reference and were not identified as eligible for reference check.

Show the code
published_edits_ref_waffle_control <- published_edits_newcontent_type  %>%
  filter(experiment_group == 'Control')  %>%
     mutate(new_content_edit_type = factor(
    new_content_edit_type,
    levels = c("Control (Reference included)", "Control (Eligible but no ref included)", "Control (Not eligible but no ref included)"),
    labels = c("Reference included", "Eligible; No reference included", "Not eligible; No reference included"))) %>% # set factor levels
  count(new_content_edit_type, wt = n_content_edits) %>%
  ggplot(aes(fill = new_content_edit_type, values = n)) +
  expand_limits(x=c(0,0), y=c(0,0)) +
  coord_equal() +
  labs(fill = NULL, colour = NULL) +
  theme_enhance_waffle()
Show the code
published_edits_ref_waffle_control  +
   labs (title = "New content edits in control group \n eligible for reference check ")  +
   scale_fill_manual(values= c("#999999", "#E69F00", "black"), name = "Reference check eligible")  +
   geom_waffle(
    n_rows = 10, size = 2, colour = "white", flip = TRUE,
         make_proportional = TRUE
  )  + theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=14),
        legend.position= "right",
        axis.line = element_line(colour = "black")) 

Figure 4: 78% of the new content edits in the control group did not include a reference and were tagged as being eligible (shown in orange) and would have been shown reference check if available

New content edits in the control group by if eligible for reference check

New content edit type | Number of new content edits [1] | Proportion of new content edits
Control (Eligible but no ref included) | 8454 | 77.7%
Control (Not eligible but no ref included) | 143 | 1.3%
Control (Reference included) | 2278 | 20.9%

[1] Includes all published new content edits by unregistered users or users with 100 edits or fewer

78% of the new content edits in the control group did not include a reference and were tagged as being eligible (shown in orange in Figure 4). Reference check would have been shown for all of these edits if available, which, based on test group results, would have led a portion of them (35%) to add a reference.
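As a rough back-of-the-envelope sketch of that projection, the calculation below applies the 35% rate to the eligible control edits. It assumes control-group editors would have responded to reference check at the same rate as test-group editors, which is a simplification and not a result of this analysis.

```r
# Eligible control edits without a reference (from the table above).
n_control_eligible <- 8454

# Share of shown edits that added a reference, per the test group results.
assumed_reference_rate <- 0.35

# Projected number of control edits that might have gained a reference.
projected_edits_with_ref <- round(n_control_eligible * assumed_reference_rate)
projected_edits_with_ref
```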

Note: There is also a small proportion of new content edits (1.3%) in the control group that did not include a new reference and were not identified as eligible as represented by the black square as shown in Figure 4. See more details on this false negative rate in Guardrail #3 section of this report.

Unconstructive new content edits

While KPI 1 is focused on constructive edits, we also wanted to review the impact of reference check on the proportion of new content edits that include a reference and are unconstructive (as indicated by being reverted within 48 hours).

Show the code
published_edits_reference_overall_reverted <- published_edits_reference_new_content %>%
    group_by(experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 1])) %>%  #look at reverted references
  mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
published_edits_reference_overall_reverted_table <- published_edits_reference_overall_reverted %>%
  gt()  %>%
  tab_header(
    title = "New content edits that include a reference and are reverted within 48 hours"
  )  %>%
  cols_label(
    experiment_group = "Experiment Group",
    n_content_edits = "Number of new content edits",
    n_content_edits_wref = "Number of new content edits with new reference that are reverted",
    prop_edits = "Proportion of new content edits with a new reference that are reverted"
  ) %>%
  tab_footnote(
    footnote = "Includes all new content edits by unregistered users or users with 100 or fewer edits reverted within 48 hours",
    locations = cells_column_labels(
      columns = 'n_content_edits'
    )
  ) 

published_edits_reference_overall_reverted_table

New content edits that include a reference and are reverted within 48 hours

Experiment Group | Number of new content edits [1] | Number of new content edits with new reference that are reverted | Proportion of new content edits with a new reference that are reverted
Control | 10875 | 217 | 2%
Test | 10072 | 629 | 6.2%

[1] Includes all new content edits by unregistered users or users with 100 or fewer edits reverted within 48 hours

There was a roughly 4 percentage point increase in the proportion of new content edits published with a reference and reverted when reference check was shown (2% -> 6.2%, per the table above). Reference check caused a significant increase in the number of new content edits with a reference and this included some low-quality edits.

However, this increase is significantly smaller than the 23.4 percentage point increase in unreverted new content edits with a reference. As a result, we still observed an overall decrease in revert rate (as detailed in KPI 2: Revert Rate).
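One simple way to check whether the difference in the reverted-with-reference proportions is larger than chance would explain is a two-sample test of proportions. This section does not state which statistical method the analysis used, so the use of base R's `prop.test()` here is an assumption made for illustration. It also treats edits as independent, which ignores that one user can make several edits.

```r
# Counts taken from the table above.
reverted_with_ref <- c(Control = 217,   Test = 629)
n_edits           <- c(Control = 10875, Test = 10072)

# Two-sample test for equality of proportions (with continuity correction).
test_result <- prop.test(x = reverted_with_ref, n = n_edits)

test_result$estimate   # estimated proportion in each group
test_result$conf.int   # confidence interval for the difference
test_result$p.value
```

A model that accounts for repeated edits by the same user would be a more careful choice for a formal conclusion.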

By Platform

Show the code
published_edits_reference_byplatform <- published_edits_reference_new_content %>%
    group_by(platform, experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 0]))  %>%  #look at unreverted references
    mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- ggplot(data = published_edits_reference_byplatform, aes(x = experiment_group, y = n_content_edits_wref/n_content_edits)) +
    geom_col(aes(fill = experiment_group)) +
    geom_text(aes(label = paste(prop_edits)), vjust=1.2, size = 10, color = "white") +
    facet_wrap(~platform) +
    labs (y = "Percent of new content edits ",
           x = "Experiment Group",
          title = "Constructive new content edits that include a new reference \n by platform" ,
          caption = "Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours")  +
    # plot the proportion directly so the y-axis labels are true percentages
    scale_y_continuous(labels = scales::percent) +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Reference check eligible")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
p

Figure 5: Increases observed on both desktop and mobile. On mobile, users are 4.2 times more likely to include a reference with their new content when the reference check is shown to eligible edits.

There was a significant increase in the proportion of new content edits with a reference on both desktop and mobile platforms. On mobile, users are 4.2 times more likely to include a reference with their new content edit and not be reverted when the reference check is shown to eligible edits.

Registered User Experience Level

Show the code
published_edits_reference_byexp <- published_edits_reference_new_content %>%
    filter(
          user_status == 'registered') %>%  #we only track edit counts for registered users 
    group_by(experience_level_group, experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 0]))  %>%  #look at unreverted references
       mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reference_byexp %>%
    ggplot(aes(x= experiment_group, y = n_content_edits_wref/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    facet_wrap(~ experience_level_group) +
    scale_y_continuous(labels = scales::percent) +
     labs (y = "Percent of new content edits ",
           x = "Experiment Group",
          title = "Constructive new content edits that include a new reference \n by user experience level" ,
          caption = "Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 6: New content edits by both newcomers and junior contributors are more likely to include a reference if reference check is shown to eligible edits

We observed increases for both newcomers (users making their first edit) and junior contributors (users with 100 or fewer edits). Newcomers are 2.7 times more likely to add a new reference to new content when reference check is shown to eligible edits. Junior contributors are 1.7 times more likely to add a new reference to new content when reference check is shown to eligible edits.

User Status

Show the code
published_edits_reference_byuserstatus<- published_edits_reference_new_content %>%
    group_by(user_status, experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 0]))  %>%  #look at unreverted references
    mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reference_byuserstatus%>%
    ggplot(aes(x= experiment_group, y = n_content_edits_wref/n_content_edits, fill = experiment_group)) +
    facet_grid(~user_status) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    scale_y_continuous(labels = scales::percent) +
     labs (y = "Percent of new content edits ",
           x = "Experiment Group",
          title = "Constructive new content edits\n that include a new reference by user status" ,
          caption = "Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Test Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 7: Similar increases in the proportion of new content edits with a reference observed for registered and unregistered users

Similar absolute increases were observed for both registered and unregistered users. Registered users are 1.9 times more likely and unregistered users are 3 times more likely to include a new reference with their new content edit when reference check is shown to eligible edits.

Editors from Sub-Saharan Africa

Show the code
published_edits_reference_byssa <- published_edits_reference_new_content %>%
    filter(is_from_ssa == "Sub-Saharan Africa") %>%  
    group_by(is_from_ssa, experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 0]))  %>%  #look at unreverted references
      mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reference_byssa  %>%
    ggplot(aes(x= experiment_group, y = n_content_edits_wref/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    scale_y_continuous(labels = scales::percent) +
     labs (y = "Percent of new content edits ",
           x = "Experiment Group",
          title = "Constructive new content edits that include a new reference \n by editors from Sub-Saharan Africa" ,
          caption = "Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 8: New content edits published by newcomers and junior contributors from Sub-Saharan Africa are 2.6 times more likely to include a new reference

New content edits published by newcomers and junior contributors from Sub-Saharan Africa are 2.6 times more likely to include a new reference. This is based on around 200 new content edits logged by editors from this region in each experiment group.

By Wiki

Show the code
published_edits_reference_bywiki <- published_edits_reference_new_content %>%
    group_by(wiki, experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_content_edits_wref = n_distinct(editing_session[includes_new_reference == "New reference included" & was_reverted == 0]))  %>%  #look at unreverted references
         mutate(prop_edits = paste0(round(n_content_edits_wref/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reference_bywiki %>%
    filter(!wiki %in% c('Afrikaans Wikipedia', 'Swahili Wikipedia', 'Yoruba Wikipedia'))  %>% 
    ggplot(aes(x= experiment_group, y = n_content_edits_wref/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    facet_wrap(~ wiki) +
    scale_y_continuous(labels = scales::percent) +
    labs (y = "Percent of new content edits ",
           x = "Experiment Group",
          title = " Constructive new content edits that include a new reference \n by partner wiki",
         caption = "Afrikaans Wikipedia, Swahili Wikipedia, Yoruba Wikipedia removed from analysis due to insufficient events. \n 
        Includes all new content edits by unregistered users or users with 100 or fewer edits not reverted within 48 hours")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        text = element_text(size=18),
        plot.title = element_text(hjust = 0.5),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 9: The proportion of new content edits published by newcomers and junior contributors that include a new reference increased at all partner wikis.

We observed increases at all the partner wikis. The highest observed increases occurred at Arabic, Chinese, and Portuguese Wikipedia, where over half of all new content edits included a reference if reference check was presented to eligible edits.

KPI 2: Revert Rate

Hypothesis: Reference check will increase the quality of edits newcomers and Junior Contributors editing from within Sub-Saharan Africa publish in the main namespace

Methodology: We reviewed the proportion of all new content edits in the control and test groups that were reverted within 48 hours. This was identified as one of the target metrics for WE 1.2.

We limited the analysis to new content edits that met the requirements of being shown reference check at the time the contributor indicated intent to save. In the test group, contributors of these edits were shown reference check (labeled as “Test: Reference Check Shown” in the charts below). In the control group, contributors of these edits were eligible but not shown reference check (labeled as “Control: Eligible for Reference Check but not shown” in the charts below).

Overall

Show the code
published_edits_reverted_eligible <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_reverted_edits = n_distinct(editing_session[was_reverted == 1]))  %>%  #look at reverted
     mutate(prop_edits = paste0(round(n_reverted_edits/n_content_edits * 100, 1), "%"))
Show the code
published_edits_reverted_eligible_table <- published_edits_reverted_eligible %>%
  mutate(experiment_group = ifelse(experiment_group == "Control", "Control: Eligible for Reference Check but not shown", "Test: Reference Check Shown")) %>%
  gt()  %>%
  tab_header(
    title = "48-hour new content edit revert rate \n of edits eligible for reference check"
  )  %>%
  cols_label(
    experiment_group = "Experiment Group",
    n_content_edits = "Number of new content edits",
    n_reverted_edits = "Number of new content edits reverted",
    prop_edits = "Proportion of new content edits reverted"
  ) %>%
  tab_footnote(
    footnote = "Includes all new content edits by unregistered users or users with 100 or fewer edits eligible for reference check",
    locations = cells_column_labels(
      columns = 'n_content_edits'
    )
  ) 

published_edits_reverted_eligible_table
48-hour new content edit revert rate of edits eligible for reference check

Experiment Group | Number of new content edits [1] | Number of new content edits reverted | Proportion of new content edits reverted
Control: Eligible for Reference Check but not shown | 9078 | 2320 | 25.6%
Test: Reference Check Shown | 8255 | 1934 | 23.4%

[1] Includes all new content edits by unregistered users or users with 100 or fewer edits eligible for reference check

Figure 10: There was an 8.6% (2.2 pp) decrease in the revert rate of all new content edits when comparing edits where reference check was shown in the test group to edits that were eligible but not shown reference check in the control group.

There was an 8.6% (2.2 pp) decrease in the revert rate of all eligible new content edits by newcomers, junior contributors, and unregistered users. These include new content edits that may or may not have had a new reference added.
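The two figures quoted above (percentage-point change and relative change) can be reproduced from the counts in the table. The following is a minimal base R sketch, not the author's code; the data frame and variable names are illustrative. It assumes the relative change is computed from the rates after rounding to one decimal, which matches the reported 8.6%; computing it from the unrounded counts gives roughly 8.3%.

```r
# Counts taken from the table above (eligible new content edits).
revert_counts <- data.frame(
  experiment_group = c("Control", "Test"),
  n_content_edits  = c(9078, 8255),
  n_reverted_edits = c(2320, 1934)
)

# Revert rate in percent, rounded to one decimal as in the table.
revert_counts$revert_rate <- round(
  revert_counts$n_reverted_edits / revert_counts$n_content_edits * 100, 1
)

control_rate <- revert_counts$revert_rate[revert_counts$experiment_group == "Control"]
test_rate    <- revert_counts$revert_rate[revert_counts$experiment_group == "Test"]

# Absolute change (percentage points) and relative change (percent of control rate).
pp_change  <- round(test_rate - control_rate, 1)
rel_change <- round((test_rate - control_rate) / control_rate * 100, 1)

print(c(pp_change = pp_change, rel_change = rel_change))
```

The percentage-point change is the difference between the two rates, while the relative change divides that difference by the control rate, so the two numbers answer different questions and are reported together.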

Figure 11 shows a breakdown by whether the new content edit included a new reference.

Show the code
published_edits_reverted_eligible_ref  <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(experiment_group, includes_new_reference) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_reverted_edits = n_distinct(editing_session[was_reverted == 1]))  %>%  #look at reverted
     mutate(prop_edits = paste0(round(n_reverted_edits/n_content_edits * 100, 1), "%"))
Show the code
published_edits_reverted_eligible_ref_table <- published_edits_reverted_eligible_ref %>%
   mutate(experiment_group = ifelse(experiment_group == "Control", "Control: Eligible for Reference Check but not shown", "Test: Reference Check Shown")) %>%
  gt()  %>%
  tab_header(
    title = "48-hour new content edit revert rate of edits \n eligible for reference check 
    by if reference was included"
  )  %>%
  cols_label(
    experiment_group = "Experiment group",
    includes_new_reference = "Includes new reference",
    n_content_edits = "Number of new content edits",
    n_reverted_edits = "Number of new content edits reverted",
    prop_edits = "Proportion of new content edits reverted"
  ) %>%
  tab_footnote(
    footnote = "Includes all new content edits by unregistered users or users with 100 or fewer edits eligible for reference check",
    locations = cells_column_labels(
      columns = 'n_content_edits'
    )
  ) 

published_edits_reverted_eligible_ref_table
48-hour new content edit revert rate of edits eligible for reference check by if reference was included

Includes new reference | Number of new content edits [1] | Number of new content edits reverted | Proportion of new content edits reverted
Control: Eligible for Reference Check but not shown
No new reference included | 8454 | 2260 | 26.7%
New reference included | 624 | 60 | 9.6%
Test: Reference Check Shown
No new reference included | 4986 | 1451 | 29.1%
New reference included | 3269 | 483 | 14.8%

[1] Includes all new content edits by unregistered users or users with 100 or fewer edits eligible for reference check

Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reverted_eligible_ref %>%
    ggplot(aes(x= includes_new_reference, y = n_reverted_edits/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
   facet_grid (~ experiment_group) +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    scale_y_continuous(labels = scales::percent) +
   scale_x_discrete(labels = c("No reference included", "Reference included")) +
    labs (y = "Percent of new content edits reverted",
           x = "Experiment Group",
          title = "48-hour revert rate of new content edits \n eligible for reference check by if new reference included")  +
    scale_fill_manual(values= c("#999999", "steelblue2", "steelblue4"), name = "Experiment Group", labels = c( "Control: Eligible", "Test: Reference Check Shown"))  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 11: Reference check caused a significant shift in the number of new content edits that were published with a new reference, which led to an increase in the revert rate of new content edits with a reference; however, there was an overall increase in the quality of all new content edits, as the inclusion of a reference significantly decreases the likelihood that a new content edit will be reverted.

New content edits with a reference have a much lower revert rate in both groups.

The reference check caused a significant shift in the number of new content edits in the test group that included a reference. Some edits that would have been reverted without a reference now included a reference and were still reverted.

As indicated in the KPI 1 section, there was a 5 percentage point increase in the proportion of new content edits that include a new reference and are reverted between the control and test groups. We plan to further investigate the types of citations added with these references to understand more about the types of sources people are adding after being shown reference check.

However, the feature caused a higher proportion of constructive (unreverted) new content edits with a reference added (23.4 percentage point increase). As a result, we observed an overall increase in the quality of new content edits.
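The reasoning above, where revert rates rise within each subgroup yet fall overall, follows from the overall rate being a weighted average of the subgroup rates. The base R sketch below is illustrative and not part of the original analysis. It uses the counts from the breakdown table to show that the overall rate in each group equals the sum of (share of edits in subgroup) × (subgroup revert rate); the data frame and column names are hypothetical.

```r
# Counts taken from the breakdown table above.
breakdown <- data.frame(
  experiment_group       = c("Control", "Control", "Test", "Test"),
  includes_new_reference = c("No", "Yes", "No", "Yes"),
  n_content_edits        = c(8454, 624, 4986, 3269),
  n_reverted_edits       = c(2260, 60, 1451, 483)
)

# Total eligible new content edits per experiment group, repeated per row.
group_total <- ave(breakdown$n_content_edits, breakdown$experiment_group, FUN = sum)

# Share of the group's edits in each subgroup, and the subgroup revert rate.
breakdown$share       <- breakdown$n_content_edits / group_total
breakdown$revert_rate <- breakdown$n_reverted_edits / breakdown$n_content_edits

# Overall revert rate = sum over subgroups of share * subgroup revert rate.
overall <- sapply(
  split(breakdown$share * breakdown$revert_rate, breakdown$experiment_group),
  sum
)

print(breakdown[, c("experiment_group", "includes_new_reference", "share", "revert_rate")])
print(round(overall * 100, 1))
```

Because a much larger share of test-group edits falls in the lower-revert "reference included" subgroup, the weighted average drops even though both subgroup rates are higher than in the control group.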

Platform

Show the code
published_edits_reverted_eligible_platform  <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(platform, experiment_group, is_edit_check_activated) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_reverted_edits = n_distinct(editing_session[was_reverted == 1]))  %>%  #look at reverted
     mutate(prop_edits = paste0(round(n_reverted_edits/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reverted_eligible_platform  %>%
    ggplot(aes(x= experiment_group, y = n_reverted_edits/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    facet_wrap(~ platform) +
    scale_y_continuous(labels = scales::percent) +
    labs (y = "Percent of new content edits reverted ",
           x = "Experiment Group",
          title = "48-hour revert rate of new content edits \n eligible for reference check by platform")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group", labels = c("Control: Eligible", "Test: Reference Check Shown")) +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 12: New content edit revert rate decreased on both mobile and desktop platforms

There was a slight decrease in the revert rate of new content on both desktop and mobile platforms. The revert rate decreased by 9.4% (1.7 pp) on desktop and by 5.9% (2 pp) on mobile.

Editors from Sub-Saharan Africa

Show the code
published_edits_reverted_eligible_ssa  <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible',
           is_from_ssa == "Sub-Saharan Africa") %>% #limit new content where edit check was shown or eligible to be shown
    group_by(experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_reverted_edits = n_distinct(editing_session[was_reverted == 1]))  %>%  #look at reverted
     mutate(prop_edits = paste0(round(n_reverted_edits/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reverted_eligible_ssa  %>%
    ggplot(aes(x= experiment_group, y = n_reverted_edits/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    scale_x_discrete(labels = c("Control: Eligible", "Test: Reference Check Shown" ))+
    scale_y_continuous(labels = scales::percent) +
    labs (y = "Percent of new content edits reverted ",
           x = "Experiment Group",
          title = "48-hour revert rate of new content edits \n from Sub-Saharan Africa eligible for reference check")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 13: New content edits by editors from Sub-Saharan Africa are 53 percent less likely to be reverted if shown reference check.

Over the course of the test, we logged ~150 eligible new content edits from SSA in each group. While this sample size is small, the observed change is large enough that we were able to confirm statistical significance.

By User Types:

We observed decreases in revert rate across all user types:

  • Registration Status

    • Registered Users: 19.2% (control) -> 16.5% (test); 14% decrease (2.7 pp)

    • Unregistered Users: 30.4% (control) -> 28.9% (test); 4.9% decrease (1.5 pp)

  • Experience Level

    • Newcomers: 24.1% (control) -> 19.8% (test); 17.8% decrease (4.3 pp)

    • Junior Contributors: 17.1% (control) -> 15.1% (test); 11.7% decrease (2.0 pp)

By Wiki

Show the code
published_edits_reverted_bywiki <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(wiki, experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session),
             n_reverted_edits = n_distinct(editing_session[was_reverted == 1]))  %>%  #look at reverted
 mutate(prop_edits = paste0(round(n_reverted_edits/n_content_edits * 100, 1), "%"))
Show the code
dodge <- position_dodge(width=0.9)

p <- published_edits_reverted_bywiki  %>%
    filter(!wiki %in% c('Afrikaans Wikipedia', 'Swahili Wikipedia', 'Yoruba Wikipedia'))  %>% 
    ggplot(aes(x= experiment_group, y = n_reverted_edits/n_content_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    facet_wrap(~ wiki) +
    scale_y_continuous(labels = scales::percent) +
    labs (y = "Percent of new content edits reverted ",
           x = "Experiment Group",
          title = "48-hour revert rate of new content edits \n eligible for reference check by wiki",
         caption = "Afrikaans Wikipedia, Swahili Wikipedia, Yoruba Wikipedia removed from analysis due to insufficient events")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group", labels = c("Control (eligible)", "Test (reference check shown)"))  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        text = element_text(size=18),
        plot.title = element_text(hjust = 0.5),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 14: Revert rates decreased or there were no significant changes across all partner wikis except Vietnamese Wikipedia.

Revert rates decreased or showed no significant change across almost all partner wikis. We observed larger revert rate decreases at Spanish Wikipedia (12.2% decrease), Portuguese Wikipedia (29.7% decrease), and Arabic Wikipedia (20.1% decrease).

On Vietnamese Wikipedia, there was a 42.4% (5.2 pp) increase in revert rate. Exploring the revert rate for this wiki further, the revert rate for edits that were shown reference check and added a reference was 20.4%, compared to a revert rate of 12.2% for edits by users not shown the reference check that did not include a reference. Further investigation would help explain why these edits are being reverted.

We also identified several guardrails used to confirm that the reference check was not causing significant disruption.

Modeling the impact of revert rate

As the changes in revert rate were small and variable across the reviewed dimensions, we used a Bayesian hierarchical regression model to infer the impact of the reference check on whether a new content edit was reverted, accounting for the random effects of user and wiki. This allows us to confirm whether the observed change above is statistically significant (did not occur due to random chance).

Based on estimates from the model, we found there is an average 0.4% decrease (maximum 10.1% decrease) in the probability of a contributor publishing an unreverted new content edit when shown reference check. We can confirm statistical significance at the 0.05 level for all of these estimates (as indicated by credible intervals that do not cross 1).
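As a rough cross-check, the overall counts from the KPI 2 table can be run through a simple two-sample test of proportions. This sketch is not the Bayesian hierarchical model described above: it uses base R's `prop.test`, treats every edit as independent, and ignores the user and wiki random effects, so it should be read only as an unadjusted sanity check. The variable names are illustrative.

```r
# Counts from the overall revert-rate table (eligible new content edits).
reverted <- c(control = 2320, test = 1934)
total    <- c(control = 9078, test = 8255)

# Two-sample test of equal proportions (stats package, loaded by default).
# This swaps in a frequentist test; it is NOT the hierarchical model
# used in the report and does not adjust for user or wiki.
unadjusted_test <- prop.test(x = reverted, n = total)

print(unadjusted_test$estimate)  # observed revert rate in each group
print(unadjusted_test$conf.int)  # 95% interval for control minus test
print(unadjusted_test$p.value)
```

A hierarchical model remains the more appropriate choice here because multiple edits by the same user, or on the same wiki, are unlikely to be independent.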

Guardrail #1: Edit completion rate

Reference check adds an extra step to the publishing workflow, so we expect some decrease in edit completion rate. We want to ensure it does not cause significant disruption to contributors.

Methodology: We reviewed the proportion of edits by newcomers, junior contributors, and unregistered users that click the pre-publish button to indicate intent to save (event.action = saveIntent) and successfully publish their edit (event.action = saveSuccess). We limited analysis to only edits that are not reverted within 48 hours.

In addition, we focused on edits that were shown reference check in the test group. We are unable to limit the control group to just eligible edits as that tag is only applied for published edits. Instead, we will compare all edits in Control that reach saveIntent.

We selected all edits that reach saveIntent in the editing workflow (vs the typical init action to mark the beginning of an edit attempt) as the reference check is not shown until the user selects the publish button for the first time. This removes editors that abandoned their edit prior to this point in the workflow and prior to being shown reference check.
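The completion-rate definition above (attempts reaching saveIntent that go on to log saveSuccess) can be sketched on a toy event log. The data below are entirely hypothetical, the column names are assumptions rather than the schema used in the report, and the sketch omits the additional 48-hour revert filter for brevity.

```r
# Hypothetical event log: one row per logged action within an edit attempt.
events <- data.frame(
  edit_attempt_id  = c("a1", "a1", "a2", "a3", "a3", "a4", "a5", "a5"),
  experiment_group = c("control", "control", "control", "test", "test",
                       "test", "test", "test"),
  action           = c("saveIntent", "saveSuccess", "saveIntent", "saveIntent",
                       "saveSuccess", "saveIntent", "saveIntent", "saveSuccess"),
  stringsAsFactors = FALSE
)

# Denominator: edit attempts that reached saveIntent.
intent <- unique(events[events$action == "saveIntent",
                        c("edit_attempt_id", "experiment_group")])

# Numerator: those attempts that also logged saveSuccess.
saved_ids <- unique(events$edit_attempt_id[events$action == "saveSuccess"])
intent$is_saved <- intent$edit_attempt_id %in% saved_ids

# Completion rate per experiment group (mean of a logical = proportion TRUE).
completion_rate <- aggregate(is_saved ~ experiment_group, data = intent, FUN = mean)
print(completion_rate)
```

Starting the funnel at saveIntent rather than at the start of the edit attempt keeps the denominator limited to editors who could actually have seen reference check.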

Show the code
# load edit completion based on save intent
edit_completion_rate_save_intent <-
  read.csv(
    file = 'Queries/data/edit_completion_rate_saveIntent.csv',  
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 

Overall

Show the code
edit_completes_eligible <- edit_completion_rate_save_intent %>%
    filter(experience_level <= 100) %>%
    group_by(experiment_group, was_edit_check_shown)  %>%
       summarise(n_edit_attempts = n_distinct(edit_attempt_id),
             n_edits_saved = n_distinct(edit_attempt_id[is_edit_saved == 1 & was_reverted != 1]))  %>%  
     mutate(prop_edits = paste0(round(n_edits_saved/n_edit_attempts * 100, 1), "%")) 

edit_completes_eligible <- edit_completes_eligible[-c(2),] #remove test group rows where reference check was not shown 
Show the code
dodge <- position_dodge(width=0.9)

p <- edit_completes_eligible  %>%
    ggplot(aes(x= experiment_group, y = n_edits_saved/n_edit_attempts, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    scale_y_continuous(labels = scales::percent) +
    scale_x_discrete(labels= c("Control", "Test (Reference check shown)")) +
    labs (y = "Percent of edits completed ",
           x = "Experiment Group",
          title = "Edit completion rate from save intent to publish")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Test Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 15: Edit completion rate decreased slightly. There was a 10% (6.4 pp) decrease in edit completion rate for edits where reference check was shown in the test group compared to edits in the control group that were not shown reference check.

There was a 10% (6.4 pp) decrease in edit completion rate for edits where reference check was shown in the test group compared to edits in the control group that were not shown reference check.

Platform

Show the code
edit_completes_byplatform_editcheck <- edit_completion_rate_save_intent %>%
    filter(experience_level <= 100) %>%
    group_by(platform,  experiment_group, was_edit_check_shown)  %>%
      summarise(n_edit_attempts = n_distinct(edit_attempt_id),
             n_edits_saved = n_distinct(edit_attempt_id[is_edit_saved == 1 & was_reverted != 1]))  %>%  
     mutate(prop_edits = paste0(round(n_edits_saved/n_edit_attempts * 100, 1), "%"))

edit_completes_byplatform_editcheck <- edit_completes_byplatform_editcheck[-c(2, 5),] #remove test group rows where reference check was not shown 

Figure 16: There was a decrease in edit completion rate on both platforms. On mobile, edit completion rate decreased by 24.3% (13.5 pp), while on desktop it decreased by only 3.1% (2.3 pp).

We observed a larger decrease on mobile compared to desktop. On mobile, edit completion rate decreased by 24.3% (13.5 pp), while on desktop it decreased by only 3.1% (2.3 pp).

Editors from Sub-Saharan Africa

Show the code
edit_completes_byssa_editcheck <- edit_completion_rate_save_intent %>%
    filter(experience_level <= 100,
          is_from_ssa == "Sub-Saharan Africa") %>%
    group_by(is_from_ssa, experiment_group, was_edit_check_shown)  %>%
      summarise(n_edit_attempts = n_distinct(edit_attempt_id),
             n_edits_saved = n_distinct(edit_attempt_id[is_edit_saved == 1 & was_reverted != 1]))  %>%  
     mutate(prop_edits = paste0(round(n_edits_saved/n_edit_attempts * 100, 1), "%"))

edit_completes_byssa_editcheck  <- edit_completes_byssa_editcheck[-c(2),] #remove test group rows where reference check was not shown 
Show the code
# Plot edit completion rates for editors from Sub-Saharan Africa
dodge <- position_dodge(width=0.9)
options(repr.plot.width = 15, repr.plot.height = 10)

p <- edit_completes_byssa_editcheck  %>%
    ggplot(aes(x= experiment_group, y = n_edits_saved/n_edit_attempts, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    scale_y_continuous(labels = scales::percent) +
    scale_x_discrete(labels= c("Control", "Test (Reference check shown)")) +
    labs (y = "Percent of edits completed ",
           x = "Experiment Group",
          title = "Edit completion rate from save intent to publish \n for editors from Sub-Saharan Africa")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Test Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 17: No significant changes in edit completion rate for editors from Sub-Saharan Africa

There were no significant changes in edit completion rate for editors from Sub-Saharan Africa overall or by platform in Sub-Saharan Africa.

By Partner Wiki

Figure 18: No significant decreases in edit completion rate at each individual partner wiki.

No significant decreases were observed at any individual partner wiki.

Number of new content edits successfully saved

We can also review the impact of reference check on the total number of new content edits successfully saved by comparing the number of edits saved after being shown reference check (test) to the total number of eligible saved edits not shown reference check (control).

Show the code
num_published_edits <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(experiment_group) %>%
    summarise(n_content_edits = n_distinct(editing_session))
Show the code
num_published_edits_byplatform <- published_edits_reference_new_content %>%
    filter(is_test_eligible == 'eligible') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(platform, experiment_group, includes_new_reference) %>%
    summarise(n_content_edits = n_distinct(editing_session))
Show the code
# Plot number of new content edits by platform and inclusion of a reference

options(repr.plot.width = 15, repr.plot.height = 10)

p <- num_published_edits_byplatform   %>%
    ggplot(aes(x= experiment_group, y = n_content_edits, fill = includes_new_reference)) +
    geom_bar(position="stack", stat="identity") +
    geom_text(aes(label = n_content_edits), size = 5, color = "white", position = position_stack(vjust = 0.5)) +
      facet_grid(~ platform) +
    scale_x_discrete(labels= c("Control", "Test (Reference check shown)")) +
    labs (y = "Number of new content edits ",
           x = "Experiment Group",
          title = "New content edits by inclusion of reference and platform")  +
    scale_fill_manual(values= c("#999999", "#009E73"), name = "Reference Included")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 18: While the total number of new content edits decreased in the test group, the number of new content edits with a reference increased significantly on both platforms.

There was a decrease in the total number of new content edits completed by users shown the reference check (test) compared to users eligible but not shown reference check (control). However, the number of new content edits with a reference increased significantly on both the desktop and mobile platforms, as shown in Figure 18 and indicated by the KPI 1 results.

Guardrail 2: Proportion of contributors blocked after publishing an edit where reference check was shown.

Methodology:

We reviewed both global and local blocks made within 6 hours of a user being shown reference check as identified in the logging table and compared to blocks made within 6 hours of a user identified as eligible but not shown edit check (control group).
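The 6-hour window described above can be expressed as a simple timestamp comparison. The sketch below uses made-up users and timestamps, and the column names (`shown_ts`, `block_ts`) are hypothetical rather than fields from the logging table; it only illustrates how a block would be counted as falling inside the window.

```r
# Hypothetical data: when each user was shown reference check and,
# if applicable, when they were blocked (NA = never blocked).
users <- data.frame(
  user_id  = c("u1", "u2", "u3", "u4"),
  shown_ts = as.POSIXct(c("2024-03-01 10:00:00", "2024-03-01 12:00:00",
                          "2024-03-02 08:30:00", "2024-03-03 15:00:00"),
                        tz = "UTC"),
  block_ts = as.POSIXct(c("2024-03-01 13:00:00", NA,
                          "2024-03-02 20:00:00", NA),
                        tz = "UTC")
)

# Hours elapsed between being shown reference check and the block.
hours_to_block <- as.numeric(
  difftime(users$block_ts, users$shown_ts, units = "hours")
)

# A block counts only if it happened after the check and within 6 hours.
users$blocked_within_6h <- !is.na(hours_to_block) &
  hours_to_block >= 0 & hours_to_block <= 6

# Proportion of users blocked within the window.
block_rate <- mean(users$blocked_within_6h)
print(users[, c("user_id", "blocked_within_6h")])
print(block_rate)
```

The same comparison would be applied to the control group, using the time the edit was identified as eligible in place of the time reference check was shown.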

Show the code
# load data
edit_check_blocks <-
  read.csv(
    file = 'Queries/data/edit_check_users_blocked.csv', 
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 

edit_check_eligible_blocks <-
  read.csv(
    file = 'Queries/data/edit_check_eligible_users_blocked.csv', 
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 

Overall

Only 2.9% of users were blocked after publishing an edit where reference check was shown. This is the same as the proportion of users blocked after publishing an edit identified as eligible but not shown reference check.

User Status

Figure 19: For unregistered users, we observed a 34% (2.5 pp) decrease in the proportion of users blocked after making a new content edit. There were no significant changes in the block rate of registered users.

For unregistered users, we observed a 34% (2.5 pp) decrease in the proportion of users blocked after making a new content edit. There were no significant changes in the block rate of registered users.

By Wiki

Figure 20: Fewer than 25 users were blocked after being shown edit check at all partner wikis. No significant differences between those shown reference check (test) compared to those eligible but not shown (control)

Block rates decreased or showed no significant change at all of the partner Wikipedias.

Guardrail 3: Proportion of new content edits published without a reference and without being shown edit check

This metric serves as our false negative indicator.

Methodology: We reviewed the proportion of all new content edits (editcheck-newcontent) by unregistered users or users with 100 or fewer edits that were published without a new reference (no editcheck-newreference tag) and without being shown reference check (no editcheck-references-activated tag).

As a baseline, we compared this to the proportion of new content edits in the control group that were not tagged as eligible (no editcheck-references tag) and were not published with a new reference, as we’d expect those two proportions to be similar.
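One reading of the false negative definition above, consistent with the labels used in the results tables, can be sketched as follows. The data frame and its column names (`has_new_reference`, `check_shown_or_eligible`) are hypothetical and the rows are made up for illustration; the actual analysis works from edit tags in `published_edits_reference_new_content`.

```r
# Toy data: one row per published new content edit (illustrative values only)
edits <- data.frame(
  experiment_group        = c("test", "test", "test", "test",
                              "control", "control", "control", "control"),
  has_new_reference       = c(TRUE, FALSE, FALSE, TRUE,
                              FALSE, FALSE, TRUE, FALSE),
  # test group: reference check was shown; control group: edit tagged as eligible
  check_shown_or_eligible = c(FALSE, TRUE, FALSE, TRUE,
                              TRUE, FALSE, FALSE, TRUE)
)

# A false negative has no new reference AND was never flagged by the check
edits$false_negative <- !edits$has_new_reference & !edits$check_shown_or_eligible

# False negative rate per experiment group (share of all new content edits)
fn_rate <- tapply(edits$false_negative, edits$experiment_group, mean)
print(fn_rate)
```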

Overall

Only 1.8% of all published new content edits in the test group did not include a new reference and were not shown edit check. This is comparable to the 1.3% of published new content edits in the control group that were not identified as eligible and did not include a new reference.

Show the code
published_users_noreference_overall <- published_edits_reference_new_content %>%
    group_by(experiment_group, is_test_eligible, includes_new_reference) %>%
    summarise(n_edits = n_distinct(editing_session))  %>%  # count distinct editing sessions per group
    group_by(experiment_group) %>% 
    mutate(prop_edits = paste0(round( n_edits/sum(n_edits) * 100, 1), "%"))
| Experiment Group | Number of new content edits | Proportion of new content edits |
| --- | --- | --- |
| Control: Not eligible and no new reference added | 129 | 1.3% |
| Test: No reference check shown and no new reference added | 164 | 1.8% |

By Platform

On desktop, 2.2% of new content edits were published without a reference and without being shown reference check, compared to 1.9% of new content edits published without a reference and without being identified as eligible in the control group.

The false negative rate was slightly lower on mobile. 1.4% of new content edits were published without a reference and without being shown reference check.

Show the code
published_users_noreference_platform <- published_edits_reference_new_content %>%
    group_by(experiment_group, platform, is_test_eligible, includes_new_reference) %>%
    summarise(n_edits = n_distinct(editing_session))  %>%  # count distinct editing sessions per group
    group_by(platform, experiment_group) %>% 
    mutate(prop_edits = paste0(round( n_edits/sum(n_edits) * 100, 1), "%"))
| Platform | Experiment Group | Number of new content edits | Proportion of new content edits without a reference |
| --- | --- | --- | --- |
| Desktop | Control: Not eligible and no new reference added | 130 | 1.9% |
| Desktop | Test: No reference check shown and no new reference added | 140 | 2.2% |
| Mobile | Control: Not eligible and no new reference added | < 50 | 0.3% |
| Mobile | Test: No reference check shown and no new reference added | < 50 | 1.1% |

By Wiki

At almost all partner wikis, the false negative rate was less than 2.2%. Japanese Wikipedia had the highest observed false negative rate at 3.2% (that is, 3.2% of published new content edits were not shown reference check and did not include a reference).

Guardrail 4: Proportion of contributors that dismiss adding a citation and indicate the information they are adding does not need a reference as the reason for doing so.

This metric serves as our false positive indicator.

Methodology:

We reviewed the proportion of edits where reference check was shown and the contributor declined to add a citation by explicitly indicating that the information they are adding does not need a reference. This is determined in the data by reviewing published edits with the editcheck-reference-decline-irrelevant tag.

Note this metric relies on users explicitly selecting an option. It does not account for instances where the reference check was shown in error and the user did not select one of the provided options for declining to add a reference.

See available edit tags for documentation of decline options.

Show the code
edit_check_declines <-
  read.csv(
    file = 'Queries/data/edit_check_declines_v2.csv',
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 

Overall

Show the code
edit_check_decline_overall <- edit_check_declines %>%
    filter(is_edit_check_activated == 1,
          experiment_group == '2024-02-editcheck-reference-test') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(experiment_group) %>%
    summarise(n_edits = n_distinct(editing_session),
             n_edits_decline = n_distinct(editing_session[decline_other == 1| decline_common_knowledge == 1| delince_irrelevant == 1| decline_uncertain == 1]),
             )  %>%  # edits shown the check, and those with any explicit decline reason
     mutate(prop_users = paste0(round( n_edits_decline/n_edits * 100, 1), "%"))

edit_check_decline_overall
# A tibble: 1 × 4
  experiment_group                 n_edits n_edits_decline prop_users
  <chr>                              <int>           <int> <chr>     
1 2024-02-editcheck-reference-test   10100            5234 51.8%     

The user gave an explicit reason for declining to add a reference in 51.8% of all new content edits where reference check was shown. The decline rate was higher on mobile (58%) than on desktop (44.3%).

Show the code
# overall by type
edit_check_decline_overall_bytype <- edit_check_declines %>%
    filter(is_edit_check_activated == 1,
          experiment_group == '2024-02-editcheck-reference-test') %>% #limit new content where edit check was shown or eligible to be shown
    summarise(n_edits = n_distinct(editing_session),
            decline_uncertain = n_distinct(editing_session[decline_uncertain == 1]),
             decline_other = n_distinct(editing_session[decline_other == 1]),
             decline_common_knowledge = n_distinct(editing_session[decline_common_knowledge == 1]),
             decline_irrelevant = n_distinct(editing_session[delince_irrelevant == 1]),
             )  %>% 
    pivot_longer(cols = contains('decline'), names_to = "decline_reason", values_to = "n_decline_edits") %>% 
    mutate (prop_users = paste0(round(n_decline_edits/n_edits *100, 1), "%"))
Show the code
edit_check_decline_overall_bytype_table <- edit_check_decline_overall_bytype %>%
  select(-1) %>%
  gt()  %>%
  tab_header(
    title = "Published edits where reference check was shown and declined by decline option selected"
  )  %>%
  cols_label(
    decline_reason = "Decline reason",
    n_decline_edits = "Number of edits that included decline option",
    prop_users = "Proportion of edits that included decline option"
  ) %>%
  tab_footnote(
    footnote = "Includes all published edits by unregistered users or users with 100 or fewer edits that were shown reference check",
    locations = cells_column_labels(
      columns = 'n_decline_edits'
    )
  ) 

edit_check_decline_overall_bytype_table
Published edits where reference check was shown and declined by decline option selected
| Decline reason | Number of edits that included decline option [1] | Proportion of edits that included decline option |
| --- | --- | --- |
| decline_uncertain | 1507 | 14.9% |
| decline_other | 1777 | 17.6% |
| decline_common_knowledge | 1286 | 12.7% |
| decline_irrelevant | 664 | 6.6% |

[1] Includes all published edits by unregistered users or users with 100 or fewer edits that were shown reference check

In 6.6% of published edits where reference check was shown, the user declined to add a reference and indicated that the information they are adding does not need a reference (i.e. the check was irrelevant).

The irrelevant decline option was the least frequently selected option among users who declined the reference check. “Other” was the most frequently selected reason for declining.
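The percentages in the table above use all edits shown reference check as the denominator, not only the edits that were declined. The following sketch recomputes them from the reported counts and, for comparison, also shows each reason's share among declined edits only. The variable names are illustrative and not taken from the analysis code.

```r
# Counts as reported in the output tables above
n_shown  <- 10100  # published edits where reference check was shown
declines <- c(uncertain = 1507, other = 1777,
              common_knowledge = 1286, irrelevant = 664)

# The per-reason counts add up to the overall number of declined edits
n_declined <- sum(declines)

# Share of ALL edits shown reference check (the denominator used in the table)
prop_of_shown <- round(declines / n_shown * 100, 1)

# Share of DECLINED edits only (a different denominator, shown for comparison)
prop_of_declined <- round(declines / n_declined * 100, 1)

print(rbind(prop_of_shown, prop_of_declined))
```

Because the per-reason counts sum to the 5234 declined edits reported earlier, each declined edit appears to carry exactly one reason in this dataset.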

By Platform

Show the code
edit_check_decline_platform_bytype <- edit_check_declines %>%
    filter(is_edit_check_activated == 1,
          experiment_group == '2024-02-editcheck-reference-test') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(platform) %>%
    summarise(n_edits = n_distinct(editing_session),
            decline_uncertain = n_distinct(editing_session[decline_uncertain == 1]),
             decline_other = n_distinct(editing_session[decline_other == 1]),
             decline_common_knowledge = n_distinct(editing_session[decline_common_knowledge == 1]),
             decline_irrelevant = n_distinct(editing_session[delince_irrelevant == 1])
             )  %>% 
    pivot_longer(cols = contains('decline'), names_to = "decline_reason", values_to = "n_decline_edits") %>% 
     mutate(prop_edits = paste0(round( n_decline_edits/n_edits * 100, 1), "%"))
Show the code
# Plot the share of edits shown reference check that were declined, by reason and platform
options(repr.plot.width = 18, repr.plot.height = 12)
dodge <- position_dodge(width=0.9)

p <-edit_check_decline_platform_bytype %>%
    ggplot(aes(x= decline_reason, y = n_decline_edits/n_edits)) +
    geom_col(aes(alpha = decline_reason == 'decline_irrelevant'), position = 'dodge', fill = "#0072B2") +
    scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.2)) +
    facet_wrap(~platform)+
    scale_x_discrete (labels = c("Common", "Irrelevant", "Other", "Uncertain")) +
    scale_y_continuous(labels = scales::percent) +
      geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    labs (y = "Percent of reference checks declined ",
           x = "Decline citation reason",
          title = "Proportion of edits where edit check was shown \n and declined by platform")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=14),
        legend.position= "none",
        axis.line = element_line(colour = "black"))
      
p

Figure 21: Results were similar on desktop and mobile. Users indicated that the reference check was not relevant in a little over 6% of edits shown reference check on both desktop and mobile.

Results were similar on desktop and mobile. Users indicated that the reference check was not relevant in a little over 6% of edits shown reference check on both desktop and mobile. This was the least frequently selected reason on both platforms.

Editors from Sub-Saharan Africa

Show the code
edit_check_decline_ssa_bytype <- edit_check_declines %>%
    filter(is_edit_check_activated == 1,
          experiment_group == '2024-02-editcheck-reference-test',
          is_from_ssa == "sub_saharan_africa") %>% #limit new content where edit check was shown or eligible to be shown
    summarise(n_edits = n_distinct(editing_session),
            decline_uncertain = n_distinct(editing_session[decline_uncertain == 1]),
             decline_other = n_distinct(editing_session[decline_other == 1]),
             decline_common_knowledge = n_distinct(editing_session[decline_common_knowledge == 1]),
             decline_irrelevant = n_distinct(editing_session[delince_irrelevant == 1]),
             )  %>% 
    pivot_longer(cols = contains('decline'), names_to = "decline_reason", values_to = "n_decline_edits") %>% 
      mutate(prop_edits = paste0(round( n_decline_edits/n_edits * 100, 1), "%"))
Show the code
# Plot the share of edits from Sub-Saharan Africa shown reference check that were declined, by reason
options(repr.plot.width = 18, repr.plot.height = 12)
dodge <- position_dodge(width=0.9)

p <-edit_check_decline_ssa_bytype %>%
    ggplot(aes(x= decline_reason, y = n_decline_edits/n_edits)) +
    geom_col(aes(alpha = decline_reason == 'decline_irrelevant'), position = 'dodge', fill = "#0072B2") +
    scale_alpha_manual(values = c("TRUE" = 1, "FALSE" = 0.2)) +
    scale_y_continuous(labels = scales::percent) +
      scale_x_discrete (labels = c("Common knowledge", "Irrelevant", "Other", "Uncertain")) +
      geom_text(aes(label = paste(prop_edits), fontface=2), vjust=1.2, size = 10, color = "white") +
    labs (y = "Percent of reference checks declined ",
           x = "Decline citation reason",
          title = "Proportion of edits from Sub-Saharan Africa \n where reference check was shown and declined")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 22: Only 1.7% of edits where reference check was shown to editors from Sub-Saharan Africa were declined because the user indicated that the information they were adding does not need a reference.

Only 1.7% of edits where reference check was shown to editors from Sub-Saharan Africa were declined because the user indicated that the information they were adding does not need a reference.

The most frequently selected reason for declining to add a reference in this region was that the information they are adding is common knowledge. This differs from overall trends where the most commonly selected decline reasons were either “other” or “uncertain”.

By Wiki

Show the code
edit_check_decline_wiki_bytype <- edit_check_declines %>%
    filter(is_edit_check_activated == 1,
          experiment_group == '2024-02-editcheck-reference-test') %>% #limit new content where edit check was shown or eligible to be shown
    group_by(wiki) %>%
    summarise(n_edits = n_distinct(editing_session),
            decline_uncertain = n_distinct(editing_session[decline_uncertain == 1]),
             decline_other = n_distinct(editing_session[decline_other == 1]),
             decline_common_knowledge = n_distinct(editing_session[decline_common_knowledge == 1]),
             decline_irrelevant = n_distinct(editing_session[delince_irrelevant == 1]),
             )  %>% 
    pivot_longer(cols = contains('decline'), names_to = "decline_reason", values_to = "n_decline_edits") %>% 
   mutate(prop_edits = paste0(round( n_decline_edits/n_edits * 100, 1), "%"))  %>% 
    filter(decline_reason == "decline_irrelevant")
| Partner Wiki | Proportion of edits where reference check was shown and the editor indicated the information did not need a reference |
| --- | --- |
| Arabic Wikipedia | 10.9% |
| Chinese Wikipedia | 6.8% |
| French Wikipedia | 3.2% |
| Italian Wikipedia | 11.1% |
| Japanese Wikipedia | 6.1% |
| Portuguese Wikipedia | 8.6% |
| Spanish Wikipedia | 6.1% |
| Vietnamese Wikipedia | 7.1% |

False positive rates are at or below 11.1% at each partner wiki. Note: Afrikaans Wikipedia, Swahili Wikipedia, and Yoruba Wikipedia were removed from the analysis due to insufficient events.

Curiosity 1: Proportion of newcomers and junior contributors that publish at least one new content edit that includes a reference

Hypothesis: Newcomers and Junior Contributors will be more aware of the need to add a reference when contributing new content because the visual editor will prompt them to do so in cases where they have not done so themselves.

Methodology

This metric is similar to KPI 1 except that it looks at the proportion of distinct editors rather than distinct edits. The results did not differ significantly from those reported in KPI 1, as the majority of newcomers and Junior Contributors published just one new content edit during the reviewed time period. See overall results below. The analysis excludes edits reverted within 48 hours.
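The distinction between counting distinct edits and counting distinct editors can be illustrated with a small sketch. The data below is made up and the column names are hypothetical; it only shows why the two proportions can differ when some users publish more than one edit.

```r
# Toy data: one row per published new content edit (illustrative values only)
edits <- data.frame(
  user_id           = c(1, 1, 1, 2, 3),
  has_new_reference = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Edit-level metric: share of edits that include a new reference
prop_edits_with_ref <- mean(edits$has_new_reference)

# Editor-level metric: share of distinct users with at least one such edit
users_with_ref <- unique(edits$user_id[edits$has_new_reference])
prop_users_with_ref <- length(users_with_ref) / length(unique(edits$user_id))

cat(sprintf("By edits: %.1f%%, by users: %.1f%%\n",
            prop_edits_with_ref * 100, prop_users_with_ref * 100))
```

When nearly every user contributes a single edit, as the text notes was the case here, the two metrics converge.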

Show the code
published_users_reference_overall <- published_edits_reference_new_content %>%
    group_by(experiment_group) %>%
    summarise(n_users_edits = n_distinct(user_id),
             n_users_edits_wref = n_distinct(user_id[includes_new_reference == "New reference included" & was_reverted == 0]))  %>%  #look at unreverted references
     mutate(prop_users = paste0(round( n_users_edits_wref/n_users_edits * 100, 1), "%"))
Show the code
# Plot the proportion of distinct users that added new content with an unreverted reference, by experiment group

dodge <- position_dodge(width=0.9)

p <- published_users_reference_overall %>%
    ggplot(aes(x= experiment_group, y = n_users_edits_wref/n_users_edits, fill = experiment_group)) +
    geom_col(position = 'dodge') +
      geom_text(aes(label = paste(prop_users), fontface=2), vjust=1.2, size = 10, color = "white") +
    labs (y = "Percent of distinct users",
           x = "Experiment Group",
          title = "Newcomers, junior contributors, and unregistered users \n that add new content with a reference")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Edit Check Activated")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "none",
        axis.line = element_line(colour = "black")) 
      
p

Figure 23: There was a 2.5x increase in the proportion of contributors that added new content with a reference (unreverted) during the AB test.

There was a 2.5x increase in the proportion of newcomers and junior contributors that added new content with a reference (unreverted) during the AB test.

Curiosity 2: Constructive Retention Rate

Methodology:

We reviewed the proportion of registered newcomers and junior contributors who successfully published an unreverted edit after being shown reference check and returned 31 to 60 days after (second month) to make another unreverted edit. The analysis was limited to registered users.

We compared these retention rates to the rates of editors that made an edit identified as eligible in the control group but not shown reference check.
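The 31 to 60 day return window can be sketched as follows. This is a simplified illustration on made-up data with hypothetical column names: it treats each user's first unreverted edit as the starting point, whereas the actual analysis starts from an edit where reference check was shown (test) or eligible (control) and reads pre-aggregated counts from CSV files.

```r
# Toy data: one row per published edit (illustrative values only)
edits <- data.frame(
  user_id      = c(1, 1, 2, 2, 3),
  edit_date    = as.Date(c("2024-02-20", "2024-04-01", "2024-02-21",
                           "2024-03-05", "2024-02-25")),
  was_reverted = c(0, 0, 0, 0, 0)
)

# Keep only unreverted edits, as the metric counts constructive edits
unreverted <- edits[edits$was_reverted == 0, ]

# Days elapsed since each user's first unreverted edit
first_edit <- ave(as.numeric(unreverted$edit_date), unreverted$user_id, FUN = min)
unreverted$days_since_first <- as.numeric(unreverted$edit_date) - first_edit

# A user is retained if any later unreverted edit falls 31 to 60 days after the first
in_window      <- unreverted$days_since_first >= 31 & unreverted$days_since_first <= 60
returned_users <- unique(unreverted$user_id[in_window])
all_users      <- unique(unreverted$user_id)

retention_rate <- length(returned_users) / length(all_users)
```

In this toy data only user 1 returns inside the window (41 days after their first edit), so one of three users is retained.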

Show the code
# load test retention
test_retention_rate <-
  read.csv(
    file = 'Queries/data/test_retention_data.csv',  
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 

# load control group retention
control_retention_rate <-
  read.csv(
    file = 'Queries/data/control_retention_data.csv',  
    header = TRUE,
    sep = ",",
    stringsAsFactors = FALSE
  ) 
Show the code
# combine datasets for further exploration
retention_rate <- rbind(test_retention_rate, control_retention_rate)
Show the code
#clarify wiki names
retention_rate <- retention_rate %>% 
  mutate(
    wiki = case_when(
      #clarify participating project names
      wiki == 'arwiki' ~ "Arabic Wikipedia", 
      wiki == 'afwiki' ~ "Afrikaans Wikipedia", 
      wiki == 'eswiki' ~ "Spanish Wikipedia",  
      wiki == 'frwiki' ~ "French Wikipedia", 
      wiki == 'itwiki' ~ "Italian Wikipedia", 
      wiki == 'jawiki' ~ "Japanese Wikipedia",
      wiki == 'ptwiki' ~ "Portuguese Wikipedia",
      wiki == 'swwiki' ~ "Swahili Wikipedia", 
      wiki == 'yowiki' ~ "Yoruba Wikipedia", 
      wiki == 'viwiki' ~ "Vietnamese Wikipedia",
      wiki == 'zhwiki' ~ "Chinese Wikipedia", 
    )
  )

Overall

Show the code
second_month_retention_overall <- retention_rate %>%
    filter(user_status == 'registered') %>%
    group_by(experiment_group)  %>%
    summarise(return_editors = sum(return_editors),
              editors = sum(editors),
        retention_rate = paste0(round(return_editors/editors * 100, 1), "%"))
Show the code
second_month_retention_overal_table <- second_month_retention_overall  %>%
  gt()  %>%
  tab_header(
    title = "Constructive second month retention rate"
  )  %>%
  cols_label(
    experiment_group = "Experiment group",
    return_editors = "Number of editors that returned second month",
    editors = "Number of first month editors",
    retention_rate = "Retention rate"
  ) %>%
  tab_footnote(
    footnote = "Includes all unreverted edits by unregistered users or users with 100 or fewer edits eligible for reference check",
    locations = cells_column_labels(
      columns = 'retention_rate'
    )
  ) 

second_month_retention_overal_table
Constructive second month retention rate
| Experiment group | Number of editors that returned second month | Number of first month editors | Retention rate [1] |
| --- | --- | --- | --- |
| Control (eligible but reference check not shown) | 59 | 867 | 6.8% |
| Test (reference check shown) | 71 | 900 | 7.9% |

[1] Includes all unreverted edits by unregistered users or users with 100 or fewer edits eligible for reference check

Figure 24: Contributors that are shown reference check and successfully save an edit are 16 percent more likely to return to make an unreverted edit in their second month.

7.9% of newcomers and junior contributors who successfully saved an unreverted edit after being shown reference check returned to publish another unreverted edit 31 to 60 days later. This is a 16.2% relative increase (1.1 pp) compared to users in the control group who were eligible for but not shown reference check.
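The retention comparison can be reproduced from the counts in the table above. The sketch below computes the percentage-point and relative differences and runs a two-sample proportion test with base R's `prop.test`. The choice of test is an assumption for illustration; the report does not state which test was used for this metric.

```r
# Counts as reported in the retention table above
returned <- c(control = 59, test = 71)    # editors who returned in the second month
editors  <- c(control = 867, test = 900)  # first month editors

# Retention rate per group
rate <- returned / editors

# Percentage-point difference and relative change versus control
pp_diff    <- (rate[["test"]] - rate[["control"]]) * 100
rel_change <- (rate[["test"]] / rate[["control"]] - 1) * 100

# Two-sample test of equal proportions (assumed here for illustration)
test_result <- prop.test(x = returned, n = editors)

cat(sprintf("Difference: %.1f pp (%.1f%% relative)\n", pp_diff, rel_change))
print(test_result$p.value)
```

Computed from the unrounded counts, the relative change is about 15.9%, in line with the 16 percent in the figure caption; the 16.2% in the text appears to come from dividing the rounded rates (7.9% and 6.8%).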

By Platform

Show the code
second_month_retention_platform <- retention_rate %>%
    filter(user_status == 'registered') %>%
    group_by( platform, experiment_group)  %>%
    summarise(return_editors = sum(return_editors),
              editors = sum(editors),
        retention_rate = paste0(round(return_editors/editors * 100, 1), "%"))
Show the code
# Plot second month retention rate by platform and experiment group
options(repr.plot.width = 18, repr.plot.height = 12)
dodge <- position_dodge(width=0.9)

p <-second_month_retention_platform %>%
    ggplot(aes(x= experiment_group, y = return_editors/editors, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    facet_wrap(~platform)+ 
    scale_y_continuous(labels = scales::percent) +
      geom_text(aes(label = paste(retention_rate), fontface=2), vjust=1.2, size = 10, color = "white") +
    labs (y = "Percent of users retained ",
           x = "Experiment Group",
          title = "Constructive second month retention rate \n for edits eligible for reference check by platform")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), labels =c("Control (eligible)", "Test (reference check shown)"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        text = element_text(size=18),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 25: Second month retention rate increase was specifically observed on desktop.

Specifically, we observed this increase on desktop, where there was a 23.9% (1.7 pp) increase in second month retention. On mobile, there was a decrease that was not statistically significant.

Editors from Sub-Saharan Africa

Show the code
second_month_retention_ssa <- retention_rate %>%
 filter(user_status == 'registered') %>%
    group_by( is_from_ssa, experiment_group)  %>%
    summarise(return_editors = sum(return_editors),
              editors = sum(editors),
        retention_rate = paste0(round(return_editors/editors * 100, 1), "%"))

There were no significant changes in retention rate in Sub-Saharan Africa. Users from this region represent only about 2% of all users in the experiment, so we had a smaller sample size to review and are unable to confirm any statistically significant changes in retention over the reviewed time frame.

By Wiki

Show the code
second_month_retention_wiki <- retention_rate %>%
    filter(user_status == 'registered') %>%
    group_by(wiki, experiment_group)  %>%
    summarise(return_editors = sum(return_editors),
              editors = sum(editors),
        retention_rate = round(return_editors/editors, 3))
Show the code
dodge <- position_dodge(width=0.9)

p <- second_month_retention_wiki  %>%
    filter(!wiki %in% c('Afrikaans Wikipedia', 'Swahili Wikipedia', 'Yoruba Wikipedia'))  %>% 
    ggplot(aes(x= experiment_group, y = return_editors/editors, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    geom_text(aes(label = paste0(retention_rate* 100, "%"), fontface=2), vjust=1.2, size = 10, color = "white") +
    facet_wrap(~ wiki) +
    scale_y_continuous(labels = scales::percent) +
    labs (y = "Proportion of users retained",
           x = "Experiment Group",
          title = "Constructive second month retention rate \n for edits eligible for reference check by partner wiki",
         caption = "Afrikaans Wikipedia, Swahili Wikipedia, Yoruba Wikipedia removed from analysis due to insufficient events")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        text = element_text(size=20),
        plot.title = element_text(hjust = 0.5),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 26: Per-wiki results were variable. We observed second month retention rate increases at 6 of the partner wikis where reference check was shown.

  • Retention rates vary for each partner wiki in the test. We observed an increase in retention at 6 of the partner wikis.

  • There was a significant increase in retention at Vietnamese Wikipedia for users shown the reference check (+7.4 pp). This increase was primarily observed on desktop. Few users were tagged as eligible for reference check on mobile on this Wikipedia.

  • Conversely, there was a significant decrease in retention at Japanese Wikipedia (-8.8 pp). This decrease was observed on both desktop and mobile platforms.

Curiosity 3: Constructive New Content Retention Rate

Hypothesis: Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that includes a reference because Edit Check will have caused them to realize references are required when contributing new content to Wikipedia.

Methodology: We reviewed the proportion of registered newcomers and Junior Contributors that published an unreverted new content edit after being shown reference check and returned to make an unreverted new content edit with a reference to a main namespace during the identified retention period (31 to 60 days after). Analysis limited to registered users.

These edits were compared to all edits by users in the control group that were eligible but not shown reference check.

Overall

Figure 27: There was a slight (0.4 percentage point) increase in the constructive retention rate of users shown reference check. This increase is not statistically significant.

1.6% of users who were shown reference check and published an unreverted new content edit returned to publish an unreverted new content edit with a reference. Comparatively, only 1.2% of users in the control group who were not shown reference check when making an eligible new content edit returned to make a new content edit with a reference. This represents a 0.4 percentage point increase.

Note: Overall, only a limited number of contributors returned to make a new content edit with a reference during their second month, so we are unable to confirm statistical significance for this metric.

By Platform

Show the code
second_month_const_retention_platform <- retention_rate_const %>%
    filter(user_status == 'registered') %>%
    group_by(platform, experiment_group)  %>%
    summarise(return_editors = sum(return_editors),
              editors = sum(editors),
        retention_rate = paste0(round(return_editors/editors * 100, 1), "%"))
Show the code
options(repr.plot.width = 18, repr.plot.height = 12)
dodge <- position_dodge(width=0.9)

p <-second_month_const_retention_platform %>%
    ggplot(aes(x= experiment_group, y = return_editors/editors, fill = experiment_group)) +
    geom_col(position = 'dodge') +
    facet_wrap(~platform)+
    scale_y_continuous(labels = scales::percent) +
      geom_text(aes(label = paste(retention_rate), fontface=2), vjust=1.2, size = 10, color = "white") +
    labs (y = "Percent of users retained ",
           x = "Experiment Group",
          title = "Constructive new content retention rate by platform")  +
    scale_fill_manual(values= c("#999999", "steelblue2"), name = "Experiment Group")  +
    theme(
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.text.x=element_blank(),
        plot.title = element_text(hjust = 0.5),
        text = element_text(size=18),
        legend.position= "bottom",
        axis.line = element_line(colour = "black")) 
      
p

Figure 28: The increase in constructive retention rate primarily occurred on desktop. 3% of users shown reference check on desktop returned to successfully publish a new content edit with a reference that was not reverted in their second month. No users returned to make a new content edit in their second month on mobile.

The increase in constructive retention rate primarily occurred on desktop.

No increase in constructive retention rate was observed on mobile. Only 0.4% of newcomers and junior contributors on mobile returned to make a new content edit with a reference within 31 to 60 days.

We were also unable to confirm any statistically significant changes on any of the individual wikis. Future analyses might provide insights into longer term impacts on new content retention rate once sufficient data is available.