Inspired by practice, backed by research

Years in the making: how we conceived, developed and validated the TeamHive 360

Want to jump straight into the hard details? Download the TeamHive 360 validation report by the University of Newcastle.

Iterative experience and reflection through 20 years of practice

Initial findings on the efficacy of leadership interventions

An extensive, decade-long focus on individual leadership development yielded inconsistent results regarding sustained organizational transformation. While individual leadership capabilities were often enhanced, these interventions did not reliably translate into improved collective team performance. The primary objective, enabling a team to become more than the sum of its individual parts and achieve true synergy, was met with only variable success.

Experiments with team-level interventions

In response to the limitations of individual-focused programs, a strategic pivot was made toward interventions targeting the executive or senior team as a single, cohesive unit. This approach proved to be substantially more effective. By focusing on the dynamics, processes, and collective efficacy of the leadership team, it was possible to generate significant positive shifts. These improvements at the team level were observed to have a direct and positive correlation with broader organizational metrics, including enhancements in organizational culture, employee engagement scores, and key business results.

Scalability challenges

Despite the demonstrated success of intensive team interventions, significant challenges emerged in relation to scalability. The considerable time commitment required from both facilitators and participants, coupled with substantial budgetary constraints, limited the broad application of this model deeper within the organizational hierarchy. Consequently, while the senior leadership team's effectiveness improved, the impact often failed to permeate throughout the organization, leaving many unhelpful and counterproductive micro-cultures unaddressed.

Pilot study demonstrating efficacy of team intervention

To better understand the relative effectiveness of different developmental modalities, a pilot research study was conducted within a financial services company concurrently with an organization-wide culture initiative. The study was designed to measure longitudinal changes in behavior and capability resulting from four distinct interventions: leadership workshops, peer coaching, individual coaching, and team coaching. The data indicated that while all four methods were effective in producing a positive shift over time, team coaching demonstrated a substantially greater and more durable impact on the long-term behavior change of both individuals and their respective teams.

Highlighting importance of identifying a team's "core issue"

A qualitative analysis of the intervention processes revealed a critical insight: the most significant and rapid progress consistently occurred after a team had successfully identified and confronted its "core issue": the central, underlying obstacle to its effectiveness. Once this fundamental reality was acknowledged, the subsequent facilitation of change and development proceeded at an accelerated pace. This observation led to a new research question: How can a methodology be developed to systematically and efficiently guide teams and coaches to identify this core issue, thereby accelerating meaningful and lasting transformation?

Question generation based on practice and research

Generation of candidate items for the TeamHive 360 comprised an inductive phase based on practitioner experience and theoretical foundations, and a deductive phase driven by researchers at the University of Newcastle.

Theoretical foundations and conceptual framework

The conceptual framework underpinning this research was developed over a decade of intensive study and practice from 2014 to 2024. The foundation is an integration of several key theoretical models. These include Heifetz and Linsky's principles of adaptive leadership, the adult developmental theories of Kegan and Lahey, particularly their work on Immunity to Change and the concept of Deliberately Developmental Organizations, and Peter Hawkins' models of systemic leadership team coaching. This theoretical synthesis was further enriched by participation in advanced programs and extensive scholarly review.

Psychometric question generation methodology

For the development of the TeamHive 360 diagnostic instrument, a mixed-methods approach to question generation was employed, combining inductive and deductive procedures. This dual process is widely regarded as best practice in psychometric scale development, as it ensures that the resulting questions possess both practical relevance and robust theoretical grounding.

The inductive phase was practitioner-driven, drawing directly on the accumulated experience and research insights of the preceding decade. Raw data and qualitative observations from in-practice team interventions were systematically analyzed to derive themes and behavioral indicators, which were then formulated into an initial pool of test questions.

Concurrently, a deductive process was conducted under the leadership of a research team from the University of Newcastle, Australia. This phase involved an extensive, systematic review of the relevant academic literature to identify established constructs associated with team effectiveness, together with a thorough assessment of existing validated measurement scales, to ensure content validity and to identify gaps not addressed by current instruments.

Expert Panel Review and Content Validation

Following the initial question generation, a formal content validation study was conducted. An independent panel comprising eight subject matter experts was convened. These experts were selected based on their recognized expertise across the domains of leadership development, human resources, and psychometrics. The panel was tasked with evaluating the entire pool of 100 draft questions to establish content validity.

Evaluation criteria and quantitative analysis

The expert review process was structured around two primary evaluation criteria for each question: clarity and relevance. Clarity was defined as the extent to which a question is worded in a simple, unambiguous manner that is easily comprehensible to the intended respondent. Relevance was defined as the degree to which a question accurately and pertinently measures the specific theoretical construct it was designed to assess. Utilising the ratings provided by the panel, researchers at the University of Newcastle performed a quantitative analysis to calculate a quality score for each question. This statistical procedure provided an objective measure of content validity, allowing for a systematic and evidence-based approach to question selection and refinement.
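As an illustration of how such a per-question quality score might be computed, the sketch below calculates an item-level content validity index (I-CVI; Polit et al., 2007) from panel relevance ratings. The exact statistic used by the Newcastle researchers is not specified here, so the 1-4 rating scale and the relevance threshold are assumptions.

```python
# Hypothetical illustration: item-level content validity index (I-CVI).
# The document does not state the exact quality score used; I-CVI
# (Polit et al., 2007) is one common choice for expert-panel ratings.

def i_cvi(ratings, relevant_threshold=3):
    """Fraction of experts rating an item as relevant.

    ratings: list of integer ratings on a 1-4 relevance scale,
             where 3 or 4 counts as 'relevant' under the usual convention.
    """
    relevant = sum(1 for r in ratings if r >= relevant_threshold)
    return relevant / len(ratings)

# Example: 8 experts rate one draft question on a 1-4 relevance scale.
panel_ratings = [4, 4, 3, 4, 2, 4, 3, 4]
print(i_cvi(panel_ratings))  # 7 of 8 experts rated it relevant -> 0.875
```

A common decision rule is to retain items whose I-CVI meets or exceeds a preset cutoff and to rewrite or drop the rest, which mirrors the selection-and-refinement process described above.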

Iterative question refinement and qualitative feedback

The results of the quantitative analysis directly informed the question refinement process. Questions that were flagged by the expert panel as lacking clarity were systematically rewritten based on the specific feedback provided. Any question that was rated as having low relevance to its intended construct was removed from the instrument entirely. In addition to the quantitative ratings, the experts were invited to provide open-ended written comments. This qualitative feedback was instrumental in refining the precise wording of questions and, in some cases, led to the creation of new questions to address conceptual gaps identified by the panel. This iterative process of review and revision was critical for enhancing the overall quality and psychometric soundness of the final instrument.

Psychometric validation

To ensure TeamHive 360 is fit for organisational decision-making, the tool underwent a comprehensive validation study by researchers at the University of Newcastle's School of Psychological Sciences. The 160-question bank that emerged from the expert panel review and content validation stage was tested with 500 working adults.

Validation sampling methodology

This research involved recruitment of 500 full-time working adults across more than 12 industry sectors, including education, government, health, technology, and finance. Participants were required to be currently employed in team-based roles with peer-level colleagues. The sample demonstrated strong diversity across team sizes (2-5 members: 35.2%; 6-10 members: 38.4%; 11-20 members: 18.4%; 21+ members: 8%), work arrangements (hybrid: 51.8%; co-located: 42%; remote: 6.1%), and organisational levels from frontline to senior leadership. The average age was 36.27 years (SD = 10.17), with 50.8% female and 48.8% male participants.

Validation results

The validation process utilised exploratory and confirmatory factor analysis (EFA and CFA) to refine the 160-question bank down to a robust 63-question validated core. Results confirmed a superior fit for the four-factor model (Purpose, Learning, Shared Leadership, and Unity), with all goodness-of-fit indices (CFI = .952, TLI = .95) exceeding conventional thresholds for excellent fit (Hu & Bentler, 1999).
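Fit indices like CFI and TLI are derived from the fitted model's chi-square statistic and that of a baseline (null) model, using standard formulas (Hu & Bentler, 1999). The sketch below shows the computation with illustrative, made-up chi-square values, not figures from the validation report.

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index from model and baseline chi-square statistics."""
    d_model = max(chi2_model - df_model, 0)
    d_null = max(chi2_null - df_null, d_model)
    return 1 - d_model / d_null

def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis Index (also called the non-normed fit index)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1)

# Illustrative chi-square values only (not taken from the report):
print(round(cfi(250, 200, 3000, 230), 3))  # -> 0.982
print(round(tli(250, 200, 3000, 230), 3))  # -> 0.979
```

Values at or above .95 on both indices are the customary benchmark for excellent model fit, which is the threshold the reported CFI and TLI exceed.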

Internal Consistency Reliability

Each of the four dimensions demonstrated strong internal consistency, with Cronbach's alpha between .90 and .94, exceeding accepted standards for individual and team diagnostics (Nunnally & Bernstein, 1994). These reliability coefficients support the use of TeamHive 360 for individual team diagnostics, organisational benchmarking, and monitoring team development progress over time.
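For readers who want to reproduce this kind of reliability check on their own survey data, a minimal sketch of Cronbach's alpha follows; the toy response matrix is invented for illustration and is not data from the validation study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of scale totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 6 respondents answering a 4-item dimension on a 1-5 scale.
scores = [[4, 4, 5, 4],
          [2, 3, 2, 2],
          [5, 5, 4, 5],
          [3, 3, 3, 4],
          [1, 2, 1, 2],
          [4, 3, 4, 4]]
print(round(cronbach_alpha(scores), 2))  # -> 0.95
```

Alpha rises when items within a dimension covary strongly, which is why values in the .90-.94 range indicate that the questions in each dimension are measuring a single coherent construct.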

Predictive Validity

The TeamHive 360 dimensions explained 60% of the total variance in team effectiveness and 37% of the variance in team performance. The Learning dimension was particularly important for predicting team effectiveness, while Purpose was the strongest predictor of team performance. All four TeamHive 360 dimensions made significant contributions to predicting both outcomes.

Diagnostic Value

The tool demonstrated significant incremental validity. Hierarchical multiple regression analyses showed that TeamHive 360 dimensions explained substantial variance in team outcomes beyond demographic variables and established individual leadership measures. These results provide further evidence for discriminant validity, demonstrating that TeamHive 360 captures unique team-level variance not redundant with individual leadership constructs. The substantial incremental validity also supports the practical utility of TeamHive 360, indicating that team-level diagnostics provide explanatory power for team outcomes beyond what can be achieved through individual leadership diagnostics alone.
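A hierarchical regression of this kind enters predictor blocks in steps and compares R-squared at each step; the increase (delta R-squared) is the incremental variance explained by the newly added block. The sketch below illustrates the logic on simulated data: the block structure (demographic controls first, then four team dimensions) mirrors the description above, but every number is synthetic.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary-least-squares fit with an intercept term."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 200
demographics = rng.normal(size=(n, 2))  # step 1: control variables
team_dims = rng.normal(size=(n, 4))     # step 2: four team dimensions
# Simulated outcome driven mostly by the team dimensions.
y = (team_dims @ np.array([0.5, 0.4, 0.3, 0.3])
     + 0.1 * demographics[:, 0]
     + rng.normal(scale=0.5, size=n))

r2_step1 = r_squared(demographics, y)
r2_step2 = r_squared(np.column_stack([demographics, team_dims]), y)
print(f"Delta R^2 = {r2_step2 - r2_step1:.2f}")  # variance added by team dimensions
```

A large delta R-squared after the controls are already in the model is exactly the pattern reported above: the team dimensions add explanatory power that the demographic and individual-level measures do not capture.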

To learn more, download the TeamHive 360 validation report by the University of Newcastle.

References

  • Almanasreh, E., Moles, R., & Chen, T. F. (2019). Evaluation of methods used for estimating content validity. Research in Social and Administrative Pharmacy

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.

  • Bartlett, M. S. (1954). A note on the multiplying factors for various χ² approximations. Journal of the Royal Statistical Society: Series B (Methodological)

  • Bass, B. M., & Avolio, B. J. (1995). Multifactor Leadership Questionnaire (MLQ).

  • Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quinonez, H. R., & Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. Frontiers in Public Health

  • Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). Guilford Publications.

  • Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods

  • Hawkins, P. (2017). Leadership team coaching: Developing collective transformational leadership (3rd ed.). Kogan Page.

  • Heifetz, R. A., Grashow, A., & Linsky, M. (2009). The practice of adaptive leadership: Tools and tactics for changing your organization and the world. Harvard Business Press.

  • Heifetz, R. A., & Linsky, M. (2017). Leadership on the line: Staying alive through the dangers of leading (Rev. ed.). Harvard Business Review Press.

  • Henson, R. K., & Roberts, J. K. (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological Measurement

  • Hu, L.-t., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling

  • Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika

  • Kegan, R., & Lahey, L. L. (2001). The real reason people won't change. Harvard Business Review, 79(10), 84–92.

  • Kegan, R., & Lahey, L. L. (2009). Immunity to change: How to overcome it and unlock the potential in yourself and your organization. Harvard Business Press.

  • Kegan, R., & Lahey, L. L. (2016). An everyone culture: Becoming a deliberately developmental organization. Harvard Business Review Press.

  • Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). Guilford Press.

  • MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods

  • Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., Bouter, L. M., & de Vet, H. C. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology

  • Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

  • Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health
