CheckTheGap Documentation

CheckTheGap is a public data dashboard tracking the gender composition of faculty at Indian universities and research institutions. It is a product of STEMTheGap, a research initiative at IIT Delhi focused on gender equity in Indian academia.

The dashboard draws on VIDWAN, India's national researcher profile database maintained by INFLIBNET under the Ministry of Education, to publish institution and department-level statistics on female faculty share across ranks, disciplines, institution types, and states. The 2025 dataset covers approximately 89,000 VIDWAN-registered faculty across 500+ institutions, built from a scrape of 224,166 public profiles conducted in October 2025.

This is the first initiative of its kind in India to publish faculty gender statistics at this scale and granularity. To get in touch or report an issue, contact us at iitdcoimpact@gmail.com.

Data Source

Data for the dashboard is sourced from VIDWAN (vidwan.inflibnet.ac.in), a national database of researchers and faculty. VIDWAN is developed and maintained by the Information and Library Network Centre (INFLIBNET), an inter-university centre of the UGC with funding under the National Mission on Education of the Ministry of Education, Government of India. The portal was launched in 2012, consolidating two earlier expert registries going back to 1999, and currently hosts over 350,000 profiles.

Profiles are created and maintained by faculty themselves or by institutional nodal officers. Each profile includes employment history, education history, a Web of Science subject area tag, and links to external identifiers such as ORCID, Google Scholar, and Scopus. VIDWAN identifies four methods to populate the database: policy direction from funding bodies, institutional nomination through a nodal officer, direct invitation from INFLIBNET, and voluntary self-registration by the expert.

VIDWAN has no API or bulk export. We scraped 224,166 public profiles in October 2025. After cleaning and applying our data restrictions, the dataset used in this dashboard covers approximately 89,000 individuals across 500+ institutions. We do not publish or redistribute individual profile data. The dashboard shows only institution and department-level aggregates, and a minimal name search that links back to the original VIDWAN profile.

How the Dataset is Built

The core of the dataset is a panel built from employment histories. Each employment entry on a VIDWAN profile carries a date range (e.g. "2012-2018", "2019-Present"), a position title, a department, and an institution name. We expand every entry into one row per year: a faculty member at IIT Delhi from 2015 to Present contributes one row for each year from 2015 to 2025. "Present" is mapped to 2025, the year of the current scrape.
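The expansion step can be sketched as follows. This is a minimal illustration rather than the project's pipeline code: the field names (`vidwan_id`, `date_range`, and so on), the identifier `V001`, and the assumption that ranges always look like "YYYY-YYYY" or "YYYY-Present" are ours.

```python
import re

SCRAPE_YEAR = 2025  # "Present" is mapped to the year of the current scrape

# Assumed input format: "YYYY-YYYY" or "YYYY-Present" (hyphen or en dash).
_RANGE = re.compile(r"\s*(\d{4})\s*[-\u2013]\s*(\d{4}|present)\s*", re.IGNORECASE)


def parse_date_range(text, scrape_year=SCRAPE_YEAR):
    """Return (start_year, end_year), or None if the range cannot be parsed."""
    match = _RANGE.fullmatch(text)
    if match is None:
        return None
    start = int(match.group(1))
    end_raw = match.group(2)
    end = scrape_year if end_raw.lower() == "present" else int(end_raw)
    return (start, end) if start <= end else None


def expand_entry(entry, scrape_year=SCRAPE_YEAR):
    """Expand one employment entry into one row per covered year."""
    parsed = parse_date_range(entry["date_range"], scrape_year)
    if parsed is None:
        return []
    start, end = parsed
    return [
        {
            "vidwan_id": entry["vidwan_id"],
            "year": year,
            "position": entry["position"],
            "department": entry["department"],
            "institution": entry["institution"],
        }
        for year in range(start, end + 1)
    ]


entry = {
    "vidwan_id": "V001",  # made-up identifier
    "date_range": "2015-Present",
    "position": "Assistant Professor",
    "department": "Physics",
    "institution": "IIT Delhi",
}
rows = expand_entry(entry)
print(len(rows), rows[0]["year"], rows[-1]["year"])  # 11 2015 2025
```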

The panel runs from 2010 to 2025. Years before 2010 are excluded due to sparse coverage. A faculty member who moved institutions appears at their previous institution in years before the move and at their new institution from the move year onward. Where a person holds multiple positions in the same year, they are counted once at their most senior rank after position cleaning and classification. Annual counts are deduplicated by VIDWAN ID.
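A sketch of the panel window and the one-row-per-person-per-year rule is shown below. The seniority ordering, in particular where Lecturer sits relative to the professorial ranks, and the row field names are assumptions made for illustration.

```python
PANEL_START, PANEL_END = 2010, 2025

# Assumed seniority ordering; the placement of Lecturer is our guess.
SENIORITY = {
    "Lecturer": 0,
    "Assistant Professor": 1,
    "Associate Professor": 2,
    "Professor": 3,
}


def dedupe_person_years(rows):
    """Keep one row per (vidwan_id, year): the one with the most senior rank."""
    best = {}
    for row in rows:
        if not (PANEL_START <= row["year"] <= PANEL_END):
            continue  # outside the panel window
        if row["rank"] not in SENIORITY:
            continue  # non-teaching categories are not part of the panel
        key = (row["vidwan_id"], row["year"])
        current = best.get(key)
        if current is None or SENIORITY[row["rank"]] > SENIORITY[current["rank"]]:
            best[key] = row
    return sorted(best.values(), key=lambda r: (r["vidwan_id"], r["year"]))


rows = [
    {"vidwan_id": "V001", "year": 2015, "rank": "Assistant Professor", "institution": "A"},
    {"vidwan_id": "V001", "year": 2015, "rank": "Professor", "institution": "B"},
    {"vidwan_id": "V001", "year": 2009, "rank": "Lecturer", "institution": "A"},
    {"vidwan_id": "V002", "year": 2015, "rank": "Lecturer", "institution": "A"},
]
panel = dedupe_person_years(rows)
```

Here `panel` holds two rows: V001 counted once in 2015 as Professor, and V002 as Lecturer. The 2009 row falls outside the panel and is dropped.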

To ensure consistency, we clean all profile data before aggregating. The same institution appears under different name variants; position titles range from standard to abbreviated to misspelled; subject area labels can be ambiguous across fields; and department names differ across profiles at the same institution. The four dimensions (institutions, positions, expertise, and departments) are each cleaned as described below.

Institutions

We compiled a master reference list from the UGC's published lists of Central, Deemed, Private, and State universities and Institutes of National Importance, supplemented with colleges and R&D institutes from VIDWAN's own search filters. Each raw institution name in the panel is fuzzy-matched against this list to produce candidate matches, which are then passed through a two-step verification: automated scoring followed by a classification pass to accept or reject each match. The full mapping was then manually reviewed. Records that could not be confidently matched are excluded from all dashboard aggregates.
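The candidate-generation step can be illustrated with a small sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the source does not state which fuzzy-matching method was used, and the 0.8 cutoff, the normalisation rules, and the sample reference list are assumptions. The later verification and manual review steps are not shown.

```python
import re
from difflib import SequenceMatcher


def normalise(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())


def candidate_matches(raw_name, reference_names, top_n=3, cutoff=0.8):
    """Return up to top_n (reference_name, score) pairs scoring at least cutoff."""
    target = normalise(raw_name)
    scored = []
    for ref in reference_names:
        score = SequenceMatcher(None, target, normalise(ref)).ratio()
        if score >= cutoff:
            scored.append((ref, round(score, 3)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Illustrative two-entry reference list, not the actual master list.
reference = ["Indian Institute of Technology Delhi", "University of Delhi"]
matches = candidate_matches("Indian Institute of Technology, Delhi", reference)
unmatched = candidate_matches("Xyz College", reference)
```

A raw name with no candidate above the cutoff, such as the last example, returns an empty list and would be excluded from the aggregates.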

Positions

Raw position titles in VIDWAN profiles are inconsistent, with the same role appearing under many different spellings and variants. Titles were manually mapped to nine categories: Assistant Professor, Associate Professor, Professor, Lecturer, Visiting/Contract, Scientist/Researcher, Post-doctoral, Admin, and Other. The dashboard includes only the four teaching position categories: Assistant Professor, Associate Professor, Professor, and Lecturer. All other categories are excluded.
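In code, a manual mapping of this kind amounts to a lookup table plus a filter. The sketch below is illustrative only: the raw title variants in `TITLE_MAP` are invented examples, and sending unmapped titles to "Other" is our assumption.

```python
TEACHING_CATEGORIES = {
    "Assistant Professor",
    "Associate Professor",
    "Professor",
    "Lecturer",
}

# Hypothetical excerpt of a manual mapping from cleaned raw titles to categories.
TITLE_MAP = {
    "assistant professor": "Assistant Professor",
    "asst. professor": "Assistant Professor",
    "associate professor": "Associate Professor",
    "professor": "Professor",
    "lecturer": "Lecturer",
    "post doctoral fellow": "Post-doctoral",
    "registrar": "Admin",
}


def classify_position(raw_title):
    """Map a raw title to one of the nine categories ("Other" if unmapped)."""
    key = " ".join(raw_title.lower().split())
    return TITLE_MAP.get(key, "Other")


def is_teaching(raw_title):
    """True if the title falls in one of the four teaching categories."""
    return classify_position(raw_title) in TEACHING_CATEGORIES


titles = ["  Asst.  Professor ", "Registrar", "Professor"]
kept = [t for t in titles if is_teaching(t)]
```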

Expertise

VIDWAN profiles carry a Web of Science subject area label. We map these to seven broad fields: Engineering & Technology, Science, Mathematics, Medicine & Health, Humanities & Social Sciences, Agriculture, and Other. The STEM flag covers Engineering & Technology, Science, and Mathematics. Some labels are ambiguous across fields; for example, a label like "Magnetic Resonance Imaging" could sit in Medicine or Engineering depending on the researcher's context. To resolve these cases, an LLM classifies each record using the subject area label together with the researcher's department name.
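Once a broad field has been assigned, the STEM flag is a simple set membership test. The sketch below shows only that final step; the function name is ours, and the label-to-field classification itself is not reproduced here.

```python
BROAD_FIELDS = (
    "Engineering & Technology",
    "Science",
    "Mathematics",
    "Medicine & Health",
    "Humanities & Social Sciences",
    "Agriculture",
    "Other",
)

STEM_FIELDS = frozenset({"Engineering & Technology", "Science", "Mathematics"})


def is_stem(broad_field):
    """Return True if the broad field counts towards the STEM flag."""
    if broad_field not in BROAD_FIELDS:
        raise ValueError(f"unknown broad field: {broad_field!r}")
    return broad_field in STEM_FIELDS


flags = {field: is_stem(field) for field in BROAD_FIELDS}
```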

Departments

Department names are standardised within each institution: near-identical variants are merged, recognisable abbreviations are expanded, and genuinely distinct units are kept separate. Departments that cannot be resolved are excluded from department level views, but their faculty still contribute to institutional totals.
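The kind of within-institution standardisation described above might look like the following sketch. The abbreviation table, the prefix-stripping rule, and the use of a lowercase canonical key are all assumptions for illustration; the actual rules are not published in this document.

```python
import re

# Hypothetical abbreviation table; real expansions depend on the institution.
ABBREVIATIONS = {
    "cse": "computer science and engineering",
    "ece": "electronics and communication engineering",
}

_PREFIX = re.compile(r"^(department|dept\.?)\s+of\s+", re.IGNORECASE)


def department_key(raw_name):
    """Return a lowercase canonical key so near-identical variants merge."""
    text = _PREFIX.sub("", raw_name.strip())
    text = text.replace("&", " and ")
    key = " ".join(text.lower().split())
    return ABBREVIATIONS.get(key, key)


variants = [
    "Dept. of Computer Science & Engineering",
    "Department of Computer Science and Engineering",
    "CSE",
]
keys = {department_key(v) for v in variants}
```

All three variants collapse to a single key, while a genuinely distinct unit such as "Computer Applications" keeps its own.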

Methodology

Core Metric

The primary metric throughout the dashboard is female share: the percentage of female faculty among all faculty with a known gender value at the selected institution, department, rank, subject area, and year.

Female Share = Female Faculty / Total Faculty with Known Gender × 100
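As a worked example of the formula, the sketch below computes female share from a list of normalised gender values. The function name and the choice to return `None` for an empty denominator are ours.

```python
def female_share(genders):
    """Female share in percent among entries with a known gender value."""
    known = [g for g in genders if g in ("Female", "Male")]
    if not known:
        return None  # no known-gender faculty, so the share is undefined
    females = sum(1 for g in known if g == "Female")
    return 100 * females / len(known)


# One female and three male faculty, plus one profile with no gender value.
share = female_share(["Female", "Male", "Male", "Male", None])
print(share)  # 25.0
```

The profile with the missing value does not enter the denominator, so the result is 1 / 4 × 100 rather than 1 / 5 × 100.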

Female share is reported at two levels. At the institution level, it covers all matched faculty at that institution for the selected filters. At the department level, it covers faculty within a specific department at that institution. Both levels can be further broken down by faculty rank (Assistant Professor, Associate Professor, Professor) and subject area (STEM, non-STEM, or specific fields). A faculty member is counted at an institution in a given year if their employment record covers that year at that institution. Someone who moved from one institution to another in 2020 appears at their previous institution in years before 2020 and at their new institution from 2020 onward.

Cells with fewer than 10 faculty are hidden by default, to avoid reporting percentages based on very small counts. This applies across all institution, department, rank, and year combinations. The suppression can be switched off and on in the dashboard settings.
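The small-cell rule can be sketched as below. Applying the threshold to the known-gender count, rounding to one decimal place, and the parameter name `show_small_cells` are assumptions for illustration.

```python
MIN_CELL_SIZE = 10  # cells below this size are hidden by default


def display_value(female, total_known, show_small_cells=False):
    """Return the percentage to display, or None if the cell is suppressed."""
    if total_known == 0:
        return None  # nothing to report
    if total_known < MIN_CELL_SIZE and not show_small_cells:
        return None  # suppressed small cell
    return round(100 * female / total_known, 1)


print(display_value(3, 9))                         # None
print(display_value(3, 9, show_small_cells=True))  # 33.3
print(display_value(4, 10))                        # 40.0
```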

Frequently Asked Questions

Is this a census of Indian academia?

No. The dashboard reflects VIDWAN registrations as of October 2025. The 89,000 faculty across 500+ institutions shown in the dashboard are those from the 224,166 scraped VIDWAN profiles who met our position criteria and could be matched to a known institution. This is itself a subset of the full population of faculty across Indian higher education, as VIDWAN registration is voluntary or institutionally driven and many institutions have little to no registered presence on VIDWAN.

Even within institutions that do appear, registered faculty represent only a portion of the actual faculty body. The female share shown for any institution is the share among VIDWAN-registered faculty at that institution, not the share among all faculty. These two figures can differ substantially depending on how registration was managed at that institution and who chose to register.

Is the source data publicly available?

All data shown on this dashboard originates from VIDWAN (vidwan.inflibnet.ac.in), and is publicly accessible with no login or registration required to view profiles. The data in the dashboard was scraped from this public interface in October 2025 and reflects the state of the database at that time.

What are the eligibility criteria to be included in VIDWAN?

VIDWAN's stated eligibility requires a postgraduate or doctoral degree in the relevant subject, with experience at the level of Associate Professor, Professor, Senior Scientist, or equivalent, or recognition through national or international honours and awards. In practice, registration is often managed institutionally through nodal officers, and the composition of who is registered varies by institution.

Before a profile is made publicly searchable, VIDWAN's Expert Database Team reviews the submission to verify that all mandatory fields contain valid data and that the applicant meets the eligibility criteria. Profiles are activated and become searchable only after this review is completed.

How is gender determined?

Gender is drawn from the self-reported gender field on each VIDWAN profile. We normalise the values to Male and Female. Profiles with missing, blank, or unrecognised gender values are excluded from female share calculations but retained in total headcounts. The denominator of the female share metric can therefore be smaller than the total headcount shown for the same selection.
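The normalisation and its effect on counts can be sketched as follows. The raw variants in `GENDER_MAP` are invented examples; the actual values found in VIDWAN profiles may differ.

```python
# Hypothetical raw variants; anything not listed is treated as unknown.
GENDER_MAP = {"male": "Male", "m": "Male", "female": "Female", "f": "Female"}


def normalise_gender(raw):
    """Return "Male", "Female", or None for missing or unrecognised values."""
    if raw is None:
        return None
    return GENDER_MAP.get(raw.strip().lower())


raw_values = ["Female", " male ", "F", "", None, "N/A"]
normalised = [normalise_gender(v) for v in raw_values]

headcount = len(normalised)                                # all profiles
known_gender = sum(1 for g in normalised if g is not None)  # share denominator
print(headcount, known_gender)  # 6 3
```

In this example the headcount is 6, but only 3 profiles enter the female share calculation.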

How often is the source VIDWAN data updated?

VIDWAN's policy is to request annual confirmation from registered experts and to allow profile updates by the individual, their institutional nodal officer, or the VIDWAN team. In practice, many profiles are not kept current. The dashboard reflects a snapshot of VIDWAN as of October 2025 and will be updated when a new scrape is conducted. The data version in use is noted throughout the dashboard and documentation.

How accurate is the cleaning and classification?

Institution name matching, position classification, field taxonomy, and department standardisation all involved a combination of automated matching and manual review. These work well in aggregate but will produce individual errors, particularly for non-standard position titles, smaller or less prominent institutions, and research areas at disciplinary boundaries. Records whose institution name could not be matched to our reference list are excluded from all dashboard aggregates. The cleaning process is documented in the How the Dataset is Built section above.

Why does my institution's figure look different from what I'd expect?

The most common reason is uneven VIDWAN registration coverage. If an institution's faculty have not been systematically registered on VIDWAN, the registered population may not be representative of the actual faculty body. A department or cohort that registered early, or was nominated by a nodal officer, may skew the figures in either direction.

Other factors include stale profiles of retired or departed faculty still appearing as active, position titles that could not be matched to a teaching category and were therefore excluded, and institution name variants that affected how many records were successfully matched. The figure shown is always among VIDWAN-registered faculty who met our matching and position criteria, not all faculty at the institution.

How do figures vary across years and how reliable is the trend data?

Each year's figure is computed from employment records that cover that year. As the year changes, the set of faculty counted changes: faculty who joined after a given year are not counted in earlier years, and faculty who left before a given year are not counted in later ones.

A known issue is that many VIDWAN profiles have not been updated since registration. Faculty who retired or moved institutions years ago may still have employment records running to 2025, which means they continue to appear in recent years' counts. Around 14% of individuals in the data have a single employment spell running continuously for 20 or more years to 2025. This can cause figures to shift in ways that do not reflect actual changes in faculty composition.
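A check for such potentially stale profiles could look like the sketch below. How the length of a spell is measured (here, end year minus start year) is our assumption, as is the representation of spells as `(start_year, end_year)` pairs.

```python
SCRAPE_YEAR = 2025


def has_long_open_spell(spells, scrape_year=SCRAPE_YEAR, min_years=20):
    """True if the person has exactly one spell, still open at the scrape year,
    lasting at least min_years (measured as end year minus start year)."""
    if len(spells) != 1:
        return False
    start, end = spells[0]
    return end == scrape_year and (end - start) >= min_years


print(has_long_open_spell([(2003, 2025)]))                # True
print(has_long_open_spell([(2010, 2025)]))                # False
print(has_long_open_spell([(1995, 2010), (2010, 2025)]))  # False
```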

Coverage before 2015 is thin and uneven across institutions, and the panel is excluded entirely before 2010. Changes in female share in the earlier years of the panel may reflect changes in who was registered on VIDWAN as much as actual changes in hiring. Pre-2015 figures are included in the dashboard but should be treated with caution for trend analysis.

How is individual faculty data handled?

The dashboard does not publish or redistribute individual profile data; it displays only institution and department-level aggregates. The only individual-level feature is a name search covering faculty included in the dashboard, which shows a faculty member's name, current department and institution as listed in VIDWAN, and a link back to their original VIDWAN profile. No information is shown that is not already publicly available on VIDWAN itself.

Can I request to be removed or my data to be edited in the dashboard?

We do not edit, correct, or remove individual records from the dashboard. The data reflects the state of VIDWAN at the time of the October 2025 scrape. Any updates made to a VIDWAN profile will be reflected in the next scheduled data update of the dashboard. If you have concerns about the information on your VIDWAN profile, the appropriate point of contact is VIDWAN directly at vidwan@inflibnet.ac.in. For questions specific to CheckTheGap, contact us at iitdcoimpact@gmail.com.

Can an institution request to be removed or edited in the dashboard?

We do not permit removal or correction of institution-level data. The dashboard reflects aggregated statistics derived from VIDWAN registrations at the time of the October 2025 scrape, and all figures are subject to the coverage and data quality limitations of the source database, which are documented on this page.

Institutions that wish to improve their representation in future versions of the dashboard should ensure their faculty are registered and their profiles are current on VIDWAN. Updated data will be reflected in the next scheduled data update of the dashboard.

Citing

CheckTheGap is a project of STEMTheGap, a research initiative at IIT Delhi focused on gender equity in Indian STEM.

Citation

STEMtheGap. (2025). CheckTheGap: Faculty Gender Representation in Indian Higher Education. Data sourced from VIDWAN (INFLIBNET, Ministry of Education, Government of India). Retrieved from check.stemthegapindia.com.

Please note the data collection date (October 2025) when citing time-sensitive statistics.