sCooL: A System for Academic Institution Name Normalization
Named Entity Normalization (NEN) is the process of linking recognized entities to concrete, unambiguous real-world references. In the online job posting domain, accurately normalizing academic institution names supports advanced labor market analysis.
This work presents the design and implementation of sCooL, an automated academic institution name normalization system developed to replace the manual mapping process previously used at CareerBuilder (CB). The system addresses several domain-specific challenges and leverages Wikipedia as a primary knowledge source to create institution mappings from a database of school names extracted from job applicant résumés.
The generated mappings form a comprehensive database used for entity normalization, providing flexibility to incorporate both curated and non-curated data sources. sCooL also includes mechanisms to detect malformed entries and distinguish K–12 schools from higher education institutions.
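As a rough illustration of how such a mapping database might be consulted, the sketch below looks up a cleaned school name in a variant-to-canonical table after screening for malformed and K–12 entries. It is a minimal sketch under stated assumptions, not the sCooL implementation: the table contents, the `K12_PATTERN` keywords, the malformed-entry heuristic, and the names `clean`, `is_malformed`, and `normalize` are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical variant-to-canonical table; the paper derives its mappings
# from Wikipedia, whereas these entries are hand-written for illustration.
MAPPINGS = {
    "georgia tech": "Georgia Institute of Technology",
    "georgia institute of technology": "Georgia Institute of Technology",
    "uga": "University of Georgia",
    "university of georgia": "University of Georgia",
}

# Assumed keyword cues for K-12 schools (not taken from the paper).
K12_PATTERN = re.compile(r"\b(high school|middle school|elementary school)\b")


def clean(raw: str) -> str:
    """Lowercase, replace anything other than ASCII letters, digits,
    and whitespace with a space, then collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def is_malformed(key: str) -> bool:
    """Assumed heuristic: too short, or contains no letters at all."""
    return len(key) < 2 or re.search(r"[a-z]", key) is None


def normalize(raw: str) -> Tuple[str, Optional[str]]:
    """Return a (status, canonical_name) pair for a raw school string."""
    key = clean(raw)
    if is_malformed(key):
        return ("malformed", None)
    if K12_PATTERN.search(key):
        return ("k12", None)
    canonical = MAPPINGS.get(key)
    if canonical is None:
        return ("unmapped", None)
    return ("normalized", canonical)
```

In this sketch, curated and non-curated sources would both simply contribute entries to `MAPPINGS`. A real system would also need to handle non-ASCII names and ambiguous abbreviations, which this exact-match lookup ignores.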
In an extensive comparative evaluation, we show that sCooL achieves higher coverage and accuracy than the existing manual mapping approach, offering a scalable and reliable solution for academic entity normalization.
