Highlights


Featured in Top Paper Picks in Sebastian Ruder's NLP Newsletter

Featured by Diverse Artificial Intelligence Research Initiative

Featured and Interviewed by UNC Chapel Hill Admissions

Featured on Front Page of Daily Tar Heel in Women in Science Article

Featured as Exemplary Female Researcher in Endeavors Magazine

Google AI Research Mentorship Program

Google

Sep 2018 – present

Emerging Technologies Intern

The Walt Disney Company

Aug 2018 – present

Deep Learning Intern

MITRE Corp.

May 2018 – Aug 2018

Deep Learning and NLP Research Assistant

UNC Chapel Hill, Computer Science

Aug 2017 – present

Web Development Intern - NSF Research Assistant

UNC Chapel Hill, Education

Aug 2016 – May 2017

Computer Science Intern

Geospatial Research Laboratory, Army Corps of Engineers

July – Aug 2015

Girls Who Code Internship for Experienced Programmers

VMware

June – July 2015

Teaching Assistant for Foundations of Computer Science

Fairfax County Public Schools

July – Aug 2014

Community Service and Outreach Committee Chair

Chancellor's Science Scholars, UNC Chapel Hill

Jan 2018 – present

Local Refugee Mentor and Volunteer

Outreach360, UNC Chapel Hill

Aug 2016 – present

Founding President of Club Luminous

Feb 2014 – present

Maker-in-Residence Executive Committee Member

MakerSpace, UNC Chapel Hill

Aug 2016 – Aug 2017

HackCL Director – Club Luminous Hackathon

Club Luminous

April 2016

Moogfest Young Engineers Scholarship – Sponsored by Fidelity and Girls Who Code (May 2018)
Grace Hopper 2018 UNC Chapel Hill Scholarship (Apr 2018)
1st Place Math and Computer Science Poster – National Sigma Xi Research Conference (Nov 2017)
STEM Diversity Scholarship – Full Scholarship (Tuition/Room/Board), Academic Merit (Jun 2016 – present)
Chancellor’s Science Scholars – 10k/yr Scholarship, Academic Merit (Jun 2016 – present)
Rewriting the Code, Women in Computer Science Fellow (Sep 2017 – present)
Dean’s List for Five Consecutive Semesters (Jun 2016 – May 2018)
National Merit Scholarship Finalist (Apr 2016)
National AP Scholar Award (May 2016)
Best Design for Mobile Application – HackTJ Hackathon (May 2015)

SafeCity: Understanding Diverse Forms of Sexual Harassment Personal Stories

Sweta Karlekar, Mohit Bansal

Poster Presented at EMNLP 2018

With the recent rise of #MeToo, an increasing number of personal stories about sexual harassment and sexual abuse have been shared online. In order to push forward the fight against such harassment and abuse, we present the task of automatically categorizing and analyzing various forms of sexual harassment, based on stories shared on the online forum SafeCity. For the labels of groping, ogling, and commenting, our single-label CNN-RNN model achieves an accuracy of 86.5%, and our multi-label model achieves a Hamming score of 82.5%. Furthermore, we present analysis using LIME, first-derivative saliency heatmaps, activation clustering, and embedding visualization to interpret neural model predictions and demonstrate how this helps extract features that can help automatically fill out incident reports, identify unsafe areas, avoid unsafe practices, and ‘pin the creeps’.

Paper (EMNLP 2018)
Dataset and Data Splits
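
The SafeCity abstract above reports a Hamming score for the multi-label model without defining it. The sketch below assumes one common definition: for each story, the size of the intersection of true and predicted label sets divided by the size of their union, averaged over stories. The paper may use a different variant, and the function name and sample labels here are illustrative only.

```python
def hamming_score(true_labels, predicted_labels):
    """Mean per-sample |true ∩ pred| / |true ∪ pred| (assumed definition)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have equal length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    total = 0.0
    for true_set, pred_set in zip(true_labels, predicted_labels):
        true_set, pred_set = set(true_set), set(pred_set)
        union = true_set | pred_set
        # A sample with no true and no predicted labels counts as a perfect match.
        total += 1.0 if not union else len(true_set & pred_set) / len(union)
    return total / len(true_labels)


# Made-up example: three stories labeled with the categories named in the abstract.
true = [{"groping"}, {"ogling", "commenting"}, set()]
pred = [{"groping"}, {"ogling"}, set()]
score = hamming_score(true, pred)
print(f"Hamming score: {score:.3f}")
```

Unlike exact-match accuracy, this score gives partial credit when a prediction recovers some but not all of a story's labels.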

Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural Models

Sweta Karlekar, Tong Niu, Mohit Bansal

Poster Presented at NAACL 2018

Alzheimer's disease (AD) is an irreversible and progressive brain disease that can be stopped or slowed down with medical treatment. Language changes serve as a sign that a patient's cognitive functions have been impacted, potentially leading to early diagnosis. In this work, we use NLP techniques to classify and analyze the linguistic characteristics of AD patients using the DementiaBank dataset. We apply three neural models based on CNNs, LSTM-RNNs, and their combination, to distinguish between language samples from AD and control patients. We achieve a new independent benchmark accuracy for the AD classification task. More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques. We then perform novel automatic pattern discovery inside activation clusters, and consolidate AD patients' distinctive grammar patterns. Additionally, we show that first derivative saliency can not only rediscover previous language patterns of AD patients, but also shed light on the limitations of neural models. Lastly, we also include analysis of gender-separated AD data.

Paper (NAACL 2018)
Poster (NAACL 2018)
Presentation (UNC Undergraduate Research Symposium)
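
First-derivative saliency, as mentioned in the abstract above, scores each input token by how strongly the model output changes when that token's embedding changes. The following is a minimal sketch of the idea, not the paper's implementation: it approximates the gradient with central finite differences rather than backpropagation, and `toy_score`, `WEIGHTS`, the tokens, and the embeddings are all hypothetical stand-ins for a trained network and real data.

```python
import math


def first_derivative_saliency(score_fn, embeddings, eps=1e-5):
    """Return one value per token: the L2 norm of d(score)/d(embedding).

    The gradient is approximated numerically with central differences.
    """
    saliencies = []
    for i, vec in enumerate(embeddings):
        grad_sq = 0.0
        for j in range(len(vec)):
            plus = [list(v) for v in embeddings]
            minus = [list(v) for v in embeddings]
            plus[i][j] += eps
            minus[i][j] -= eps
            partial = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            grad_sq += partial * partial
        saliencies.append(math.sqrt(grad_sq))
    return saliencies


# Hypothetical toy scorer standing in for a trained classifier.
WEIGHTS = [0.8, -0.5]


def toy_score(embeddings):
    return sum(
        math.tanh(sum(w * x for w, x in zip(WEIGHTS, vec))) for vec in embeddings
    )


tokens = ["the", "uh", "cookie"]            # made-up tokens
embeddings = [[0.1, 0.0], [2.0, -1.0], [0.5, 0.5]]  # made-up 2-d embeddings
saliency = first_derivative_saliency(toy_score, embeddings)
for token, value in zip(tokens, saliency):
    print(f"{token}: {value:.4f}")
```

With a real network one would normally obtain the same gradients from the framework's automatic differentiation; finite differences are used here only to keep the sketch dependency-free.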

#MeToo: Neural Detection and Explanation of Language in Personal Abuse Stories

Sweta Karlekar, Mohit Bansal

Poster Presented at NAACL WiNLP 2018

The detection and classification of domestic abuse stories shared online has ever-increasing importance in today's social activism sphere. With massive numbers of stories shared, automatic detection can aggregate stories from around the internet and help push forward the fight against domestic abuse from a social campaign to social change. We develop CNN, LSTM-RNN, and CNN-LSTM neural models to detect domestic abuse stories in the Reddit Domestic Abuse dataset. We achieved 95.8% accuracy in classifying posts as containing abuse stories versus not containing abuse stories, outperforming the current state-of-the-art. More importantly, we next present sentiment-only classification feasibility as well as interpretable and explainable analysis of the neural model's predictions using activation clustering techniques to automatically discover linguistic features.

Paper (NAACL WiNLP 2018)
Poster (NAACL WiNLP 2018)

Developing a Method to Mask Trees in Commercial Multispectral Imagery

Becker, S. J.; Daughtry, C. S. T.; Jain, D.; Karlekar, S. S.

American Geophysical Union, Fall Meeting 2015

The US Army has an increasing focus on using automated remote sensing techniques with commercial multispectral imagery (MSI) to map urban and peri-urban agricultural and vegetative features; however, similar spectral profiles between trees (i.e., forest canopy) and other vegetation result in confusion between these cover classes. Established vegetation indices, like the Normalized Difference Vegetation Index (NDVI), are typically not effective in reliably differentiating between trees and other vegetation. Previous research in tree mapping has included integration of hyperspectral imagery (HSI) and LiDAR for tree detection and species identification, as well as the use of MSI to distinguish tree crowns from non-vegetated features. This project developed a straightforward method to model and also mask out trees from eight-band WorldView-2 (1.85 meter x 1.85 meter resolution at nadir) satellite imagery at the Beltsville Agricultural Research Center in Beltsville, MD spanning 2012 - 2015. The study site included tree cover, a range of agricultural and vegetative cover types, and urban features. The modeling method exploits the product of the red and red edge bands and defines accurate thresholds between trees and other land covers. Results show this method outperforms established vegetation indices including the NDVI, Soil Adjusted Vegetation Index, Normalized Difference Water Index, Simple Ratio, and Normalized Difference Red Edge Index in correctly masking trees while preserving the other information in the imagery. This method is useful when HSI and LiDAR collection are not possible or when using archived MSI.

Abstract
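
The abstract above contrasts NDVI with a method based on the product of the red and red edge bands. The sketch below shows both quantities per pixel. NDVI uses its standard formula, (NIR - Red) / (NIR + Red). The tree mask is only an illustration: the abstract does not give the threshold values or state which side of the threshold corresponds to trees, so the threshold, the "below threshold means tree" rule, and the reflectance values are all assumptions.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    return 0.0 if denominator == 0 else (nir - red) / denominator


def red_rededge_product(red, red_edge):
    """Product of the red and red edge band values for one pixel."""
    return red * red_edge


def mask_trees(pixels, threshold):
    """Flag pixels as trees when the band product is below `threshold`.

    The direction of the comparison and the threshold are assumptions
    made for illustration, not values from the study.
    """
    return [
        red_rededge_product(p["red"], p["red_edge"]) < threshold for p in pixels
    ]


# Made-up reflectance values for two pixels.
pixels = [
    {"red": 0.03, "red_edge": 0.10, "nir": 0.45},
    {"red": 0.08, "red_edge": 0.30, "nir": 0.50},
]
tree_mask = mask_trees(pixels, threshold=0.01)
ndvi_values = [ndvi(p["nir"], p["red"]) for p in pixels]
print(tree_mask, [round(v, 3) for v in ndvi_values])
```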

Questions? Collaborations?
Contact me at swetakar@cs.unc.edu.