Summary information

Study title

Rerunning Patterns of Members of Legislative Assembly in India's State Elections, 1985-2018

Creator

Shrimankar, D, Royal Holloway, University of London

Study number / PID

854706 (UKDA)

10.5255/UKDA-SN-854706 (DOI)

Data access

Open

Series

Not available

Abstract

The dataset covers regional assembly (Vidhan Sabha) elections in India from 1985 to 2018. We first divided the data into two sets: pre-delimitation (1985-2007) and post-delimitation (2008-2018). We used the stringdist package in R, developed by Van der Loo (2014), to name-match candidates and identify which of them reran for the same party in the same seat. The stringdist package offers a uniform interface to a number of well-known string distance measures, including edit-based, q-gram and heuristic distances. Following an iterative process, we used a combination of Damerau-Levenshtein, q-gram, cosine, Jaccard and Jaro-Winkler distances to identify candidates who reran for political office. We manually reviewed a large portion of the data and corrected any measurement errors. For the pre-delimitation period, there were 91260 candidates across 17961 constituency-years in 4424 Vidhan Sabha constituencies after excluding independents. We then subset the dataset to the top four candidates, because candidates further down the list rarely received many votes; this left 58842 candidates across 17959 constituency-years in 4424 Vidhan Sabha constituencies. We further subset the data to observations where incumbent and/or challenger parties reran for elections in the same seat, which left 20362 candidates across 12310 constituency-years in 4098 Vidhan Sabha constituencies. Of these 20362 candidates, 10920 are incumbent party candidates and 9442 are challenger party candidates; these make up the rerunning data for the pre-delimitation period. For the post-delimitation period, there were 27723 candidates across 9096 constituency-years in 4067 Vidhan Sabha constituencies after removing independents and subsetting the dataset to the top four candidates. We further subset the data to observations where incumbent and/or challenger parties reran for elections in the same...

Topics

Methodology

Data collection period

01/10/2019 - 31/05/2020

Country

India

Time dimension

Not available

Analysis unit

Individual
Other

Universe

Not available

Sampling procedure

Not available

Kind of data

Numeric

Data collection mode

We used the stringdist package in R, developed by Van der Loo (2014), to name-match candidates and identify which of them reran for the same party in the same seat. The stringdist package offers a uniform interface to a number of well-known string distance measures, including edit-based, q-gram and heuristic distances. Following an iterative process, we used a combination of Damerau-Levenshtein, q-gram, cosine, Jaccard and Jaro-Winkler distances to identify candidates who reran for political office.
The easiest cases to code were those with a distance score of zero on all the distance measures; there were 4643 such cases. There were an additional 4036 cases where the names differed only marginally across election years. For example, J.C. Divakar Reddy of the INC in the Tadpatri constituency for the 1989 Vidhan Sabha elections in Andhra Pradesh was recorded as J.C. Diwakara Reddy in the same constituency for the 1994 Vidhan Sabha elections. A Levenshtein distance of less than or equal to 2 allows us to account for such cases.
Because of the way the Election Commission of India recorded names across election years, we also used combinations of distance measures to name-match candidates. For example, in the Birapur constituency of Uttar Pradesh, Prof. Shivakant Ojha reran for elections in 2007 for the Bharatiya Janata Party (BJP), but was entered as Shiva Kant Ojha, without the prefix and with spaces. A simple Levenshtein distance score would not capture this, so we used a combination of Levenshtein and cosine distance for such cases. Another common problem was inconsistent ordering of name parts across elections. For example, in the Kagal constituency of Maharashtra, Mushrif Hasan Miyalal reran for elections in 2004 for the Nationalist Congress Party (NCP), but had been entered as Hasan Miyalal Mushrif in 1999. Once again, a simple Levenshtein distance score would not record this accurately, so we used a combination of Levenshtein and q-gram distance measures to code these cases. There were also some cases with a very high Levenshtein distance that were captured because they had a distance of zero on the cosine, Jaccard or q-gram measure. For example, Aqbal Hasan Alias Aqbal Husain of the Gainsari constituency of Uttar Pradesh was coded as Aqbal Husain for the 1991 Vidhan Sabha elections. The pair has a Levenshtein distance of 18 but a Jaccard distance of 0.
To code candidates who did not rerun for elections in the same seat for the same party, we applied a cut-off of a Levenshtein distance greater than 15 in combination with high scores on the other distance measures. There were, of course, many cases where the candidate did not rerun for the same party in the same seat but the Levenshtein distance was less than 15. These cases were captured by combining Levenshtein distance with other distance measures. For example, the Telugu Desam Party (TDP) ran Godam Rama Rao in the Boath constituency of Andhra Pradesh in 1989 but replaced him with Godem Nagesh for the 1994 Vidhan Sabha election. The two strings have a Levenshtein distance of 9, but a cosine distance of 0.408, a Jaccard distance of 0.363, a q-gram distance of 13 and a Jaro-Winkler distance of 0.450. In most cases, a Levenshtein distance greater than 5 combined with a high Jaro-Winkler distance and/or cosine distance helped us capture cases where candidates did not rerun for elections for the same party in the same seat.
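The behaviour of the individual distance measures on the name pairs quoted above can be illustrated with a minimal sketch. This is not the stringdist package and not the coding procedure used for the dataset; it is a pure-Python stand-in written for illustration. The function names are hypothetical, plain Levenshtein is used rather than Damerau-Levenshtein, and the q-gram, Jaccard and cosine measures are assumed to use single-character (q = 1) profiles, which is consistent with the Jaccard distance of 0 reported for the Aqbal Husain pair.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions, substitutions (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 1) -> Counter:
    """Counts of all length-q substrings (q = 1 is an assumption of this sketch)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 1) -> int:
    """Sum of absolute differences between the two q-gram count profiles."""
    pa, pb = qgrams(a, q), qgrams(b, q)
    return sum(abs(pa[g] - pb[g]) for g in set(pa) | set(pb))


def jaccard_distance(a: str, b: str, q: int = 1) -> float:
    """1 - |shared q-grams| / |all q-grams|, ignoring counts; assumes non-empty strings."""
    sa, sb = set(qgrams(a, q)), set(qgrams(b, q))
    return 1 - len(sa & sb) / len(sa | sb)


def cosine_distance(a: str, b: str, q: int = 1) -> float:
    """1 - cosine similarity of the q-gram count vectors; assumes non-empty strings."""
    pa, pb = qgrams(a, q), qgrams(b, q)
    dot = sum(pa[g] * pb[g] for g in pa)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return 1 - dot / norm


# Marginal spelling difference: small edit distance.
print(levenshtein("J.C. Divakar Reddy", "J.C. Diwakara Reddy"))  # 2

# Alias dropped: large edit distance, but identical character sets.
long_name, short_name = "Aqbal Hasan Alias Aqbal Husain", "Aqbal Husain"
print(levenshtein(long_name, short_name))       # 18
print(jaccard_distance(long_name, short_name))  # 0.0

# Name parts reordered: the character counts are identical.
print(qgram_distance("Mushrif Hasan Miyalal", "Hasan Miyalal Mushrif"))  # 0
```

The sketch shows why no single measure is sufficient: edit distance is sensitive to dropped aliases and reordered name parts, while the profile-based measures ignore character order entirely and therefore need to be read alongside an edit-based score.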

Funding information

Grant number

ES/T007451/1

Access

Publisher

UK Data Service

Publication year

2021

Terms of data access

The UK Data Archive has granted a dissemination embargo. The embargo will end on 01 March 2022 and the data will then be available in accordance with the access level selected.

Related publications

Not available