Rerunning Patterns of Members of Legislative Assembly in India's State Elections, 1985-2018
Creator
Shrimankar, D, Royal Holloway, University of London
Study number / PID
854706 (UKDA)
10.5255/UKDA-SN-854706 (DOI)
Data access
Open
Series
Not available
Abstract
The dataset covers regional assembly (Vidhan Sabha) elections in India from 1985 to 2018. We first divided the data into two sets: pre-delimitation (1985-2007) and post-delimitation (2008-2018). We used the stringdist package in R, developed by Van der Loo (2014), to name-match candidates and so identify which of them reran for the same party in the same seat. The stringdist package offers a uniform interface to a number of well-known string distance measures, including edit-based, q-gram and heuristic distances. Following an iterative process, we used a combination of Damerau-Levenshtein, q-gram, cosine, Jaccard and Jaro-Winkler distances to identify candidates who reran for political office. We manually reviewed a large portion of the data and corrected any measurement errors. For the pre-delimitation period, there were 91260 candidates with 17961 constituency-years in 4424 Vidhan Sabha constituencies after excluding independents. We then subset the dataset to the top four candidates, because candidates further down the list rarely received many votes; this left 58842 candidates with 17959 constituency-years in 4424 Vidhan Sabha constituencies. We further subset the data to observations where incumbent and/or challenger parties reran for elections in the same seat, which left 20362 candidates with 12310 constituency-years in 4098 Vidhan Sabha constituencies. For the pre-delimitation period, we therefore have rerunning data for 10920 incumbent party candidates and 9442 challenger party candidates. For the post-delimitation period, there were 27723 candidates with 9096 constituency-years in 4067 Vidhan Sabha constituencies after removing independents and subsetting the dataset to the top four candidates. We further subset the data to observations where incumbent and/or challenger parties reran for elections in the same...
Methodology
Data collection period
01/10/2019 - 31/05/2020
Country
India
Time dimension
Not available
Analysis unit
Individual
Other
Universe
Not available
Sampling procedure
Not available
Kind of data
Numeric
Data collection mode
We used the stringdist package in R, developed by Van der Loo (2014), to name-match candidates and so identify which of them reran for the same party in the same seat. The stringdist package offers a uniform interface to a number of well-known string distance measures, including edit-based, q-gram and heuristic distances. Following an iterative process, we used a combination of Damerau-Levenshtein, q-gram, cosine, Jaccard and Jaro-Winkler distances to identify candidates who reran for political office.
The easiest cases to code were those with a distance score of zero on every distance measure; there were 4643 such cases. A further 4036 cases had names that differed only marginally across election years. For example, J.C. Divakar Reddy of INC in Tadpatri constituency in the 1989 Vidhan Sabha elections in Andhra Pradesh was recorded as J.C. Diwakara Reddy in the same constituency in the 1994 Vidhan Sabha elections. A Levenshtein distance of less than or equal to 2 allows us to account for such cases.
Because of the way the Election Commission of India recorded names across election years, we also used combinations of distance measures to name-match candidates. For example, in Birapur constituency of Uttar Pradesh, Prof. Shivakant Ojha reran for elections in 2007 for the Bharatiya Janata Party (BJP) but was entered as Shiva Kant Ojha, without the prefix and with an added space. A simple Levenshtein distance score would not capture this, so we used a combination of Levenshtein and cosine distance for such cases. Another common problem was inconsistent ordering of name parts across elections. For example, in Kagal constituency of Maharashtra, Mushrif Hasan Miyalal reran for elections in 2004 for the Nationalist Congress Party (NCP) but was entered as Hasan Miyalal Mushrif in 1999. Once again, a simple Levenshtein distance score would not record this accurately, so we used a combination of Levenshtein and q-gram distance measures to code these cases.
Some cases showed a very high Levenshtein distance but were captured because they had a distance of zero on the cosine, Jaccard or q-gram distance measure. For example, Aqbal Hasan Alias Aqbal Husain of Gainsari constituency of Uttar Pradesh was coded as Aqbal Husain for the 1991 Vidhan Sabha elections. This pair has a Levenshtein distance of 18 but a Jaccard distance of 0.
To code candidates who did not rerun for elections in the same seat for the same party, we applied a cut-off of a Levenshtein distance greater than 15 in combination with high scores on the other distance measures. There were, of course, many cases where a party fielded a different candidate in the same seat but the two names had a Levenshtein distance of less than 15. These cases were captured by combining Levenshtein distance with other distance measures. For example, the Telugu Desam Party (TDP) ran Godam Rama Rao in Boath constituency of Andhra Pradesh in 1989 but replaced him with Godem Nagesh for the 1994 Vidhan Sabha election. The two strings have a Levenshtein distance of 9, a cosine distance of 0.408, a Jaccard distance of 0.363, a q-gram distance of 13 and a Jaro-Winkler distance of 0.450. In most cases, a Levenshtein distance greater than 5 combined with a high Jaro-Winkler distance and/or cosine distance helped us capture cases where candidates did not rerun for elections for the same party in the same seat.
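The matching logic described in this field can be illustrated with a minimal sketch. It is not the depositors' code: the study used the stringdist package in R, whereas this sketch uses only the Python standard library. It also assumes single-character q-grams (q = 1), uses plain Levenshtein rather than Damerau-Levenshtein distance, and omits Jaro-Winkler. The function name likely_same_candidate and its max_edits parameter are hypothetical, and the rule it applies (an edit distance of at most 2, or a zero distance on a q-gram-based measure) is a simplification of the iterative, manually checked process described in this record.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 1) -> Counter:
    """Multiset of all substrings of length q (assumption: q = 1)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 1) -> int:
    """Sum of absolute differences between q-gram counts."""
    pa, pb = qgrams(a, q), qgrams(b, q)
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def jaccard_distance(a: str, b: str, q: int = 1) -> float:
    """1 minus the share of distinct q-grams common to both strings."""
    sa, sb = set(qgrams(a, q)), set(qgrams(b, q))
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)


def cosine_distance(a: str, b: str, q: int = 1) -> float:
    """1 minus the cosine of the angle between q-gram count vectors."""
    pa, pb = qgrams(a, q), qgrams(b, q)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    if norm == 0:
        return 0.0 if not pa and not pb else 1.0
    return 1 - sum(pa[g] * pb[g] for g in pa) / norm


def likely_same_candidate(a: str, b: str, max_edits: int = 2) -> bool:
    """Hypothetical rule: small edit distance, or zero q-gram-based distance."""
    if levenshtein(a, b) <= max_edits:
        return True
    return (qgram_distance(a, b) == 0
            or jaccard_distance(a, b) == 0
            or cosine_distance(a, b) < 1e-9)  # tolerance for float rounding


pairs = [
    ("J.C. Divakar Reddy", "J.C. Diwakara Reddy"),
    ("Mushrif Hasan Miyalal", "Hasan Miyalal Mushrif"),
    ("Aqbal Hasan Alias Aqbal Husain", "Aqbal Husain"),
    ("Godam Rama Rao", "Godem Nagesh"),
]
for old, new in pairs:
    print(old, "|", new, "|", likely_same_candidate(old, new))
```

Under these assumptions the first three pairs are flagged as the same candidate and the last is not. The individual scores from this sketch need not equal the values reported in this record, which depend on the exact strings in the source data and on the stringdist settings used.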
Funding information
Grant number
ES/T007451/1
Access
Publisher
UK Data Service
Publication year
2021
Terms of data access
The UK Data Archive has granted a dissemination embargo. The embargo will end on 01 March 2022 and the data will then be available in accordance with the access level selected.