[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Michael Blasnik" <michael.blasnik@verizon.net> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: Re: RE: Data manipulation query |

Date |
Wed, 11 Jan 2006 19:36:10 -0500 |

I believe Nick's solution relies on there being only 2 possible scores per subject, but the second part of your query indicates that there may be multiple scores for a given year. Here's another approach:

starting with the 2nd question:

* first, drop missing scores if more than 1 score

bysort ID subject grade (score): drop if score==. & score[1]<.

* then drop lower scores when there's more than 1

bysort ID subject grade (score): drop if _n<_N

Now, assuming that grade only takes on the values of 4 and 6 and you've eliminated duplicates within ID subject grade (using the code above), the following will flag cases where there are grades for both cases within each ID and subject, and then you can drop the observations that don't meet that criterion:

bysort ID subject (grade): gen byte hasboth=score[1]<. & score[2]<.

drop if hasboth==0

Michael Blasnik

michael.blasnik@verizon.net

----- Original Message ----- From: "Nick Cox" <n.j.cox@durham.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Sent: Wednesday, January 11, 2006 6:58 PM

Subject: st: RE: Data manipulation query

(a) Suppose you focus on English. You can go egen nEnglish = sum(strpos(subject, "English")), by(ID) keep if nEnglish == 2 drop nEnglish <snip> (b) Given (a), bysort ID (Math) : gen max = score[2] etc. Nick n.j.cox@durham.ac.uk Anirudh V. S. Ruhil, Ph.D. wroteI have two data manipulation questions: (a) Lets says I have a dataset of students' scores for 2 subjects when tested in the 4th grade, and their scores on similar tests taken in the 6th grade. Some students are lost between the 4th and 6th grade but some show up in both grades. The data structure is as follows ... ID grade subject score A1 04 English 271 A1 04 Mathematics 190 A1 06 English 260 A1 06 Mathematics 214 A2 04 English . A2 04 Mathematics 165 A2 06 English 187 A2 06 Mathematics 193 A3 04 English . A3 04 Mathematics . A3 06 English 216 A3 06 Mathematics 265 How would I create a subset of these data such that the subset only contains records for students with non-missing scores in a given subject on BOTH the 4th and the 6th grade tests? (b) In the same dataset, let us assume there are multiple records for some students on a single grade and subject. For example, A4 04 English 191 A4 04 English 219 Of these multiple records, how can I select entries with the HIGHER SCORE? (i.e., the lower scores have to be discarded). I'm sure there is a quickfire way to solve both so I'd be grateful if someone points me in the right direction in STATAspeak (9.1).

* * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: RE: Data manipulation query***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

- Prev by Date:
**st: RE: Data manipulation query** - Next by Date:
**st: RE: Re: RE: Data manipulation query** - Previous by thread:
**st: RE: Data manipulation query** - Next by thread:
**Re: st: RE: Data manipulation query** - Index(es):

© Copyright 1996–2021 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |