Linux filter lines from file

How to use grep to filter out lines starting with any of a set of keywords?

I have a large file (a chemical database), and I need to display only header records, which are defined as lines that don’t start with ATOM, CONNECT, HETATM, TER, or END. I’m supposed to use grep to do this. Here’s a sample of the file (the entire file is here):

HEADER TRANSFERASE 15-OCT-12 4HKD
TITLE CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: SERINE/THREONINE-PROTEIN KINASE 3;
COMPND 3 CHAIN: A, B, C, D;
COMPND 4 FRAGMENT: SARAH DOMAIN, UNP RESIDUES 436-484;
COMPND 5 SYNONYM: MAMMALIAN STE20-LIKE PROTEIN KINASE 2, MST-2, STE20-LIKE
COMPND 6 KINASE MST2, SERINE/THREONINE-PROTEIN KINASE KRS-1, SERINE/THREONINE-
COMPND 7 PROTEIN KINASE 3 36KDA SUBUNIT, MST2/N, SERINE/THREONINE-PROTEIN
COMPND 8 KINASE 3 20KDA SUBUNIT, MST2/C;
COMPND 9 EC: 2.7.11.1;
COMPND 10 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 ORGANISM_COMMON: HUMAN;
SOURCE 4 ORGANISM_TAXID: 9606;
SOURCE 5 GENE: STK3, KRS1, MST2;
SOURCE 6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 7 EXPRESSION_SYSTEM_TAXID: 562;
SOURCE 8 EXPRESSION_SYSTEM_STRAIN: BL21 (DE3) CODON PLUS;
SOURCE 9 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
SOURCE 10 EXPRESSION_SYSTEM_PLASMID: HT-PET28A
KEYWDS HOMODIMERIZATION, HETERODOMERIZATION, SAV1, NEK2, RASSF, TRANSFERASE
EXPDTA X-RAY DIFFRACTION
AUTHOR G.G.LIU,Z.B.SHI,Z.C.ZHOU
REVDAT 1 04-SEP-13 4HKD 0
JRNL AUTH G.G.LIU,Z.B.SHI,Z.C.ZHOU
JRNL TITL CRYSTAL STRUCTURE OF HUMAN MST2 SARAH DOMAIN
JRNL REF TO BE PUBLISHED
JRNL REFN
REMARK 2
REMARK 2 RESOLUTION. 1.50 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : PHENIX (PHENIX.REFINE: 1.8_1069)
REMARK 3 AUTHORS : PAUL ADAMS,PAVEL AFONINE,VICENT CHEN,IAN
REMARK 3 : DAVIS,KRESHNA GOPAL,RALF GROSSE-
REMARK 3 : KUNSTLEVE,LI-WEI HUNG,ROBERT IMMORMINO,
REMARK 3 : TOM IOERGER,AIRLIE MCCOY,ERIK MCKEE,NIGEL
REMARK 3 : MORIARTY,REETAL PAI,RANDY READ,JANE
REMARK 3 : RICHARDSON,DAVID RICHARDSON,TOD ROMO,JIM
REMARK 3 : SACCHETTINI,NICHOLAS SAUTER,JACOB SMITH,
REMARK 3 : LAURENT STORONI,TOM TERWILLIGER,PETER
REMARK 3 : ZWART
REMARK 3
REMARK 3 REFINEMENT TARGET : ML
REMARK 3
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.50
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 34.86
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 1.380
REMARK 3 COMPLETENESS FOR RANGE (%) : 91.9
REMARK 3 NUMBER OF REFLECTIONS : 29481
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 R VALUE (WORKING + TEST SET) : 0.197
REMARK 3 R VALUE (WORKING SET) : 0.195
REMARK 3 FREE R VALUE : 0.231
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.080
REMARK 3 FREE R VALUE TEST SET COUNT : 1497
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT (IN BINS).
REMARK 3 BIN RESOLUTION RANGE COMPL. NWORK NFREE RWORK RFREE
REMARK 3 1 34.8685 - 3.3427 0.97 2878 149 0.1998 0.2322
REMARK 3 2 3.3427 - 2.6535 0.98 2711 175 0.2033 0.2452
REMARK 3 3 2.6535 - 2.3182 0.96 2660 155 0.1968 0.2148
REMARK 3 4 2.3182 - 2.1063 0.94 2620 114 0.1875 0.2318
REMARK 3 5 2.1063 - 1.9553 0.91 2533 113 0.1909 0.2295
REMARK 3 6 1.9553 - 1.8400 0.91 2476 143 0.1883 0.2137
REMARK 3 7 1.8400 - 1.7479 0.90 2465 128 0.1840 0.2029
REMARK 3 8 1.7479 - 1.6718 0.90 2446 130 0.1783 0.2144
REMARK 3 9 1.6718 - 1.6074 0.90 2419 129 0.1864 0.2400
REMARK 3 10 1.6074 - 1.5520 0.90 2487 120 0.1938 0.2588
REMARK 3 11 1.5520 - 1.5030 0.85 2289 141 0.1993 0.2471
REMARK 3
REMARK 3 BULK SOLVENT MODELLING.
REMARK 3 METHOD USED : FLAT BULK SOLVENT MODEL
REMARK 3 SOLVENT RADIUS : 1.11
REMARK 3 SHRINKAGE RADIUS : 0.90
REMARK 3 K_SOL : NULL
REMARK 3 B_SOL : NULL
REMARK 3
REMARK 3 ERROR ESTIMATES.
REMARK 3 COORDINATE ERROR (MAXIMUM-LIKELIHOOD BASED) : 0.130
REMARK 3 PHASE ERROR (DEGREES, MAXIMUM-LIKELIHOOD BASED) : 21.520
REMARK 3
REMARK 3 B VALUES.
REMARK 3 FROM WILSON PLOT (A**2) : NULL
REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL
REMARK 3 OVERALL ANISOTROPIC B VALUE.
REMARK 3 B11 (A**2) : NULL
REMARK 3 B22 (A**2) : NULL
REMARK 3 B33 (A**2) : NULL
REMARK 3 B12 (A**2) : NULL
REMARK 3 B13 (A**2) : NULL
REMARK 3 B23 (A**2) : NULL
REMARK 3
REMARK 3 TWINNING INFORMATION.
REMARK 3 FRACTION: NULL
REMARK 3 OPERATOR: NULL
REMARK 3
REMARK 3 DEVIATIONS FROM IDEAL VALUES.
REMARK 3 RMSD COUNT
REMARK 3 BOND : 0.007 1771
REMARK 3 ANGLE : 1.179 2367
REMARK 3 CHIRALITY : 0.083 255
REMARK 3 PLANARITY : 0.006 317
REMARK 3 DIHEDRAL : 14.379 737
REMARK 3
REMARK 3 TLS DETAILS
REMARK 3 NUMBER OF TLS GROUPS : NULL
REMARK 3
REMARK 3 NCS DETAILS
REMARK 3 NUMBER OF NCS GROUPS : NULL
REMARK 3
REMARK 3 OTHER REFINEMENT REMARKS: NULL
REMARK 4
REMARK 4 4HKD COMPLIES WITH FORMAT V. 3.30, 13-JUL-11
REMARK 100
REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY PDBJ ON 22-OCT-12.
REMARK 100 THE RCSB ID CODE IS RCSB075574.
REMARK 200
REMARK 200 EXPERIMENTAL DETAILS
REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION
REMARK 200 DATE OF DATA COLLECTION : 16-APR-12
REMARK 200 TEMPERATURE (KELVIN) : 100
REMARK 200 PH : 4.6
REMARK 200 NUMBER OF CRYSTALS USED : 1
REMARK 200
REMARK 200 SYNCHROTRON (Y/N) : Y
REMARK 200 RADIATION SOURCE : SSRF
REMARK 200 BEAMLINE : BL17U
REMARK 200 X-RAY GENERATOR MODEL : NULL
REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M
REMARK 200 WAVELENGTH OR RANGE (A) : 0.97915
REMARK 200 MONOCHROMATOR : SI 111 CHANNEL
REMARK 200 OPTICS : NULL
REMARK 200
REMARK 200 DETECTOR TYPE : CCD
REMARK 200 DETECTOR MANUFACTURER : ADSC QUANTUM 315
REMARK 200 INTENSITY-INTEGRATION SOFTWARE : HKL-2000
REMARK 200 DATA SCALING SOFTWARE : HKL-2000
REMARK 200
REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 29548
REMARK 200 RESOLUTION RANGE HIGH (A) : 1.500
REMARK 200 RESOLUTION RANGE LOW (A) : 50.000
REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 2.000
REMARK 200
REMARK 200 OVERALL.
REMARK 200 COMPLETENESS FOR RANGE (%) : 92.3
REMARK 200 DATA REDUNDANCY : 5.300
REMARK 200 R MERGE (I) : NULL
REMARK 200 R SYM (I) : NULL
REMARK 200 FOR THE DATA SET : 17.1000
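
For what it's worth, here is a minimal sketch of one way to do this with grep, not necessarily the approach the assignment expects. It inverts the match with -v and anchors an alternation of the keywords to the start of the line with -E and ^. The sample file is a hypothetical stand-in (the ATOM, HETATM, TER and CONECT lines are made up for illustration), and since PDB files usually spell the connectivity record CONECT, the pattern CONN?ECT is an assumption that accepts both that and the CONNECT spelling used in the question.

```shell
# Build a tiny stand-in for the PDB file (hypothetical lines, not the real 4HKD data).
sample=$(mktemp)
printf '%s\n' \
  'HEADER TRANSFERASE 15-OCT-12 4HKD' \
  'REMARK 2 RESOLUTION. 1.50 ANGSTROMS.' \
  'ATOM 1 N GLY A 436' \
  'HETATM 1700 O HOH A 501' \
  'TER 420 LEU A 484' \
  'CONECT 1 2' \
  'END' > "$sample"

# -v keeps only the lines that do NOT match.
# -E enables extended regular expressions, so (a|b|c) is an alternation.
# ^ anchors the alternation to the start of each line.
headers=$(grep -vE '^(ATOM|CONN?ECT|HETATM|TER|END)' "$sample")
printf '%s\n' "$headers"

rm -f "$sample"
```

Note that ^END as written would also drop lines beginning with longer words such as ENDMDL; if that matters for your file, the pattern would need a trailing word boundary or space.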


How to filter out lines of a command output that occur in a text file?

Let’s say we have a text file of forbidden lines, forbidden.txt. What is a short way to filter out all lines of a command’s output that also appear in that file?

cat input.txt | exclude-forbidden-lines forbidden.txt | sort 

1 Answer

$ grep -v -x -F -f forbidden.txt input.txt 

That long list of options to grep means:

  • -v Invert the sense of the match, i.e. look for lines not matching.
  • -x When matching a pattern, require that the pattern matches the whole line, i.e. not just anywhere on the line.
  • -F When matching a pattern, treat it as a fixed string, i.e. not as a regular expression.
  • -f Read patterns from the given file (forbidden.txt).
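
To see why each of these options matters, here is a small sketch that runs the same filter three ways: with all options, without -x, and without -F. The pattern a.c and the three input lines are made-up illustrations, not data from the question.

```shell
# One forbidden line that happens to contain a regex metacharacter (hypothetical data).
patterns=$(mktemp)
printf '%s\n' 'a.c' > "$patterns"

input=$(printf '%s\n' 'a.c' 'abc' 'a.c d')

# All options: only the line that is exactly "a.c" is removed.
strict=$(printf '%s\n' "$input" | grep -v -x -F -f "$patterns")

# Without -x: any line containing "a.c" is removed, so "a.c d" disappears too.
no_x=$(printf '%s\n' "$input" | grep -v -F -f "$patterns")

# Without -F: "." is a regex wildcard, so "abc" is removed as well.
no_F=$(printf '%s\n' "$input" | grep -v -x -f "$patterns")

printf 'strict:\n%s\nno -x:\n%s\nno -F:\n%s\n' "$strict" "$no_x" "$no_F"
rm -f "$patterns"
```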

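To filter a command’s output rather than a file, leave off input.txt so that grep reads standard input, then pipe the result to sort or whatever you want to do with it.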
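
Put together, the pipeline from the question might look like the sketch below. The file contents are hypothetical, and cat is kept only to mirror the question; any command that writes lines to standard output could take its place.

```shell
# Set up throwaway example files (hypothetical contents).
workdir=$(mktemp -d)
printf '%s\n' 'pear' 'apple' 'banana' 'apple pie' > "$workdir/input.txt"
printf '%s\n' 'apple' 'banana' > "$workdir/forbidden.txt"

# With no file operand, grep filters whatever arrives on standard input.
result=$(cat "$workdir/input.txt" | grep -v -x -F -f "$workdir/forbidden.txt" | sort)
printf '%s\n' "$result"

rm -rf "$workdir"
```

Here "apple pie" survives because -x only removes lines that equal a forbidden line in full.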


© 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA.
