The EMAN2 Processor
The EMAN2 Processor, found in processors/eman2_processor.py, automatically sets up cryo-ET processing of the generated simulated data. It takes a pre-written Python template script, which lists the EMAN2 programs needed to take a set of raw tilt stacks through to a sub-tomogram average, and fills in the parameters for those programs from the configuration in the YAML input file. The template script is located at templates/eman2/eman2_process.py. Any argument accepted by one of the EMAN2 programs it calls can be set in the YAML file, inside that program's parameters object. See the example configuration below.
processors: [
  {name: "eman2",
   args: {
     particle_coordinates_parameters: {
       "mode": "single",
       "coordinates_file": "/Users/kshin/Documents/repositories/ETSimulations/templates/eman2/T4SS_coords_3by3.txt",
       "unbinned_boxsize": 128
     },
     steps_to_run: ["import", "reconstruct"],
     e2import_parameters: {
       "import_tiltseries": "enable",
       "importation": "copy",
       "apix": 2.83,
       "boxsize": 64
     },
     e2tomogram_parameters: {
       "tltstep": 3,
       "tltax": -90,
       "npk": 10,
       "tltkeep": 0.9,
       "outsize": "1k",
       "niter": "2,1,1,1",
       "pkkeep": 0.9,
       "bxsz": 64,
       "pk_mindist": 0.125,
       "filterto": 0.45,
       "rmbeadthr": 10.0,
       "threads": 48,
       "clipz": 350,
       "notmp": "enable"
     }
   }
  }
]
As in the example above, you can include EMAN2 processing in your processing run by adding an entry with the name “eman2” and suitable args to the “processors” list.
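The template-filling idea described at the top of this page can be pictured with a small sketch. It assumes a hypothetical template that marks substitution points with `string.Template` placeholders such as `$steps_to_run`; the real templates/eman2/eman2_process.py may use a different mechanism, so the template text and the `fill_template` helper here are illustrative only.

```python
from string import Template

# Hypothetical template excerpt; the real templates/eman2/eman2_process.py
# may be structured differently. Placeholders use string.Template's $name syntax.
TEMPLATE = Template(
    "steps_to_run = $steps_to_run\n"
    "e2import_parameters = $e2import_parameters\n"
    "e2tomogram_parameters = $e2tomogram_parameters\n"
)


def fill_template(args: dict) -> str:
    """Render the template, writing each configured value as a Python literal via repr()."""
    return TEMPLATE.substitute(
        steps_to_run=repr(args.get("steps_to_run", [])),
        e2import_parameters=repr(args.get("e2import_parameters", {})),
        e2tomogram_parameters=repr(args.get("e2tomogram_parameters", {})),
    )


if __name__ == "__main__":
    example_args = {
        "steps_to_run": ["import", "reconstruct"],
        "e2import_parameters": {"importation": "copy", "apix": 2.83},
    }
    print(fill_template(example_args))
```

Writing values with `repr()` keeps strings, numbers, lists, and dicts from the YAML file valid as Python literals in the generated script, which is one simple way to make the parameters editable near the top of the file.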
Parameters
steps_to_run
In the args field, steps_to_run lists the processing steps that will be executed when you eventually run the script generated by the processor. The script still contains the functions for steps not included in the list; they just won’t be executed unless you enable them again manually. The available steps are “import”, “reconstruct”, “estimate_ctf”, “extract”, “build_set”, “generate_initial_model”, and “3d_refinement”. The order of this list also matters: for example, reconstructing before importing the tilt series will result in errors.
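Because the order of steps_to_run matters, a quick check before generating the script can catch mistakes early. The sketch below is a hypothetical helper, not part of the EMAN2 Processor; it assumes that the order in which the available steps are listed above is the order the pipeline expects.

```python
from typing import Sequence

# Pipeline order assumed from the list of available steps in this documentation.
STEP_ORDER = [
    "import",
    "reconstruct",
    "estimate_ctf",
    "extract",
    "build_set",
    "generate_initial_model",
    "3d_refinement",
]


def check_steps(steps_to_run: Sequence[str]) -> None:
    """Raise ValueError for unknown step names or steps listed out of pipeline order."""
    unknown = [s for s in steps_to_run if s not in STEP_ORDER]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    positions = [STEP_ORDER.index(s) for s in steps_to_run]
    if positions != sorted(positions):
        raise ValueError(f"steps out of order: {list(steps_to_run)}")


if __name__ == "__main__":
    check_steps(["import", "reconstruct"])  # passes silently
```

The check allows skipping steps (for example, running only later steps after an interruption) and only rejects lists where a later step comes before an earlier one.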
particle_coordinates_parameters
The particle_coordinates_parameters object holds the parameters used for particle “picking” in preparation for sub-tomogram averaging. Specifically, they tell the EMAN2 Processor which particles to record in the EMAN2 info file of each tomogram, as if they had been picked with the EMAN2 Boxer tool. These parameters are:
- mode (string)
The particle picking mode: “single”, “multiple”, or “sim”. The mode determines which files supply the particle coordinates transferred into the EMAN2 project. In “single” mode, coordinates_file should be a single file path pointing to a text file with the 3D particle coordinates to record (an example is provided in templates/eman2); these coordinates are used for every tomogram in the dataset. In “multiple” mode, each tomogram can be given its own coordinates file, which allows things like artificially induced picking errors. Here coordinates_file should be just a file name rather than a full path: the EMAN2 Processor looks for that file name in each sub-directory of the dataset’s raw_data folder and uses it as the coordinates for the tomogram reconstructed from that particular stack. Finally, “sim” mode tells the EMAN2 Processor to retrieve particle orientations from the sim_metadata.json file, which records the known orientations from the original ets_generate_data.py run.
- coordinates_file (string)
The text file containing the 3D particle coordinates to be transferred to a tomogram’s info JSON file. The coordinates in this file should be in pixels and follow EMAN2 conventions (for example, the origin is at the center of each axis rather than at the lower left). An example file is provided at templates/eman2/T4SS_coords_3by3.txt. Note: how this parameter is used depends on the mode, so read the mode explanation above.
- unbinned_boxsize (integer)
The box size recorded for each particle when the particles from the coordinates file are imported into the EMAN2 tomogram info files. It should be given in unbinned pixels, i.e. at the scale of the original imported tilt series.
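The way coordinates_file is resolved in “single” and “multiple” mode, and the center-origin convention, can be sketched as follows. Both functions are hypothetical helpers written for illustration: `coordinates_files` only mirrors the file lookup described above (it does not handle “sim” mode, which reads sim_metadata.json instead), and `to_center_origin` assumes the input coordinates are measured from the corner of a volume of known size in the same pixel units.

```python
import os
from typing import Dict, Tuple


def coordinates_files(params: dict, raw_data_dir: str) -> Dict[str, str]:
    """Map each tilt-series sub-directory of raw_data to the coordinates file it would use."""
    mode = params["mode"]
    subdirs = sorted(
        d for d in os.listdir(raw_data_dir)
        if os.path.isdir(os.path.join(raw_data_dir, d))
    )
    if mode == "single":
        # One full path shared by every tomogram.
        return {d: params["coordinates_file"] for d in subdirs}
    if mode == "multiple":
        # A bare file name looked up inside each sub-directory.
        return {
            d: os.path.join(raw_data_dir, d, params["coordinates_file"])
            for d in subdirs
        }
    raise ValueError(f"mode {mode!r} is not handled by this sketch")


def to_center_origin(
    x: float, y: float, z: float, shape: Tuple[int, int, int]
) -> Tuple[float, float, float]:
    """Shift corner-origin pixel coordinates so the origin sits at the volume center.

    shape is (nx, ny, nz) in the same pixel units as the coordinates.
    """
    nx, ny, nz = shape
    return (x - nx / 2, y - ny / 2, z - nz / 2)
```

For example, in a 512 × 512 × 200 volume the corner-origin point (256, 256, 100) becomes (0, 0, 0) after the shift.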
e2*_parameters objects
Each EMAN2 program is given its own *_parameters section in the “args” field, listing the command-line arguments that would be passed if you called the program directly. For example, arguments to e2import.py go in the e2import_parameters field, as shown above. Arguments that are flags rather than taking a value, such as the “--help” option available in all these programs, should be given the special value “enable”, as with “import_tiltseries” and “notmp” in the example above. The full list of EMAN2 programs exposed by the EMAN2 Processor, each of which can take its own *_parameters section, is:
e2import.py : e2import_parameters
e2tomogram.py : e2tomogram_parameters
e2spt_tomoctf.py : e2spt_tomoctf_parameters
e2spt_extract.py : e2spt_extract_parameters
e2spt_buildsets.py : e2spt_buildsets_parameters
e2spt_sgd.py : e2spt_sgd_parameters
e2spt_refine.py : e2spt_refine_parameters
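The mapping from a *_parameters section to command-line arguments can be sketched like this. The `section_name` and `build_command` helpers are hypothetical; the `--key=value` formatting is an assumption, and the commands the processor actually generates may format options differently. Positional arguments (such as an input file) are not handled here.

```python
import os
import shlex
from typing import Dict, List


def section_name(program: str) -> str:
    """e2import.py -> e2import_parameters, following the naming in the list above."""
    return os.path.splitext(program)[0] + "_parameters"


def build_command(program: str, args: Dict[str, Dict[str, object]]) -> List[str]:
    """Turn a program's *_parameters section into an argument list.

    A value of "enable" becomes a bare --flag; any other value becomes --key=value.
    """
    params = args.get(section_name(program), {})
    cmd = [program]
    for key, value in params.items():
        if value == "enable":
            cmd.append(f"--{key}")
        else:
            cmd.append(f"--{key}={value}")
    return cmd


if __name__ == "__main__":
    example = {"e2tomogram_parameters": {"tltstep": 3, "npk": 10, "notmp": "enable"}}
    print(shlex.join(build_command("e2tomogram.py", example)))
    # e2tomogram.py --tltstep=3 --npk=10 --notmp
```

Treating “enable” as a sentinel keeps the YAML flat: every option is a key with a value, and flag-style options simply carry that special value instead of a real one.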
Note: Consider enabling the “noali” option in e2tomogram_parameters when processing simulated data. The lack of large, distinct features across the tilt series (such as the entire cells seen in real tomograms) can confuse the coarse alignment step in e2tomogram.py and introduce large alignment errors during reconstruction.
Note 2: When choosing options for e2tomogram.py, keep in mind that the processor runs e2tomogram.py once for each tilt series, so you should avoid enabling the “alltiltseries” option. With it enabled, each call reconstructs every tilt series in the directory, so each tomogram ends up reconstructed once per tilt series. This is not an issue if you plan to take the generated eman2_process_commands.txt file and run the raw EMAN2 commands manually instead of using the generated eman2_process.py script.
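The once-per-tilt-series behavior, and why “alltiltseries” multiplies the work, can be sketched as below. `tomogram_commands` is a hypothetical helper; passing each tilt series path as the first positional argument to e2tomogram.py is an assumption made for illustration.

```python
from typing import Dict, List


def tomogram_commands(tiltseries: List[str], params: Dict[str, object]) -> List[List[str]]:
    """Build one e2tomogram.py call per tilt series, refusing "alltiltseries".

    With "alltiltseries" enabled, each of the N calls would reconstruct all N
    tilt series, i.e. every tomogram would be reconstructed N times.
    """
    if params.get("alltiltseries") == "enable":
        raise ValueError('"alltiltseries" would reconstruct every tomogram once per call')
    options = [f"--{k}" if v == "enable" else f"--{k}={v}" for k, v in params.items()]
    # Assumed argument layout: the tilt series path first, then the options.
    return [["e2tomogram.py", ts, *options] for ts in tiltseries]
```

Refusing the option up front is cheaper than discovering, after a long run, that every tomogram was reconstructed N times.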
Running the generated script
The EMAN2 processing script generated by ets_process_data.py is placed in the newly created EMAN2 project directory inside the processed_data folder. It is an ordinary Python script, though it requires Python 3, and can be run with:
python3 eman2_process.py
Note that Python 3 is used only for file I/O and for launching the EMAN2 programs. The EMAN2 programs themselves run under Python 2, since EMAN2 still officially used Python 2 when this was written.
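This page does not show how the generated script launches each EMAN2 program. A plausible approach, sketched here under that assumption, is Python 3's `subprocess.run`; `run_step` and its `dry_run` option are hypothetical names. When a program is launched as an executable script, its own shebang line chooses the interpreter, which is how a Python 3 driver can start programs that run under Python 2.

```python
import subprocess
import sys
from typing import List


def run_step(cmd: List[str], dry_run: bool = False) -> int:
    """Launch one external command and wait for it to finish.

    check=True raises subprocess.CalledProcessError on a non-zero exit status,
    so a failing step stops the run instead of being silently skipped.
    """
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    result = subprocess.run(cmd, check=True)
    return result.returncode


if __name__ == "__main__":
    run_step(["e2tomogram.py", "tiltseries/a.hdf", "--notmp"], dry_run=True)
    run_step([sys.executable, "-c", "print('step finished')"])
```

Stopping on the first failure makes it easy to see where processing was interrupted, which matters when editing the script to resume.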
Another important point is that the generated eman2_process.py script is not meant to be a rigid program. It is designed to be easy to modify: all parameters originally passed in are located near the top of the file. The script can therefore be opened and edited as needed, for example to change steps_to_process so that processing picks up where an error interrupted it. For additional clarity and easier modification, the processor also writes the raw command-line versions of the EMAN2 commands run by eman2_process.py to a text file called eman2_process_commands.txt.
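When an error interrupts processing, the list of steps to edit into the script is simply the planned list from the failed step onward. The sketch below shows that, plus one plausible way to write a commands file with one command per line. Both `resume_from` and `write_commands` are hypothetical helpers, and the one-command-per-line layout is an assumption about eman2_process_commands.txt, not its documented format.

```python
from pathlib import Path
from typing import List, Sequence


def resume_from(failed_step: str, planned: Sequence[str]) -> List[str]:
    """Return the planned steps from failed_step onward, ready to paste into the script."""
    if failed_step not in planned:
        raise ValueError(f"{failed_step!r} was not among the planned steps")
    return list(planned[planned.index(failed_step):])


def write_commands(commands: Sequence[Sequence[str]], path: Path) -> None:
    """Write one space-joined command per line."""
    path.write_text("".join(" ".join(cmd) + "\n" for cmd in commands))


if __name__ == "__main__":
    planned = ["import", "reconstruct", "estimate_ctf", "extract"]
    print(resume_from("estimate_ctf", planned))
```

Resuming from the failed step, rather than from the start, avoids re-importing and re-reconstructing tomograms that already finished.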
Though this may not be a universal issue, we have also observed that running e2spt_extract.py from the terminal (as the generated script does) sometimes fails to extract all of the particles boxed for a tomogram. Instead, the program finishes after, say, 7 of 9 particles and moves on to the next tomogram. It is a good idea to check the outputs of the extraction step, and possibly to run that step through the e2projectmanager.py GUI, before continuing with your processing.
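A simple way to catch incomplete extractions is to compare, per tomogram, the number of boxed particles with the number actually extracted. In the sketch below, `boxed_count` and `incomplete_extractions` are hypothetical helpers; reading the boxes from a "boxes_3d" key in each tomogram’s info JSON file is an assumption you should verify against your own info files, and the extracted counts are taken as input so you can obtain them however you normally inspect EMAN2 particle stacks.

```python
import json
from pathlib import Path
from typing import Dict, List


def boxed_count(info_file: Path) -> int:
    """Number of 3D boxes recorded in a tomogram info JSON file.

    Assumes the boxes are stored as a list under the "boxes_3d" key.
    """
    info = json.loads(info_file.read_text())
    return len(info.get("boxes_3d", []))


def incomplete_extractions(boxed: Dict[str, int], extracted: Dict[str, int]) -> List[str]:
    """Names of tomograms whose extracted particle count is below the boxed count."""
    return sorted(t for t, n in boxed.items() if extracted.get(t, 0) < n)


if __name__ == "__main__":
    boxed = {"tomo_01": 9, "tomo_02": 9}
    extracted = {"tomo_01": 7, "tomo_02": 9}
    print(incomplete_extractions(boxed, extracted))
```

Any tomogram this reports is a candidate for re-running extraction, for example through the e2projectmanager.py GUI.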