<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to feature-requests</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/</link><description>Recent changes to feature-requests</description><atom:link href="https://sourceforge.net/p/wgs-assembler/feature-requests/feed.rss" rel="self"/><language>en</language><lastBuildDate>Wed, 06 Jul 2016 06:49:57 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/wgs-assembler/feature-requests/feed.rss" rel="self" type="application/rss+xml"/><item><title>Cannot determine type of file '.gkpStore:untrim'.  Tried</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/142/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;/usr/hpc-bio/celera-8.3-svn/Linux-amd64/bin/meryl -B -C -m 31 -threads 4 -memory 737280  -s /biowrk/celera/juglans.regian/PRJNA291087.DNA/juglans.gkpStore:untrim -o /biowrk/celera/juglans.regian/PRJNA291087.DNA/9-terminator/mercy/juglans-ms31-frgFull&lt;br/&gt;
seqFactory::registerFile()--  Cannot determine type of file '/biowrk/celera/juglans.regian/PRJNA291087.DNA/juglans.gkpStore:untrim'.  Tried:&lt;br/&gt;
seqFactory::registerFile()--         'FastA'&lt;br/&gt;
seqFactory::registerFile()--         'FastAstream'&lt;br/&gt;
seqFactory::registerFile()--         'Fastq'&lt;br/&gt;
seqFactory::registerFile()--         'FastQstream'&lt;br/&gt;
seqFactory::registerFile()--         'seqStore'&lt;br/&gt;
seqFactory::registerFile()--         'gkpStore'&lt;br/&gt;
seqFactory::registerFile()--         'gkpStoreChain'&lt;/p&gt;
&lt;p&gt;This error does not stop the runCA command.&lt;/p&gt;
&lt;p&gt;Should we remove this command or fix it?&lt;br/&gt;
I searched the source, and there is only one place that references "gkpStore:untrim".&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">wangyugui</dc:creator><pubDate>Wed, 06 Jul 2016 06:49:57 -0000</pubDate><guid>https://sourceforge.net2c1b34aadcc473730e9a83f7d80fd2ce06b82d70</guid></item><item><title>#141 sort with multiple thread support to improve performance</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/141/?limit=25#c865</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;The dirty fix seems to work. Please close this ticket.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">wangyugui</dc:creator><pubDate>Wed, 06 Jul 2016 06:46:25 -0000</pubDate><guid>https://sourceforge.netb48b7144281ea051fd2cc3c715b2b4fe51f27b53</guid></item><item><title>#141 sort with multiple thread support to improve performance</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/141/?limit=25#0e70</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;My friend gave me a dirty fix. I will test it.&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;Index: bin/caqc.pl
--- bin/caqc.pl (revision 676)
+++ bin/caqc.pl (working copy)
@@ -1751,8 +1751,9 @@
     # sort on UID of UTG, or CCO, then begin position, then end
     # UTGs should only be unplaced surrogates
-    open( BADMATES, "| sort -k2.5,2 -k4,4n -k5,5n &amp;gt; $badMateFile" )
-      || die "Couldn't write to $badMateFile.";
+    #open( BADMATES, "| sort -k2.5,2 -k4,4n -k5,5n &amp;gt; $badMateFile" )
+    #  || die "Couldn't write to $badMateFile.";
+    open( BADMATES, "&amp;gt;","$badMateFile.tmp" );
     local $\ = "\n";
     local $, = "\t";
     while ( my ( $badId, $coord ) = each %bad_mates ) {
@@ -1764,6 +1765,8 @@
         }
     }
     close BADMATES;
+
+    system("sort --parallel=32 -k2.5,2 -k4,4n -k5,5n $badMateFile.tmp &amp;gt; $badMateFile");
     exit(0);
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;}&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">wangyugui</dc:creator><pubDate>Tue, 05 Jul 2016 11:03:48 -0000</pubDate><guid>https://sourceforge.netcece69856a2d5004b71735ba0d2ff8c19b578d3c</guid></item><item><title>#141 sort with multiple thread support to improve performance</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/141/?limit=25#db6e</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;The sort in terminator takes 1279m or more, so we need to improve its performance.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">wangyugui</dc:creator><pubDate>Tue, 05 Jul 2016 09:25:30 -0000</pubDate><guid>https://sourceforge.net504bd5508506e6ce0255922dafc58a4e7f03591a</guid></item><item><title>sort with multiple thread support to improve performance</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/141/</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;The sort command is used in terminator, but it needs multi-thread support to improve performance.&lt;/p&gt;
&lt;p&gt;In terminator, sort is invoked as 'XXXX|sort -k2.5,2 -k4,4n -k5,5n'.&lt;/p&gt;
&lt;p&gt;'sort -k2.5,2 -k4,4n -k5,5n ref.fa &amp;gt;x.fa' uses up to 8 threads by default.&lt;br/&gt;
'sort --parallel=32 -k2.5,2 -k4,4n -k5,5n ref.fa &amp;gt;x.fa' uses up to 32 threads.&lt;/p&gt;
&lt;p&gt;But when the input comes from a pipe, sort gets no multi-thread support, even with --parallel=32.&lt;br/&gt;
cat ref.fa |sort -k2.5,2 -k4,4n -k5,5n &amp;gt;x.fa      ==&amp;gt;single thread&lt;br/&gt;
cat ref.fa |sort --parallel=32  -k2.5,2 -k4,4n -k5,5n &amp;gt;x.fa      ==&amp;gt;single thread&lt;/p&gt;
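&lt;p&gt;A minimal sketch of one possible workaround, assuming GNU sort with --parallel support (coreutils 8.6 or later): spool the data to a temporary file first, then sort that file instead of a pipe. The file names and sample rows are hypothetical and are not taken from terminator.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch only: file names and sample rows are hypothetical.
# Assumes GNU sort (coreutils 8.6 or later) for the --parallel option.
tmp=$(mktemp)

# Stand-in for the 'XXXX' producer: write to a file instead of piping into sort.
printf 'a\tCCO_0002\tb\t50\t90\n'  > "$tmp"
printf 'a\tCCO_0001\tb\t70\t80\n' >> "$tmp"
printf 'a\tCCO_0001\tb\t30\t40\n' >> "$tmp"

# Sort the regular file (not a pipe) with the same keys used in terminator.
sort --parallel=4 -k2.5,2 -k4,4n -k5,5n "$tmp" > sorted.tsv

# Remove the spool file once the sorted output exists.
rm -f "$tmp"
```

&lt;p&gt;Sorting a regular file rather than a pipe is the case reported above as using multiple threads; the cost is the extra disk space for the temporary copy.&lt;/p&gt;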
&lt;p&gt;It should need only a small fix, but I can't fix the Perl source myself.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">wangyugui</dc:creator><pubDate>Tue, 05 Jul 2016 09:22:29 -0000</pubDate><guid>https://sourceforge.netae37f2acfa586a74c0a012a9230e7e918a864287</guid></item><item><title>#139 runCA SLURM/DRMAA support</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/139/?limit=25#ac40</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Several users have had success with the SLURM patch on GitHub, so I would recommend this route for now. The only change I've seen users have to make was:&lt;br/&gt;
Changing&lt;br/&gt;
  setGlobal("gridEngineNameToJobIDCommand", "squeue -h -o\%F_* -n \"WAIT_TAG\" | uniq");&lt;br/&gt;
to &lt;br/&gt;
  setGlobal("gridEngineNameToJobIDCommand", "squeue -h -o\%F -n \"WAIT_TAG\" | uniq");&lt;/p&gt;
&lt;p&gt;That is, removing the _* from the -o format option.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sergey Koren</dc:creator><pubDate>Mon, 16 Nov 2015 20:04:34 -0000</pubDate><guid>https://sourceforge.net9407d0ac6d487813563f166bb5cec64472c23e64</guid></item><item><title>#139 runCA SLURM/DRMAA support</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/139/?limit=25#82de/829d/2f77</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi Sergey et al.,&lt;/p&gt;
&lt;p&gt;I noticed a tweet about this, and it was the first time I had heard of SLURM, so I thought it would be a fun exercise to look into modifying 'runCA.pl' to support it (based on your suggestions).&lt;/p&gt;
&lt;p&gt;I've put together a patch script that I've checked into github here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/brettwhitty/bw-ca-tools/blob/master/runCA-slurm-patch/do_runCA_slurm_patch.sh" rel="nofollow"&gt;https://github.com/brettwhitty/bw-ca-tools/blob/master/runCA-slurm-patch/do_runCA_slurm_patch.sh&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;that adds the following SLURM-supporting variables:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;     if (($var eq "gridEngine") &amp;amp;&amp;amp; ($val eq "SLURM")) {
+        setGlobal("gridEngineSubmitCommand",      "sbatch");
+        setGlobal("gridEngineHoldOption",         "--depend=afterany:\"WAIT_TAG\"");
+        setGlobal("gridEngineHoldOptionNoArray",  "--depend=afterany:\"WAIT_TAG\"");
+        setGlobal("gridEngineSyncOption",         "");  ## TODO: SLURM may not support w/out wrapper; See LSF bsub manpage to compare
+        setGlobal("gridEngineNameOption",         "-D `pwd` -J");
+        setGlobal("gridEngineArrayOption",        "-a ARRAY_JOBS");
+        setGlobal("gridEngineArrayName",          "ARRAY_NAME[ARRAY_JOBS]");
+        setGlobal("gridEngineOutputOption",       "-o");  ## NB: SLURM default joins STDERR &amp;amp; STDOUT if no -e specified
+        setGlobal("gridEnginePropagateCommand",   "scontrol update job=\"WAIT_TAG\"");  ## TODO: manually verify this in all cases
+        setGlobal("gridEngineNameToJobIDCommand", "squeue -h -o\%F_* -n \"WAIT_TAG\" | uniq");  ## TODO: manually verify this in all cases
+        setGlobal("gridEngineNameToJobIDCommandNoArray", "squeue -h -o\%i -n \"WAIT_TAG\"");  ## TODO: manually verify this in all cases
+        setGlobal("gridEngineTaskID",             "SLURM_ARRAY_TASK_ID");
+        setGlobal("gridEngineArraySubmitID",      "%A_%a");
+        setGlobal("gridEngineJobID",              "SLURM_JOB_ID");
+    }
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I initially worked from the SLURM man pages; n1/s/oge is old hat for me, so I started from the SGE (and PBS/torque) examples to map over the behavior the code seemed to be expecting. Then I set up a small SLURM virtual cluster and did a few test runs.&lt;/p&gt;
&lt;p&gt;Was away last few days and haven't gotten back to testing yet this week, but as far as I can tell it works OK. &lt;/p&gt;
&lt;p&gt;Array submissions seem to work, holds on array jobs seem to work, output file naming seems OK.&lt;/p&gt;
&lt;p&gt;The only thing that seemed a bit sketchy to me in the code is the nested variable replacement that happens with 'WAIT_TAG', especially as it relates to 'gridEnginePropagateCommand'. The thinking behind the naming of that variable wasn't clear to me, but following through the code I think everything works as it should. I haven't tested enough, though, to understand the special cases that may be buried in those deeply nested code blocks.&lt;/p&gt;
&lt;p&gt;Would be happy to polish this off should anyone have any feedback or errors from their own testing, but otherwise after I do a couple more tests and am satisfied I will consider it a completed exercise. Hope this is useful to a few people.&lt;/p&gt;
&lt;p&gt;Regards,&lt;/p&gt;
&lt;p&gt;Brett&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Brett Whitty</dc:creator><pubDate>Tue, 18 Aug 2015 23:38:49 -0000</pubDate><guid>https://sourceforge.netb06bcdb7c2627e85dc309627e10df4c28aabf54c</guid></item><item><title>#139 runCA SLURM/DRMAA support</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/139/?limit=25#a504</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I have an opportunity to apply for resources (in the form of a competent person's time) to do work towards this. Should I go ahead or has somebody already started?&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">lexnederbragt</dc:creator><pubDate>Wed, 12 Aug 2015 11:22:29 -0000</pubDate><guid>https://sourceforge.net114bd36ce551e10f5342c17c5277266f78a5a589</guid></item><item><title>#139 runCA SLURM/DRMAA support</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/139/?limit=25#82de</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;I am also interested in runCA with SLURM. Any news? &lt;br /&gt;
Thanks&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">miquelgralo</dc:creator><pubDate>Thu, 02 Jul 2015 04:51:43 -0000</pubDate><guid>https://sourceforge.net6b9fa392b049e6efa458aa81c88d9b71298b0af7</guid></item><item><title>#140 extreme disk usage issue</title><link>https://sourceforge.net/p/wgs-assembler/feature-requests/140/?limit=25#b2cf</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;First, you can definitely erase 0-overlaptrim/*Store folders to free up space since you're done with trimming. There are potential changes you can make to decrease the space used but you would have to re-start the assembly. You can see how close to completing the overlap store building you are by checking the asm.ovlStore.err file.&lt;/p&gt;
&lt;p&gt;The most likely cause of a large overlap store is repeats in the sequences. There was a recent question on the user group about a large overlap store for Illumina data. I'm paraphrasing most of the response below.&lt;/p&gt;
&lt;p&gt;You can drop shorter reads.  The historical minimum is 64 bases, but you can set it higher depending on your sequence lengths. &lt;/p&gt;
&lt;p&gt;In addition to throwing out short reads, definitely increase the minimum overlap size (ovlMinLen) to the length of the shortest read minus 1 (or 2, or ...).&lt;/p&gt;
&lt;p&gt;What kmer threshold did it pick (0-mercounts, one of the *err files)?  Can you send the histogram file? Plotting the first two columns should show a definite hump at the expected coverage, with a large tail.  Any humps after that are repeats that probably should be excluded from seeding overlaps.  Be sure to check way out on the X axis, with Y zoomed in, for any very common repeats.&lt;/p&gt;
&lt;p&gt;Do you have a (cumulative) histogram of read lengths?&lt;/p&gt;
&lt;p&gt;Any chance there is adapter present?&lt;/p&gt;
&lt;p&gt;The most recent versions of CA have been optimized for PacBio (long) sequences, which makes the data structures take more space for short Illumina reads.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sergey Koren</dc:creator><pubDate>Mon, 18 May 2015 01:37:03 -0000</pubDate><guid>https://sourceforge.netf152732c57f266135800174444a442ea66457fe1</guid></item></channel></rss>