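# Count occurrences of 'A', 'C', 'G' and 'T' (in that order) in the DNA string s.
countNucs(s::String) = [
    count(==('A'), s), count(==('C'), s), count(==('G'), s), count(==('T'), s)
]

function main()
    if length(ARGS) < 2
        println("Usage: julia $(Base.PROGRAM_FILE) <fileIN> <fileOUT>")
        exit(1)
    end
    # Read the whole input file as a single string.
    s = read(ARGS[1], String)
    data = countNucs(s)
    # Write the four counts to the output file, separated by spaces.
    open(ARGS[2], "w") do file
        println(file, join(data, " "))
    end
    exit(0)
end

main()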
Counting DNA nucleotides
Problem definition
Given: A DNA string s of length at most \(1000\) nucleotides.
Return: Four integers (separated by spaces) counting the respective number of times that the symbols ‘A’, ‘C’, ‘G’ and ‘T’ occur in s.
Sample Dataset:
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Sample Output:
\[\begin{matrix} 20 & 12 & 17 & 21 \end{matrix}\]
Solution
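We use the built-in Julia function count(p, iter), which counts the number of elements in iter for which the predicate p is true. Here the predicate ==('A') is a one-argument function that returns true when its argument equals 'A', and iterating over a string yields its characters.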
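As a quick check of how count behaves, the following minimal sketch applies it to the sample dataset from the problem definition. The variable names is_a and counts are illustrative only and do not appear in the script.

```julia
# Sample dataset from the problem definition.
s = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"

# ==('A') builds a one-argument predicate, equivalent to c -> c == 'A'.
is_a = ==('A')
is_a('A')   # true
is_a('C')   # false

# count(p, iter) applies the predicate to every character of the string.
counts = [count(==(nuc), s) for nuc in "ACGT"]
println(join(counts, " "))   # should match the sample output: 20 12 17 21
```

The comprehension over "ACGT" is just a compact way of writing the four count calls used in countNucs.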
Notes
In Rosalind, you have to ‘download the dataset’, and you are then given five minutes to submit your answer. This means that your solution needs to be reasonably efficient to finish within that time frame. With poorly written code this can be a non-trivial requirement, because the test data files can be large.
The code was run by downloading the dataset, then running the following command:
julia countdna.jl rosalind_dna.txt output.txt
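On the efficiency point in the notes: the script calls count four times, so it scans the string four times. For an input of at most 1000 nucleotides this is far inside the time limit, but as a sketch of an alternative, the counts can also be collected in a single pass. The function name count_nucs_single_pass is hypothetical and is not part of the original script.

```julia
# Hypothetical helper (not part of the original script):
# collect all four counts in one pass over the string.
function count_nucs_single_pass(s::AbstractString)
    a = c = g = t = 0
    for ch in s
        if ch == 'A'
            a += 1
        elseif ch == 'C'
            c += 1
        elseif ch == 'G'
            g += 1
        elseif ch == 'T'
            t += 1
        end
        # Any other character (e.g. a trailing newline in the file) is ignored.
    end
    return [a, c, g, t]   # same order as countNucs: A, C, G, T
end

println(join(count_nucs_single_pass("AACGT\n"), " "))
```

Both versions ignore characters other than the four nucleotides, so a trailing newline in the downloaded file does not affect the result.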