Assignment 1 2025
General
You must read fully and carefully the assignment specification and instructions.
Course: COMP20007 Design of Algorithms @ Semester 1, 2025
Submission Deadline: Monday, 7th April @ 11:59 pm
Course Weight: 10%
Assignment type: individual
ILOs covered: 2, 3, 4
Submission method: via Ed
Purpose
The purpose of this assignment is for you to:
Design efficient algorithms in pseudocode.
Improve your proficiency in C programming and your dexterity with dynamic memory
allocation.
Demonstrate understanding of representing problems as graphs and implementing a set of
algorithms.
Birds of a feather
Why should we group species?
In the field of genetics we are often interested in grouping similar species together to understand how
animals have evolved over time. Understanding evolution allows us to make more informed
decisions in conservation efforts, but may also lead to the discovery of new medicines, as similar
species produce similar responses to external changes. One possible method of grouping species
together is by how similar they are, which historically has been based on how similar their physical
features are. Some examples of Australian birds are shown below:
Left: Wompoo Fruit-Dove, Middle: Azure Kingfisher, Right: Pale-Yellow Robin
Now, one possible grouping of these three birds is to have them grouped by size - we would expect
small birds like the Pale-Yellow Robin or Azure Kingfisher to have more in common than the larger
Wompoo Fruit-Dove. Of course, we can also group them based on more features, using an
algorithmic approach. Examples of such algorithms are Blake's averages and Cynthia's midpoints.
Other Applications
Grouping methods have numerous real-world applications, which include:
Disease Management: Tracking information about the spread of sickness and grouping
together individuals with similar health problems can help us to identify high-risk population
areas and take preventative steps to save lives.
Resource Allocation: Identifying groups of animals with high/low resource usage can help
better distribute resources and help conservation efforts.
Content Personalisation: Grouping together users with similar tastes can help individuals
access the content and information they want more often.
Group Computation Algorithms
Blake's Averages
Initialization:
Start by selecting the first c birds as the centres of the c groupings desired.
Assign all other birds to their nearest group centre.
Calculate Averages:
For each group, find the average of all the features.
For each group, find the bird that is closest to this average.
Re-assign all birds to the new closest groups.
Termination:
Repeat the main loop until the group centres stop changing, or we exceed a maximum
number of iterations.
Once the centres stop changing, the algorithm terminates.
Output:
The output of the algorithm is the list of birds with their features and group
memberships, ordered alphabetically.
The time complexity of the Blake's Averages algorithm is O(n d c i), where n is the number
of data points, d is the number of features per point, c is the desired number of clusters, and i is the
number of iterations. As i should not vary much and is effectively constant, we will have a
complexity of roughly O(n c d).
Pseudocode
BlakeAverages(birds, numBirds, numGroups):
// inputs:
// birds, a list of birds.
// numBirds, the number of birds.
// numGroups, the number of groups.
// Initialise the groups by, for the first c birds, assigning bird i to group i.
birds <- initialiseBirds()
// Assign each bird to the nearest group centre
birds <- assignGroups()
repeat:
// record the current group centres
previousCentres <- currentCentres
// calculate the mean of each group and find the birds which act as the new group centres
currentCentres <- calculateMeans()
// Assign each bird to the nearest group centre
birds <- assignGroups()
until previousCentres == currentCentres or we exceed the maximum iteration count
// all done, so return grouping
return birds
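The nearest-centre assignment step used in the pseudocode above can be sketched in C as follows. This is a minimal sketch, not the scaffold's implementation: the bird_t layout, the centres array of indices into birds, and the use of squared Euclidean distance over weight and body length (ignoring colour) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bird record: only the two numeric features are used here. */
typedef struct {
    double weight;      /* grams */
    double bodyLength;  /* centimetres */
    int group;          /* index of the assigned group centre */
} bird_t;

/* Squared Euclidean distance over the numeric features (an assumption). */
static double squaredDistance(const bird_t *a, const bird_t *b) {
    double dw = a->weight - b->weight;
    double dl = a->bodyLength - b->bodyLength;
    return dw * dw + dl * dl;
}

/* Assign every bird to its nearest centre. centres[g] is the index in
 * birds[] of the bird acting as the centre of group g. Ties go to the
 * lowest-numbered group. Returns the number of birds that changed group. */
int assignGroups(bird_t *birds, size_t numBirds,
                 const size_t *centres, size_t numGroups) {
    assert(numGroups > 0);
    int changed = 0;
    for (size_t i = 0; i < numBirds; i++) {
        size_t best = 0;
        double bestDist = squaredDistance(&birds[i], &birds[centres[0]]);
        for (size_t g = 1; g < numGroups; g++) {
            double d = squaredDistance(&birds[i], &birds[centres[g]]);
            if (d < bestDist) {
                bestDist = d;
                best = g;
            }
        }
        if (birds[i].group != (int)best) {
            birds[i].group = (int)best;
            changed++;
        }
    }
    return changed;
}
```

Each call performs one distance computation per bird per group, which is where the n c d factor in the complexity estimate comes from.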
Cynthia's Midpoints
Initialization:
Start by selecting the first c birds as the centres of the c groupings desired.
Assign all other birds to their nearest group centre.
Calculate Midpoints:
For each group, find the median of the numeric features.
For each group, find the mode of the categorical features.
For each group, find the bird that is closest to this combination of midpoints.
Re-assign all birds to the new closest groups.
Termination:
Repeat the main loop until the group centres stop changing, or we exceed a maximum
number of iterations.
Once the centres stop changing, the algorithm terminates.
Output:
The output of the algorithm is the list of birds with their features and group
memberships, ordered alphabetically.
As we will sort the elements every time we need to find a median, the time complexity of the
Cynthia's Midpoints algorithm is ,
where n is the number of data points, d is the number of features per point, c is the desired number
of clusters, and i is the number of iterations. As i should not vary much and is effectively
constant, we will have a complexity of roughly .
Pseudocode
CynthiaMidpoints(birds, numBirds, numGroups):
// inputs:
// birds, a list of birds.
// numBirds, the number of birds.
// numGroups, the number of groups.
// Initialise the groups by, for the first c birds, assigning bird i to group i.
birds <- initialiseBirds()
// Assign each bird to the nearest group centre
birds <- assignGroups()
repeat:
// record the current group centres
previousCentres <- currentCentres
// calculate the midpoints of each group and find the birds which act as the new group centres
// For each group
// Find the most common colour
// Find the median weight
// Find the median bodyLength
currentCentres <- calculateMeds()
// Assign each bird to the nearest group centre
birds <- assignGroups()
until previousCentres == currentCentres or we exceed the maximum iteration count
// all done, so return grouping
return birds
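The two midpoint calculations named in the pseudocode (a median for numeric features, a mode for categorical ones) might look like the sketch below. The function names medianOf and modeIndex are hypothetical, and two choices here are assumptions that the specification does not settle: for an even number of values the upper of the two middle values is returned, and ties for the mode go to the earliest occurrence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort over doubles. */
static int compareDoubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of values[0..n-1], found by sorting a copy.
 * Assumption: for even n the upper of the two middle values is returned. */
double medianOf(const double *values, size_t n) {
    assert(n > 0);
    double *copy = malloc(n * sizeof *copy);
    if (copy == NULL) {
        exit(EXIT_FAILURE);
    }
    memcpy(copy, values, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, compareDoubles);
    double median = copy[n / 2];
    free(copy);
    return median;
}

/* Index of the most common string in items[0..n-1]; ties go to the
 * earliest occurrence. Quadratic, which keeps the sketch short. */
size_t modeIndex(const char *const *items, size_t n) {
    assert(n > 0);
    size_t best = 0, bestCount = 0;
    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++) {
            if (strcmp(items[i], items[j]) == 0) {
                count++;
            }
        }
        if (count > bestCount) {
            bestCount = count;
            best = i;
        }
    }
    return best;
}
```

Sorting a copy leaves the caller's array in its original order, at the cost of one allocation per median.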
Notes
Your main function should look like the following pseudocode:
main():
// Initialise an empty list to store birds, and group centres
birds <- empty list
// Read in the list of birds and number of groups
birds, numBirds, numGroups <- readBirds()
if no birds input:
print "No birds input."
return 1
if number of birds < number of groups:
print "Invalid Data."
return 1
// using the method read from argv:
if method == "B":
birds <- BlakeAverages()
else if method == "C":
birds <- CynthiaMidpoints()
else:
print "Invalid method."
// Print out the list of birds and their group assignments
print(birds)
// free all of the data
free(birds)
The process of grouping can be visualized in two dimensions, and may look like the following.
In the image, different colours represent the different groups.
Example
An example of input and output is provided on the Part 1 Skeleton Slide.
Task 1: Group Computation
Part A
Implement the Blake's Averages and Cynthia's Midpoints algorithms to compute a grouping of
birds, as described in the previous slide.
Requirements
Code: Your program should implement the required pseudocode mentioned in previous slides,
as well as a couple helper functions listed in the skeleton code.
Input Format: The input will be a text file where the first line indicates the total number of
Australian birds, and the total number of groups. Subsequent lines represent birds with their
name, colour, weight (grams) and body length (cm) separated by a space (e.g.,
Red-backed_Fairywren Black 5 10). All the birds are sorted in alphabetical order by name using
Unix sorting order.
Output Format: Your program should output all birds, their features and associated groupings
in alphabetical order. For Blake's Averages, this means keeping it in the original ordering (Unix
sorting rules), and in Cynthia's midpoints it will be alphabetically sorted using strcmp sorting
rules. This should be done by traversing your array of birds and printing out each bird. This
should be output to the console (stdout).
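As an illustration of the input and output formats described above, the sketch below parses one bird line and formats it in the layout shown in the example output. The birdRecord_t type and both function names are hypothetical. Fixed-size buffers are used only to keep the example short; the assignment asks for space-efficient string storage, so a real solution would allocate names and colours dynamically.

```c
#include <stdio.h>

#define MAX_FIELD 127  /* assumed upper bound on name/colour length */

/* Hypothetical record for one bird. */
typedef struct {
    char name[MAX_FIELD + 1];
    char colour[MAX_FIELD + 1];
    double weight;      /* grams */
    double bodyLength;  /* centimetres */
    int group;
} birdRecord_t;

/* Parse "name colour weight bodyLength"; returns 1 on success, 0 otherwise.
 * The width 127 in the format string must match MAX_FIELD. */
int parseBird(const char *line, birdRecord_t *bird) {
    int fields = sscanf(line, "%127s %127s %lf %lf",
                        bird->name, bird->colour,
                        &bird->weight, &bird->bodyLength);
    bird->group = -1;  /* not yet assigned to a group */
    return fields == 4;
}

/* Write the bird into out using the layout of the example output. */
int formatBird(const birdRecord_t *bird, char *out, size_t outSize) {
    return snprintf(out, outSize, "%s %s %f %f Group: %d",
                    bird->name, bird->colour,
                    bird->weight, bird->bodyLength, bird->group);
}
```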
Part B
Evaluate both algorithms through experimental analysis by quantifying the average total basic
operations per iteration of the main loop (calculated by dividing the number of operations in the
main loop by the number of iterations taken) of the two algorithms (Blake's Averages and Cynthia's
Midpoints) across various input scales and configurations. Use all of the valid input sets provided.
You may wish to generate more data, and you can do so with the provided files in the analysis
folder.
You must decide what to define as the basic operation for each algorithm, but remember to only
count the basic operations in the main while loop.
Reporting: Write a report including a discussion on the choice of algorithm, the experimental
evaluation (including tables or graphs showing how the average number of basic operations varies
with the input parameters (n and d)), and conclusions drawn from the comparisons. Include any
assumptions or simplifications made in your implementations. In addition, discuss:
Possible improvements that can be made to the main loop of the algorithms, if any, to reduce
complexity.
Why you selected this basic operation.
Tip: The global variable numOps is provided to track the operation count. You may need to modify some of
the functions (such as the comparison functions) to increment the operation counter. By default, the
information relating to operation counts is printed to stderr.
Include your operation counting code as an appendix to the report.
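One way to count basic operations is to route every comparison through a small function that increments the counter. The sketch below is only an illustration: numOps here is a stand-in for the counter the scaffold provides (its real type and declaration may differ), and lessThan, indexOfMin and averageOps are hypothetical names. Choosing a comparison as the basic operation is likewise just one possibility.

```c
#include <stddef.h>

/* Stand-in for the scaffold's counter; the real declaration may differ. */
unsigned long numOps = 0;

/* Counted comparison: every call is one basic operation. */
int lessThan(double a, double b) {
    numOps++;
    return a < b;
}

/* Index of the smallest value, using exactly n - 1 counted comparisons. */
size_t indexOfMin(const double *values, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (lessThan(values[i], values[best])) {
            best = i;
        }
    }
    return best;
}

/* Average basic operations per iteration of the main loop. */
double averageOps(unsigned long totalOps, unsigned long iterations) {
    return iterations == 0 ? 0.0 : (double)totalOps / (double)iterations;
}
```

Resetting numOps to 0 just before the main loop keeps initialisation work out of the count.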
Submission Guidelines
Submit your C source code files with appropriate comments explaining the algorithms and data
structures used.
Your report should be in PDF format, including your findings from the experimental evaluation
and any observations and theoretical improvements regarding the performance of the two
algorithms.
The report for Part 1B should be submitted as a PDF file named written_task_1B.pdf. The file
should be uploaded to the home directory of Part 1A, that is, the directory that contains your
program file birds.c.
Grading Criteria
Correctness of the implemented algorithms and adherence to the requirements.
Efficiency (time and space) and proper storage of birds.
Clarity of the report, including the depth of the experimental evaluation and the analysis of the
results.
Code readability, structure, and documentation.
Task 1 Skeleton
Example of Input:
10 3
Australasian_Swamphen Blue 1310 51
Australian_Bushturkey Black 2100 64.3
Australian_Darter Black 2600 86.5
Australian_King-Parrot Red 195 42
Australian_Logrunner Black 56 19
Australian_Magpie Black 350 85
Australian_Pelican White 6800 188
Australian_Rufous_Fantail Grey 10 18.5
Australian_White_Ibis White 1475 66.4
Australian_Wood_Duck Brown 955 46
Example of Output (for Blake's Averages):
Australasian_Swamphen Blue 1310.000000 51.000000 Group: 1
Australian_Bushturkey Black 2100.000000 64.300000 Group: 2
Australian_Darter Black 2600.000000 86.500000 Group: 2
Australian_King-Parrot Red 195.000000 42.000000 Group: 0
Australian_Logrunner Black 56.000000 19.000000 Group: 0
Australian_Magpie Black 350.000000 85.000000 Group: 0
Australian_Pelican White 6800.000000 188.000000 Group: 2
Australian_Rufous_Fantail Grey 10.000000 18.500000 Group: 0
Australian_White_Ibis White 1475.000000 66.400000 Group: 1
Australian_Wood_Duck Brown 955.000000 46.000000 Group: 1
Example of Output (for Cynthia's Midpoints):
Australasian_Swamphen Blue 1310.000000 51.000000 Group: 1
Australian_Bushturkey Black 2100.000000 64.300000 Group: 1
Australian_Darter Black 2600.000000 86.500000 Group: 1
Australian_King-Parrot Red 195.000000 42.000000 Group: 0
Australian_Logrunner Black 56.000000 19.000000 Group: 0
Australian_Magpie Black 350.000000 85.000000 Group: 0
Australian_Pelican White 6800.000000 188.000000 Group: 2
Australian_Rufous_Fantail Grey 10.000000 18.500000 Group: 0
Australian_White_Ibis White 1475.000000 66.400000 Group: 1
Australian_Wood_Duck Brown 955.000000 46.000000 Group: 0
Eels in the Kulin Nation
Short-finned eels are a fish which live in the freshwater systems around south-eastern Australia. To
the First Peoples of the Kulin Nation - the traditional custodians of the lands and waters surrounding
what is now Melbourne, or Naarm (which is the Boonwurrung/Woiwurrung name for Port Phillip) -
these eels were very important food sources. The Wurundjeri people of the Kulin Nation had seven
seasons rather than the Western four, and one of the seasons was dedicated to the short-finned eel
migration. This season is known as Iuk (https://inspiringvictoria.org.au/2020/08/13/seasons-in-the-sky/). During Iuk, the short-finned eels which live in the freshwater systems of the Kulin Nation
migrate out to the ocean to begin their long journey to the warm waters of the Coral Sea, some
3,000km away, to breed. However, before setting out for this extensive journey the eels must eat and
get fat to survive the long swim. Hence, the people of the Kulin Nation would make extensive fish
traps in the river systems to catch and eat the fattened eels during Iuk, which have been described as
having a buttery taste by a Yorta Yorta person.
Feeding and breeding eels
You are a fresh water eel in the river systems of the Kulin Nation. There are many rivers which
connect different lakes together, and eventually to the ocean.
Part A
You find that these river systems are difficult to navigate, and you wonder if you are running in
circles. Write an algorithm to tell if there are any paths in this river system which could form a cycle,
i.e. leaving from one lake you could take a certain path of distinct rivers that reaches back to the
starting lake.
This algorithm should work for any set of lakes and rivers. There will be a certain number of lakes,
each with a unique identifier lakeID ∈ [0, numLakes). Each river will run from one lake to
another, but you may assume that rivers can be travelled in both directions.
The first line of input is the number of lakes, and number of rivers in the system respectively. The
subsequent lines will each represent a river, with the first value being the lakeID it flows from and the
second being the lakeID it flows to. The input will look like:
[num_lakes] [num_rivers]
[from_lakeID] [to_lakeID]
...
The output of the program should print "We're running in circles!" if there is a cycle found, and
"Smooth sailing" if not.
Part B
You are very hungry and have heard of some good feeding grounds further inland, but it is a long
way and you can't figure out what's the best way to go given all the different rivers and lakes. You
want to get there as fast as possible before the food is all eaten up. The amount of time taken to
traverse a river is equal to the river's length. Unfortunately, because of a strong current, it takes
twice as long to swim upstream as it does to swim downstream.
Write an algorithm that will find the shortest way to reach the feeding grounds from the ocean.
This algorithm should work for any set of lakes and rivers. As with Part A, there will be a certain
number of lakes, each with a unique identifier lakeID ∈ [0, numLakes). Each river will run from
one lake to another and have an associated length as well.
The first line of the input will contain the lakeID which you are starting from, followed by the lakeID
of the destination lake where the feeding grounds are. The second line contains the number of lakes
and number of rivers in the system. The subsequent lines each contain a river, with the lakeID of the
lake it flows from followed by the lakeID of the lake it flows into, and then the final value is the river's
length.
The input will look like:
[origin_lake] [destination_lake]
[num_lakes] [num_rivers]
[from_lakeID] [to_lakeID] [length_in_km]
...
The output of the program should print the total length of the shortest path to the feeding ground,
followed by the lakeIDs of the lakes traversed to reach it (i.e. by going from lake to lake through the
rivers).
The output should look like:
Total cost: [total_length]
Path: [origin_lakeID], [lakeID_1], [lakeID_2], ..., [destination_lakeID]
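The upstream rule can be handled when relaxing edges: a river stored as from -> to costs its length when travelled in that direction and twice its length in the opposite direction. The sketch below shows this with a simple array-based version of Dijkstra's algorithm. It is an illustration only: it takes plain edge arrays rather than the scaffold's graph type, uses no priority queue (so it is slower than a heap-based version), and the names shortestSwim and NO_PATH are hypothetical. It assumes non-negative lengths small enough that sums do not overflow an int.

```c
#include <limits.h>
#include <stdlib.h>

#define NO_PATH (-1)

/* Shortest travel cost from origin to dest. River k flows from[k] -> to[k]
 * with length len[k]; downstream costs len[k], upstream costs 2 * len[k].
 * pred[v] receives the lake visited before v (or -1). Returns NO_PATH if
 * dest is unreachable or memory runs out. */
int shortestSwim(int numLakes, int numRivers, const int *from, const int *to,
                 const int *len, int origin, int dest, int *pred) {
    int *dist = malloc((size_t)numLakes * sizeof *dist);
    char *done = calloc((size_t)numLakes, sizeof *done);
    if (dist == NULL || done == NULL) {
        free(dist);
        free(done);
        return NO_PATH;
    }
    for (int v = 0; v < numLakes; v++) {
        dist[v] = INT_MAX;
        pred[v] = -1;
    }
    dist[origin] = 0;
    for (int round = 0; round < numLakes; round++) {
        /* Pick the closest reachable lake that is not finalised yet. */
        int u = -1;
        for (int v = 0; v < numLakes; v++) {
            if (!done[v] && dist[v] != INT_MAX &&
                (u == -1 || dist[v] < dist[u])) {
                u = v;
            }
        }
        if (u == -1 || u == dest) {
            break;
        }
        done[u] = 1;
        /* Relax every river touching u, in whichever direction applies. */
        for (int k = 0; k < numRivers; k++) {
            int v, cost;
            if (from[k] == u) {
                v = to[k];
                cost = len[k];      /* downstream */
            } else if (to[k] == u) {
                v = from[k];
                cost = 2 * len[k];  /* upstream takes twice as long */
            } else {
                continue;
            }
            if (!done[v] && dist[u] + cost < dist[v]) {
                dist[v] = dist[u] + cost;
                pred[v] = u;
            }
        }
    }
    int result = dist[dest] == INT_MAX ? NO_PATH : dist[dest];
    free(dist);
    free(done);
    return result;
}
```

Because the cost depends on direction, the cheapest route from lake a to lake b generally differs in cost from the cheapest route from b to a.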
Part C
The breeding season (Iuk) is fast approaching, so you need to make your way back to the ocean and
onto the Coral Sea. However, you want to maximise the amount of fat you have by the time you
reach the ocean. Swimming down rivers costs energy (and burns fat), but in many cases, you need to
swim through some rivers and lakes anyway to get to the sea. Moreover, the lakes tend to have some
food in them, which can increase your fat supplies again. Assume that you only travel the rivers
downstream because you want to reach the sea as quickly as possible.
Propose an algorithm to find the best way to get back to the ocean, ensuring that, in total, you have
the maximum fat stores upon reaching the ocean. This algorithm should run in O((V + E) log V).
This algorithm may only work with certain sets of rivers and lakes, and it is up to you to
check whether the input will be solvable in this time complexity as part of your algorithm.
Write the pseudocode for this algorithm. You may assume the following:
The input graph is a directed weighted graph. Each edge (u, v, w) is represented as a single
element in the adjacency list for u.
There is a function Dijkstra(graph, origin, destination) → (cost, path) that returns the
cost and the path of the lowest-cost path from origin to destination.
As part of the input data, you have an array fatGain[0..numLakes - 1] where fatGain[i]
stores the amount of fat (in some units) you always gain when you reach the lake with ID i.
When you swim downstream along a river of length w, you lose exactly w units of fat.
At the start of your journey back to the sea, you have K units of fat in your body. You will die
when the number of fat units in your body reaches 0.
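To make the fat bookkeeping in these assumptions concrete, the sketch below evaluates a single, already chosen path; it does not solve Part C. The function name fatAfterPath and its parameters are hypothetical. Two ordering details are assumptions, since the text does not spell them out: the loss for a river is applied before the gain at the lake it leads to, and nothing is gained at the starting lake.

```c
/* Fat left after following path[0..pathLen-1], or -1 if the eel dies.
 * riverLen[i] is the length of the river from path[i] to path[i + 1].
 * Assumptions: the loss for a river is applied before the gain at the
 * lake it leads to, and no gain is collected at the starting lake. */
long fatAfterPath(long startFat, const int *path, int pathLen,
                  const long *riverLen, const long *fatGain) {
    long fat = startFat;
    for (int i = 0; i + 1 < pathLen; i++) {
        fat -= riverLen[i];           /* swimming burns fat */
        if (fat <= 0) {
            return -1;                /* reached 0 units: the eel dies */
        }
        fat += fatGain[path[i + 1]];  /* feeding on arrival */
    }
    return fat;
}
```

For example, starting with K = 10 and following lakes 0, 1, 2 along rivers of length 4 and 7 with fatGain = {0, 5, 2} gives 10 - 4 + 5 - 7 + 2 = 6 units on arrival.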
You must justify your algorithm design choices in 300 words, including why certain sets of rivers and
lakes won't work with the algorithm.
Notes:
No marks will be awarded if your algorithm's time complexity is not O((V + E) log V), or if
your algorithm is incorrect for the problem.
The report for Part 2C should be submitted as a PDF file named written_task_2C.pdf. The file
should be uploaded to the home directory of Part 2B, the directory that contains your program
files dijkstra.c, graph.c, etc.
Task 2, Part A Skeleton
Implement a function, cycleCheck(graph_t *graph), which returns 1 if a cycle is found, and 0
otherwise.
The main driver functions have already been implemented.
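One common way to detect a cycle in an undirected graph is union-find: process the rivers one at a time and report a cycle as soon as a river joins two lakes that are already connected. The sketch below illustrates the idea on plain edge arrays, because the layout of graph_t is defined by the scaffold and not shown here; hasCycle and findRoot are hypothetical names. It treats two distinct rivers between the same pair of lakes, or a river from a lake to itself, as a cycle, which is an assumption about how the definition should be read.

```c
#include <stdlib.h>

/* Find the representative of x's set, halving paths along the way. */
static int findRoot(int *parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Returns 1 if the undirected graph has a cycle, 0 if not, and -1 on
 * allocation failure. River k joins lakes from[k] and to[k]. */
int hasCycle(int numLakes, int numRivers, const int *from, const int *to) {
    int *parent = malloc((size_t)(numLakes > 0 ? numLakes : 1) * sizeof *parent);
    if (parent == NULL) {
        return -1;
    }
    for (int i = 0; i < numLakes; i++) {
        parent[i] = i;  /* every lake starts in its own component */
    }
    int found = 0;
    for (int k = 0; k < numRivers && !found; k++) {
        int a = findRoot(parent, from[k]);
        int b = findRoot(parent, to[k]);
        if (a == b) {
            found = 1;      /* ends already connected: this river closes a cycle */
        } else {
            parent[a] = b;  /* merge the two components */
        }
    }
    free(parent);
    return found;
}
```

A depth-first search that looks for an edge back to an already visited lake is an equally valid approach.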
The first line of input is the number of lakes, and number of rivers in the system respectively. The
subsequent lines will each represent a river, with the first value being the lakeID it flows from and the
second being the lakeID it flows to. The input will look like:
[num_lakes] [num_rivers]
[from_lakeID] [to_lakeID]
...
The output of the program should print "We're running in circles!" if there is a cycle found, and
"Smooth sailing" if not.
Input Data Sets
A number of input data files are provided in the subdirectory test_cases . Each file name starts with
the prefix t2a- . The data file t2a-0.txt represents a simplified version of some lakes and water
flows (like rivers, creeks, canals) between them for the area around Lakes Entrance in Victoria. It
should be noted that this area does not belong to the Kulin Nation (it actually belongs to the
Gunaikurnai people). The area was chosen here for illustrative purposes.
Also note that t2b-0.txt is the same as t2a-0.txt, but with additional data for use in Part B.
Task 2, Part B Skeleton
Write a function dijkstra(graph_t *graph, int origin, int dest, int *path) which computes
the shortest path from origin to dest and returns the cost of this path, and the path should be
written into the path argument. This function should return the SENTINEL value (-1) if there is no
path, and print out "No Path".
The input data will look like:
[origin_lake] [destination_lake]
[num_lakes] [num_rivers]
[from_lakeID] [to_lakeID] [length_in_km]
...
The output should look like:
Total cost: [total_length]
Path: [origin_lakeID], [lakeID_1], [lakeID_2], ..., [destination_lakeID]
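A shortest-path search typically records, for each lake, the lake visited just before it. The sketch below shows how such a predecessor array could be turned into the path array and the "Path:" line above. The names buildPath and formatPath, and the convention that pred[v] is -1 when v has no predecessor, are assumptions made for illustration and not part of the scaffold.

```c
#include <stdio.h>

/* Walk pred[] back from dest to origin, then reverse into path[].
 * Returns the number of lakes on the path, or 0 if dest is unreachable.
 * path must have room for numLakes entries. */
int buildPath(const int *pred, int numLakes, int origin, int dest, int *path) {
    int count = 0;
    for (int v = dest; v != -1 && count < numLakes; v = pred[v]) {
        path[count++] = v;
        if (v == origin) {
            break;
        }
    }
    if (count == 0 || path[count - 1] != origin) {
        return 0;
    }
    /* The walk produced dest..origin; reverse it in place. */
    for (int i = 0, j = count - 1; i < j; i++, j--) {
        int tmp = path[i];
        path[i] = path[j];
        path[j] = tmp;
    }
    return count;
}

/* Format "Path: a, b, c" into out. Returns the number of characters
 * written (excluding the terminator) when out is large enough. */
int formatPath(const int *path, int count, char *out, size_t outSize) {
    int used = snprintf(out, outSize, "Path:");
    for (int i = 0; i < count && used >= 0 && (size_t)used < outSize; i++) {
        used += snprintf(out + used, outSize - (size_t)used,
                         i == 0 ? " %d" : ", %d", path[i]);
    }
    return used;
}
```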
Most scaffolding functions have been written for you, including I/O and a basic graph and priority
queue implementation. We recommend you use these, but you are welcome to write your own if you
wish. Either way, you must ensure that your output is in the same form as the output the test cases give.
Task 2B Test Cases
Academic Honesty
This is an individual assignment. The work must be your own work.
While you may discuss your program development, coding problems and experimentation with your
classmates, you must not share files, as doing this without proper attribution is considered
plagiarism.
If you have borrowed ideas or taken inspiration from code and you are in doubt about whether it is
plagiarism, provide a comment highlighting where you got that inspiration.
If you refer to published work in the discussion of your experiments, be sure to include a citation to
the publication or the web link.
Borrowing of someone else's code without acknowledgment is plagiarism. Plagiarism is considered
a serious offense at the University of Melbourne. You should read the University code on Academic
integrity and details on plagiarism. Make sure you are not plagiarizing, intentionally or
unintentionally.
You are also advised that there will be a C programming component (on paper, not on a computer) in
the final examination. Students who do not program their own assignments will be at a disadvantage
for this part of the examination.
Late Policy
The late penalty is 20% of the available marks for that project for each working day (or part thereof)
overdue.
If you wish to apply for an extension, please review the FEIT Extensions and Special consideration
page on the subject LMS. Requests for extensions on medical grounds will need to be supported by a
medical certificate. Any request received less than 48 hours before the assessment date (or after the
date!) will generally not be accepted except in the most extreme circumstances. In general, extensions
will not be granted if the interruption covers less than 10% of the project duration. Remember that
departmental servers are often heavily loaded near project deadlines, and unexpected outages can
occur; these will not be considered as grounds for an extension.
Students who experience difficulties due to personal circumstances are encouraged to make use of
the appropriate University student support services, and to contact the lecturer, at the earliest
opportunity.
Finally, we are here to help! Frequently asked questions about the project will be answered on Ed.
Requirements: C Programming
The following implementation requirements must be adhered to:
You must write your implementation in the C programming language.
Your code should be easily extensible to multiple data structure instances. This means that the
functions for interacting with your data structures should take as arguments not only the values
required to perform the operation required, but also a pointer to a particular data structure, e.g.
search(dictionary, value) .
Your implementation must read the input file once only.
Your program should store strings in a space-efficient manner. If you are using malloc() to
create the space for a string, remember to allow space for the final end-of-string character, \0
(NUL).
Your approach should be reasonably time efficient.
Your solution should begin from the provided scaffold.
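As a small illustration of the string-storage requirement above, the sketch below copies a string into a heap block sized to fit it exactly, including the terminator. The name copyString is hypothetical; it is one possible helper, not something the scaffold is known to provide.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of src sized to fit exactly, or NULL on failure.
 * The caller owns the result and must free() it. */
char *copyString(const char *src) {
    size_t length = strlen(src);
    char *copy = malloc(length + 1);  /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, length + 1);    /* copies the terminator too */
    return copy;
}
```

Checking the result of malloc() before using it also covers the safety expectations for allocation.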
Hints:
If you haven't used make before, try it on simple programs first. If it doesn't work, read the error messages
carefully. A common problem in compiling multifile executables is in the included header files. Note also that
the whitespace before the command is a tab, and not multiple spaces.
It is not a good idea to code your program as a single file and then try to break it down into multiple files.
Start by using multiple files, with minimal content, and make sure they are communicating with each other
before starting more serious coding.
Programming Style
Below is a style guide which assignments are evaluated against. For this subject, the 80 character
limit is a guideline rather than a rule; if your code exceeds this limit, you should consider whether
your code would be more readable if you instead rearranged it.
/** ***********************
* C Programming Style for Engineering Computation
* Definitions and includes
* Definitions are in UPPER_CASE
* Includes go before definitions
* Space between includes, definitions and the main function.
* Use definitions for any constants in your program, do not just write them
* in.
*
* Tabs may be set to 4-spaces or 8-spaces, depending on your editor. The code
* below is ``gnu'' style. If your editor has ``bsd'' it will follow the 8-space
* style. Both are very standard.
* We should not comment obvious things - write code that documents itself
*/
Some automatic evaluations of your code style may be performed where they are reliable. As
determining whether these style-related issues are occurring sometimes involves non-trivial (and
sometimes even undecidable) calculations, a simpler and more error-prone (but highly successful)
solution is used. You may need to add a comment to identify these cases, so check any failing test
outputs for instructions on how to resolve incorrectly flagged issues.
Mark Breakdown
There are a total of 10 marks given for this assignment.
Your C programs for Task 1 and 2 should be accurate, readable, and observe good C programming
structure, safety and style, including documentation. Safety refers to checking whether opening a file
returns something, whether mallocs do their job, etc. The documentation should explain all major
design decisions, and should be formatted so that it does not interfere with reading the code. As
much as possible, try to make your code self-documenting, by choosing descriptive variable names.
The remainder of the marks will be based on the correct functioning of your submission.
Note that marks related to the correctness of your code will be based on passing various tests. If your
program passes these tests without addressing the learning outcomes (e.g. if you fully hard-code
solutions or otherwise deliberately exploit the test cases), you may receive fewer marks than the
tests suggest; otherwise, your code marks will be determined by the test cases. For questions with both a
written component and a C code component, part of the mark will be given for the passing of test
cases, with the remainder from the correctness of the written answer.
Task 1 will be marked out of 4 marks, Task 2 will be marked out of 5 marks and C code quality will
comprise the final mark.
Additional Support
Your tutors will be available to help with your assignment during the scheduled workshop times.
Questions related to the assignment may be posted on the Ed discussion forum, using the folder tag
Assignments for new posts. You should feel free to answer other students' questions if you are
confident of your skills.
A tutor will check the discussion forum regularly, and answer some questions, but be aware that for
some questions you will just need to use your judgment and document your thinking.
If you have questions about your code specifically which you feel would reveal too much of the
assignment, feel free to post a private question on the discussion forum.
Most students find Academic Skills' Research Report Guide extremely valuable in constructing a well
formed and sensible analysis that makes good use of relevant material taught so far in the subject.
Acknowledgements
ChatGPT was used to help generate some graphs for Task 2A.