Good morning, everyone. Sorry for the slight delay; people were a little slow coming in. Hopefully you're all okay this morning and looking forward to a day of plotting. This session is intended to be quite useful to you. Ultimately, as chemists, you tend to make a lot of graphs, and there are various ways of doing that. I guess you'll have used something like Excel before, and many of you may also use something like Origin. An alternative is to use something like Python. Actually, for myself, this was how I got into Python programming: I saw somebody else's Python graph and thought, that looks nice, I'd like to make one that looks like that. And then, a few years later, here we are. So hopefully you'll find this session interesting. It goes into a bit of detail in places, but hopefully it will give you enough tools that later on in your degree and your career you can use them to make plots if you want.
So I guess we should probably just get started. I'm on a different computer, so it's difficult to judge how large things are. Can you give me an idea of whether this is a reasonable size? Let's do the green ticks and red crosses: if it's a suitable size and you can see what's going on, can you give me a green tick, please? Okay. And when we get into Jupyter, if it appears too small, you'll have to tell me. Right. You should be experts at this by now: go to the chemistry course materials. This time we're in session 5, so we're halfway through the course. Click on that and it'll take you through to Noteable. Start that up. And if you haven't already, a reminder to go back to Learn and get your link to the course materials.
So, as we do every time, use the Git repo button and clone the newest version. You should then have the plotting student notebook. Here we go; it should look something like that. It's going to be a similar format to last week: your notebook contains detailed notes and extra bits of information, references and things. I'm going to work from a slightly different version (I will address that question in a minute). I'm going to work from a tutor version here, which is just stripped down, with some of the information removed. I should also add, in terms of last week's session, that you should now have access to the answers to the problem-solving notebook. You had the original problem-solving student notebook, but you've now also got the problem-solving demonstrator one, which is basically the student one but with the answers and the missing bits filled in.
Oh, Shane, you said the page is not clear. Is it too small, or just blurry? Blurry. That may be the university connection; I'm afraid I'm not sure I can do much about that one. Hopefully, when we get into the Jupyter notebook and things settle down a bit, it will be clearer, but if it's still a problem and we need to make things bigger to compensate, then let me know. It's a difficult one; I'm not really sure I can solve it.
Okay, just before we get started, there were a couple of questions about the assignments. Assessment 2: we removed that from the list, in the hope of reducing the number of assessments that appear in Noteable. The Assignments tab is getting a bit ridiculous now that you've got up to 40 assignments, so we removed the assessment to try and narrow that down a bit. Unfortunately, we hadn't realised that this would also remove it from your downloaded and submitted assessments lists. It wasn't clear in the documentation.
So a colleague is working on that and, I think, is planning either today or tomorrow to re-release the assessment, so it should just pop up as it was before, and also to release the feedback to you. Now that all the extensions people have had have come to an end, we've been able to mark. We're trying to get the feedback to you as quickly as possible, so you should hopefully get some feedback this week. Hopefully that answers the question. And yes, it would be good if we could group them; unfortunately this is one of the many limitations of Noteable. I'm afraid we're stuck with this system of having 10 different assessments that you're going to see. I'm sorry, we can't change that. Any other questions about the assessments? No? Okay, in which case let's get started with some plotting. Hopefully you can all see that okay? Green ticks, red crosses. I have lots of green ticks from demonstrators, not much from anyone else. I see no red crosses, so we'll go with that.
Okay, plotting then. First things first: again, in your notebook, go through and run the cells in order so things work. We need to import various libraries. We've seen these before: pandas and numpy; mendeleev gives us periodic table data, which we'll use later; and the other stuff is to make the Mentimeters work. So we'll run that. And then, by way of an introduction, I just wanted to give you this graph. Oh, okay, there's a question about the pre-session quiz. If you forget to submit it, it just means that you haven't submitted it. It doesn't count towards your assessment; it's purely a way of helping your own learning. It should still be available for you. Just e-mail after this session if you want.
Okay, to get started then: this picture here. I wonder if anyone knows who made it? Feel free to either shout out or put it in the chat. It's from a long time ago; it's very old, 1856 or 1857. Well done, Paita, you get a gold star or something.
So it was Florence Nightingale. Everyone knows Florence Nightingale as the Lady with the Lamp, and for her work on getting the hospitals clean during various wars a long time ago. But actually she was a very keen statistician. As part of her work she went into the war zones and did lots of data collecting, seeing who had died from what and trying to pin down new information there. And she produced these, I think, absolutely beautiful graphs showing how mortality changes with the month of the year. This is just a really nice way of showing the data. You've got a time sequence going from July to June, the size of the bar is proportional to how many people are dying, and the colour is related to the cause. All the details are down in the key. I don't expect you to read it, but I just wanted to put it up as a thing that says you can actually include an awful lot of information in a graph. Here you've got timing information, you've got information on mortality from different causes, and you've also got a relationship between years, which you can link to other things. So a really good graph is worth an awful lot of words. And this is why we use them as scientists: it's a very, very efficient way of expressing information.
So plotting is a great way of doing this, but you can very easily get plots wrong. That example was a very nice plot. You've probably seen really bad plots that don't make much sense or convey any information. So there are some really important things to consider when you plot. First of all: what do you want to say? In the Florence Nightingale example, she was trying to present how many people were dying and why, and that comes across very clearly. The important thing is that your graph has to show what you want it to.
There's no point just plotting the data in some haphazard way and hoping that somebody's going to get what you wanted to say. You need to think about it. Related to that is this idea of choosing the correct type of plot. If you're using the wrong type of plot, then you're never going to express your information in a sensible way. Once you've chosen what your message is and what your plot type is going to be, then it's a case of trying to narrow down that plot and make it as clear as possible, so that somebody just has to look at it for a couple of seconds and they get the sense of what you're trying to express. So a really important thing with graphs is to keep it simple. This idea of chartjunk is something you might have come across before if you've used Excel, because Excel puts all sorts of lines on your graph, and it puts on colours and things you don't need; they don't add any information to your graph. The important thing is: if it's not adding information, or it's not helping understanding, it shouldn't be there. Actually, some of the best plots are the ones that are really very stark and just have data and very little else.
Here we go. So, thinking about choosing a plot type, then. This is one of the really big considerations, and you could spend a whole session talking about it. This has dropped off the bottom of my screen, and I'm afraid I can't change that, but hopefully in your notebook you can see it. It's a really nice diagram. It's not comprehensive, but it gives you a way of thinking about the sorts of ways we can express data. If you think about the data that you might have, it could be numerical, and it could be a distribution: say, the ages of everyone in this class. Then you'd be thinking about plotting some sort of distribution. How would we do that? It depends whether you are looking at one or two variables.
If it's just a single variable, the standard way is to look at something like a histogram. You take your data of ages and you group them up into different bins, and you see how many data points lie in each bin. A column histogram works really well for the ages within the class. If you have loads and loads of data, say you were looking at the ages within the population of a country, column histograms get very complicated quite quickly, and you might actually be better off with something like a line histogram. There are various ways of doing this. You can do something called a density estimation, which gives you a nice smooth curve; it's a nice way of summarising things in a more continuous way. If you then want to look at two variables, say you wanted to correlate the age of everyone in the class with shoe size, then you've got two numerical values. Something like a scatter plot would be quite useful, because you plot age in one direction and shoe size in the other, and you see if there's a relationship. Hopefully you can still see that.
So those are distributions of data. Say we wanted to compare data instead. As we said, a scatter plot for two variables is quite a useful way of seeing a relationship. If you then introduce three variables, it gets quite complicated: how do you see whether there is a correlation between three different sets of numbers at the same time? One option is a 3D area chart, where you're trying to plot some sort of surface, and that gets really complicated pretty quickly. A nicer way that you may have seen before is the bubble chart, or something related. That's effectively a scatter plot, but the size or the colour of the points in some way depends on the third dimension of data.
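The binning idea behind a histogram, as described above, can be sketched in a few lines. This is only an illustration: the ages are made up and the bin edges are an arbitrary choice, not values from the session.

```python
import numpy as np

# Hypothetical ages for a small class (made-up values, for illustration only)
ages = np.array([18, 18, 19, 19, 19, 20, 20, 21, 22, 25])

# Group the ages into two-year bins: [18, 20), [20, 22), [22, 24), [24, 26]
counts, edges = np.histogram(ages, bins=[18, 20, 22, 24, 26])

# Report how many data points fall in each bin
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")
```

A histogram plot is just these counts drawn as columns, so changing the bin edges changes the picture; that is worth keeping in mind when a distribution looks surprisingly lumpy or surprisingly smooth.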
So it's not necessarily easy to understand straight away, but all the information is there. So those are the numerical things. Often we deal with data that aren't numerical. If we think, again, about the ages of people in the class, you could also look at gender: male, female. That isn't a numerical value. You can't plot it on a numerical scale, but you can split it up into two separate bins. This is where something like a bar chart or column chart comes in, where you have a bar for male and a bar for female. For plotting the ages that doesn't work particularly well, so you might actually prefer to do what's called a violin plot, where we're looking at effectively a histogram for each category. So that's splitting into categorical variables and numerical variables. There are all sorts of other ones on here. I'm not going to go into great detail about most of them; lots of these you will never use. I don't think I've ever in my entire life used the waterfall chart, but they do exist.
The one thing I will say: pie charts are generally frowned upon. This is my view of them, at least within science. People like economists love them; scientists tend to avoid pie charts because they don't express data in a particularly comparable way. Say you had numerical data on how many people in the class had blue eyes and brown eyes. You could plot that as a pie chart, and that would be fine. You could also plot it as a column chart, so you have a column for blue, green, brown, etc. What you find is that the columns are actually much easier to compare: people can look at the heights in a column chart and say, okay, there are twice as many people with brown eyes as blue eyes, whereas with a pie chart it's much harder to estimate the fraction of the overall. That may be surprising, but it seems to be the psychology behind it.
So try to avoid pie charts, unless you are really sure that's the way to go. With numerical data, then, you're looking at either a histogram or a scatter plot. When you start to get into three variables, as I touched on, it gets quite complicated, so you might want to think of different ways of doing it. Perhaps you could split it out into multiple two-variable plots. And then with categorical things, you've got column graphs; you can start to stack or group columns, which we'll see later. Comparing three or more categories is really difficult. You can also combine numerical and categorical data in different ways. I think the best way to work out what sort of graph you're going to plot is to look at examples and see what you think worked well and what you don't think worked, and you build up an intuition there: does this express the data pretty clearly to me?
So, by way of a first task: we're going to stay here in the main room for this one. The first task is looking at how we can decide which sort of plot to use. For the first one, we're looking at the correlation between height and head size: measure somebody's height, and what is the circumference of their head? You should have in your notebook another of these Mentimeter votes. You can see you've got stacked bar graph, violin plot and scatter plot. Choose the one that you think is correct, and then we should see the results start to update live, hopefully. Interesting: votes are coming in, but for some reason it's not showing. Bear with me. We're getting some results, but for some reason it's not updating. That's better. Okay, so most people have gone for scatter plot, with a few for the other options.
So this is a question of what kind of variables you have. When you're looking at height, which is a numerical variable, you don't have categories of height; you have an infinite range of numbers there. So that is a continuous value. Head size is probably continuous as well: you can take a tape and measure it, because it is just the circumference of your head. It's not like shoe sizes; you don't go up in integer values with head sizes. So I would agree with the majority of the class there that a scatter plot is the way forward. Violin plots and stacked bar graphs, I think, are better when you've got a categorical variable.
Okay, so next one. Same idea. You're now looking at the link between your annual income and the type of pet you own. How would you display any correlation there? Sorry, I should have the correct result up, and I think it's loading. Hopefully you're all voting. Okay: heatmap, violin plot, violin plot, scatter plot. This is a fair bit like somebody commentating on a horse race at this point. There are 70 of us in this room and we've only got about 30 of you voting, so either somebody is not voting or something is broken. Is anyone having problems using the Mentimeter? Give me a red cross if you can't. So there are a few for heatmap and a few for violin plot, but I think most of you have correctly said that it's not a scatter plot. A scatter plot is designed for numerical values, and clearly the type of pet is not a numerical value. You can't have a quantity of dog. So a scatter plot is probably not ideal. Heatmap versus violin plot: that's an interesting one. Heatmaps are effectively a way of doing a three-dimensional scatter plot: a combination of a two-dimensional scatter plot plus a histogram as the third dimension.
You can think of it like this: you've got your two-dimensional scatter plot, you put your points on, and then you draw squares on top of it and work out how many points are in each square. That's one way to do a heatmap. You can also do a heatmap with three different variables, not related to a histogram, which we'll see. But here again, a heatmap is designed for numerical data. You've got a continuous variable on the x-axis, a continuous variable on the y-axis, and then a continuous variation in colour as the z-axis, the third dimension. So I would agree again with the majority of the class that a violin plot is the way to go. Each of these violin shapes would be a category. This first one on the left could be something like an iguana, and another could be a giraffe. And then within each of those categories you've got this continuous histogram of annual income. You will find that the annual income is roughly a distribution that peaks around the middle. Next session you will see what's called the Gaussian distribution, and that is the sort of shape this tends to have. It's just a way of summarising across all of the people with that particular animal type. Thinking about it more, the one with the highest mean income is probably going to be the more expensive pet, so maybe giraffe would be this middle one; whoever owns a giraffe probably has quite a lot of money. I'm digressing. Let's move on.
So, third one: the variation of the pH of a reaction with time. If you have a reaction running with a pH meter in it, how would you plot the pH as it varies as time goes on? Here we've got line plot, scatter plot and what's called a box plot. Line plot, line plot, line plot. I think we have a runaway winner at this point, and I would agree with that. Yes: your pH is a continuous value, and time is a numerical value.
So you could argue that you could do a scatter plot, or you could argue that you could do a line plot. I would say a line plot, because you would expect the pH to vary continuously with time. If you measure at a certain time and then measure five milliseconds afterwards, at any time between those two the pH is going to be intermediate; it's very unlikely to jump up and down. So a line plot makes sense there. Whereas if you're just looking for a correlation and the data aren't continuous, a scatter plot would probably be better. I should just talk about the box plot. A box plot is a bit like a violin plot in that it shows you the distribution of data within categories. You've again got your categories of cat, dog, fish, and the box and the whiskers show you your median and outlying points. We'll cover this more in the next session when we look at statistics, but it's a way of summarising the distribution without having to make assumptions about how it's distributed.
Okay, and finally the last one. This is drug stability. Say you test some sort of drug at various temperatures and pressures: you could put it in an autoclave and see how long it is before it degrades. How would you go about showing that? It's probably not there yet; I just need to refresh it. So we've got a lot of contour plots. We've got a lot of step plots. One person says pie chart. This one's quite evenly balanced, actually. I think the voting seems to have slowed down. I would agree with all of you that it's not a pie chart; you can't really express much as a pie chart. In terms of contour plot versus step plot, this is an interesting one. Imagine you've got a two-dimensional space that you're thinking about: you've got temperature and you've got pressure, and at each point in that temperature-pressure diagram your drug molecule lasts for a certain time. So time is another continuous variable.
And I would argue that a step plot doesn't show those three dimensions. A step plot is two-dimensional, and it shows values that jump between levels: it's a bit like a line plot, but you assume that it's discontinuous. I would argue, actually, for a contour plot. A contour plot is a bit like the heatmap that we saw earlier. This would be the temperature along the x-axis and pressure along the y-axis, and then the colour refers to how long it lasts: how long your drug molecule survives those conditions. In some places, probably low temperature and low pressure, it's going to survive for a long time. As you go to high temperature or high pressure, you'd expect it to start to fall apart faster. So you'd see some sort of change across the surface. Whether you would choose a contour plot or a heatmap is a fairly subtle point, which goes back to what we were talking about with pH versus time. We said that's a continuous function: as you change time, you expect the pH to be continuous, whereas something else might be more discrete. Contour plot versus heatmap is the same sort of thing. A contour plot is continuous, and you assume that the contours have a smooth shape, whereas a heatmap doesn't have to be. In a heatmap you've just got a bin that has one value, which doesn't necessarily have to correlate with the bin next to it.
So did anyone have any questions, just before we move on? Is everyone happy? Is anyone confused? Give me a green tick or a red cross: how are you finding things? I see thumbs up. I see a lot of people not responding; you're concentrating on something else or just asleep, I don't know. Nobody's put a red cross, so I think we'll be okay. We'll carry on then. That was how you go about choosing the type of plot you want to make, and that's very general, independent of the software you're going to do it with.
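To make the contour-versus-heatmap distinction concrete, here is a minimal sketch using matplotlib. Everything in it is an assumption for illustration: the temperature and pressure ranges, the invented "survival time" surface, and the choice of `contourf` and `pcolormesh` as the two drawing calls. The `Agg` backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up grid of conditions and an invented, smooth "survival time" surface
temperature = np.linspace(20, 120, 30)   # hypothetical values
pressure = np.linspace(1, 10, 20)        # hypothetical values
T, P = np.meshgrid(temperature, pressure)
survival = 100 * np.exp(-T / 60) * np.exp(-P / 5)   # not real data

fig = plt.figure(figsize=(8, 4))
ax_contour = fig.add_subplot(1, 2, 1)
ax_heat = fig.add_subplot(1, 2, 2)

# Contour plot: treats the surface as varying smoothly between grid points
filled = ax_contour.contourf(T, P, survival, levels=10)
fig.colorbar(filled, ax=ax_contour, label="survival time")

# Heatmap: one coloured cell per grid point, no smoothness assumed
mesh = ax_heat.pcolormesh(T, P, survival, shading="auto")
fig.colorbar(mesh, ax=ax_heat, label="survival time")

for ax, title in ((ax_contour, "contour plot"), (ax_heat, "heatmap")):
    ax.set_xlabel("temperature")
    ax.set_ylabel("pressure")
    ax.set_title(title)
```

With a smooth surface like this one the two panels look similar; the difference shows up when neighbouring cells are unrelated, where the heatmap still reads correctly and the contours become misleading.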
So, the next step. Obviously we're teaching programming with Python in this course, so we want to teach you how to plot with Python. There are various packages within Python that will do this. The one we're going to talk about is matplotlib, because it's the most well-developed, but do feel free to explore the other ones. There are some very interesting plotting libraries out there which can do all sorts of very fancy things. But matplotlib is the basic go-to that will give you a decent-looking plot without too much effort. It's also very widely supported, which is a benefit: there's lots of help, and it's frequently updated. In terms of using matplotlib, you have to import it, and this is where it gets a little bit odd. Rather than just saying import matplotlib, you have this weird line here, which is import matplotlib.pyplot as plt. It just seems to have become a standard that we import this plt component of matplotlib and do most things with that. So that's the thing you need to make it work. The second line is something specific to Jupyter, just to make sure your plots appear within the notebook, rather than popping up in a separate window, or sometimes not even popping up. If you don't put this at the top, you're not guaranteed that your plot will turn up.
So I'm just going to go through a bit of terminology with you. It's a little bit confusing, but hopefully it will start to come together in your head. matplotlib works in terms of figures. A figure is something you can think of as an artist's canvas: everything visual in matplotlib is contained within a figure object. That's a bounding box that has everything within it. Within that, you can have one or more axes, which is the thing that actually does your plotting. So an axes could hold something like a scatter plot, or it could hold the contour plot that we showed earlier. Each of these is contained within a single axes.
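As a minimal sketch of the import and the figure/axes terminology described above: the names `fig` and `ax` are just conventional choices, and the `Agg` backend line is an assumption so that the sketch runs outside a notebook. In Jupyter, the notebook-specific second line is most likely the `%matplotlib inline` magic, which is not valid plain Python and so appears here only as a comment.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: only needed when running outside Jupyter
import matplotlib.pyplot as plt
# In a Jupyter notebook, the Jupyter-specific line would go here, e.g.:
#   %matplotlib inline

fig = plt.figure()             # the figure: the container for everything visual
ax = fig.add_subplot(1, 1, 1)  # an axes: the region where data are actually drawn

# The axes belongs to the figure, and owns its own x-axis and y-axis objects
print(ax.figure is fig)
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)
```

Note the spelling: "axes" is the plotting region, while "axis" is one of the two number lines inside it.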
But within a figure you can have more than one axes, and we'll see this later on this morning. And then each of these axes objects has an x-axis and a y-axis. Yes, it's a bit of a tongue twister: you have an x-axis and a y-axis within an axes. You can edit things to do with your axes, and you can edit things to do with your axis, and it's important to know which one you're talking about when you try and do something. Don't try and create a plot on a single axis, for instance. Hopefully, as you see it used this morning, it will all become clear. If you do have any problems, just shout to the demonstrators and they'll try and set you right.
So I just wanted to briefly digress about the matplotlib interface. This is a real problem if you're searching for things on the Internet. If you look on Stack Overflow, you'll find that there are two different ways of doing things, and both are equally valid. The only thing I would say is don't mix them, because if you start mixing things you'll get nowhere. One approach you can take with matplotlib is this so-called object-oriented approach, where you make a figure to begin with, then you insert axes into the figure, and finally you do your plotting on the axes that you just created. So that creates the figure, creates the axes, and then adds the data afterwards. The other way is that you just call a plot command, and that somehow magically generates the figure, generates the axes, and does everything at once. In this session I'm going to concentrate on the object-oriented one, because I find it much more logical and it's easier to edit things afterwards. If you make a plot and then decide, oh, I actually want to change the font of my x label, then it's easier to do it. Whereas with the other approach it gets a bit complicated, depending on the situation.
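The two interfaces can be sketched side by side as follows. This is an illustration with made-up data rather than the code shown in the session, and the `Agg` backend line is an assumption so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]   # made-up data
y = [0, 1, 4, 9]

# Object-oriented approach: make the figure, add an axes, then plot on that axes
fig_oo = plt.figure()
ax_oo = fig_oo.add_subplot(1, 1, 1)
ax_oo.plot(x, y)
ax_oo.set_xlabel("x")

# pyplot (MATLAB-style) approach: plt.plot creates the axes implicitly
plt.figure()        # start a fresh figure so the two examples stay separate
plt.plot(x, y)
plt.xlabel("x")
fig_implicit = plt.gcf()   # the "current figure" that pyplot has been drawing on
```

Both end up with one figure holding one axes. The difference is that the first keeps explicit handles (`fig_oo`, `ax_oo`) to edit later, while the second relies on pyplot's notion of the current figure, which is where mixing the two styles tends to go wrong.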
The reason this latter one exists is that matplotlib was originally designed as, in effect, a free replacement for MATLAB. If anyone has ever used MATLAB, its plotting approach is this one at the bottom. So just beware: it doesn't matter which one you use, as long as you only use one at a time. Don't try and mix them, because it gets very confusing. If we're focusing on the first version, then the sequence to actually make the plot is that you create a figure first of all. After that you put your axes within the figure, put your data onto the axes, and then do any tweaking and customisation you want: you can add text or other shapes, you can add images, you can change fonts. You can do almost anything; it's infinitely customisable. And then the last thing is that you'll probably want to save your plot to a file. You don't have to: if you're using a Jupyter notebook, it's contained there and you might not need to save it to an external file. But if you're using Python to generate a graph for a lab report, say, you'd want to save your resulting graph to something like a PNG file and put it into a document.
Okay, so we've talked about which plot you want to choose, and we've talked about how you actually get matplotlib up and running. Something I just wanted to briefly touch upon is the figure layout. This is something that most people don't even think about. In something like Excel or Origin, you plot your data and out comes the graph. But why is the graph that shape? It could be very wide and short, or it could be tall and narrow. This is completely your choice, and I'd just like to get you to think about why you might want to choose a different shape or form. By way of an example, we're going to plot some data on atmospheric CO2. Here you just run through the code, and it's using pandas, which we've seen before. This is not a CSV file, so pandas is going to read it as a table file, which is a more general format.
And this is just showing you that pandas is actually really very flexible. It doesn't have to be a file that sits on your computer. Here we're actually pointing at this American website, which updates CO2 levels, I think weekly, so you get the most recent version from the website. The other options are just there to make sure it reads the data in correctly. And then, once we've done that, we're just changing some of the dates. Dates in code can be a bit tricky, so we're using this datetime object, which I think you've been shown before. If we run that cell, you see it goes through, fetches the data, reads it in, and you've got just the top of the data file here. You've got a month, the average CO2 for that month, the uncertainty of it, the gradual trend with time, and then the uncertainty again. So that's just imported the data.
The next thing is that we've got to try and plot it, so I'll show you how to go about creating this figure. The first thing, as I said, is that we want to create the figure object, so we make a figure. Then we're going to add an axes to this. There are various ways you can do it, but the way that I'm going to teach you is fig.add_subplot. That's quite a convenient method that just puts an axes into your figure. Then we want to plot this data. I'm going to use the same format that you saw in session 2, which is to use the pandas DataFrame, so we can plot directly from the DataFrame. x is the date column, for the variation with time, and y is going to be the average concentration. The other thing you can do with pandas is specify which axes it should plot on. In this case, we give it the variable name of the axes we just created.
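The read-then-plot sequence described above might look roughly like the sketch below. It is not the notebook's code: the inline text stands in for the downloaded file, and the column names (`year`, `month`, `average`, `uncertainty`) and values are hypothetical. pandas readers such as `read_table` also accept a URL string in place of the in-memory buffer used here.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Invented whitespace-separated text standing in for the remote data file
raw = io.StringIO(
    "# comment lines like this one are skipped\n"
    "2020 1 412.0 0.1\n"
    "2020 2 412.5 0.1\n"
    "2020 3 413.0 0.1\n"
)

# Read a general table (not a CSV): split on whitespace, ignore comment lines
data = pd.read_table(
    raw,
    sep=r"\s+",
    comment="#",
    names=["year", "month", "average", "uncertainty"],  # hypothetical names
)

# Turn the separate year and month columns into proper datetime values
data["date"] = pd.to_datetime(data[["year", "month"]].assign(day=1))

fig = plt.figure()                        # 1. create the figure
ax = fig.add_subplot(1, 1, 1)             # 2. put an axes into it
data.plot(x="date", y="average", ax=ax)   # 3. plot the DataFrame on that axes
```

Passing `ax=ax` is what ties the pandas call to the figure created by hand; without it, pandas would create its own figure and axes.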
We're also going to turn off the legend, because otherwise pandas adds a legend by default, which is pointless if you're just plotting one thing. The other thing I wanted to do is change this figure layout. I will start off with something that's wide and short. The format here is that you have a figsize argument, and figsize is the size of your image in inches, given as the width and then the height. If we run that cell on its own, you see that it spits out a plot which is eight inches wide and four inches tall. It won't be exactly that on screen, because the screen scales it to some size, but hopefully you can see it now. It doesn't look very pretty at the moment: we've got a curve, yes, but with unlabelled axes. So what we'll do is go through and add some details. We'll set our x label first of all. You see the syntax here is the object, dot, set_xlabel, and then some string to define what you want. So we'll call that one Date. And then we'll also set the y label as well. That's the same syntax: ax.set_ylabel. We're going to call this one average CH4 mole fraction, with the units, which are given in the data file. Whether you separate your units with a division sign or put them in round brackets is a matter of personal style. I think the division sign makes sense, because numerically that's what you're doing with the units. Run that again, and you see it has now updated the x-axis label and the y-axis label. And that's okay, but I don't really like this one: this average CH4 is just plain text, and what I'd really like is for that 4 to be subscripted, like a proper chemical formula. The way to do this is using what's called LaTeX formatting, which you briefly saw in session one in terms of how you can put maths into a Markdown document. The way you do it within matplotlib is very similar: you have dollar signs wrapping around anything you want to be formatted.
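A minimal sketch of the figure size, legend and axis-label steps described above. The data are synthetic (an invented trend plus a seasonal wiggle, not real measurements), the column names are hypothetical, and the label wording follows the transcript without units, since the units are not recoverable from it.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data: ten years of monthly values, invented for illustration
months = np.arange(120)
data = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=120, freq="MS"),
    "average": 0.5 * months + 10 * np.sin(2 * np.pi * months / 12),
})

fig = plt.figure(figsize=(8, 4))   # width and height in inches: wide and short
ax = fig.add_subplot(1, 1, 1)

# legend=False switches off the legend pandas would otherwise add
data.plot(x="date", y="average", ax=ax, legend=False)

# Label the axes through the axes object
ax.set_xlabel("Date")
ax.set_ylabel("Average CH4 mole fraction")
```

Swapping the tuple to `(4, 8)` gives the tall, narrow version of the same plot without touching any of the other lines.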
In this case, we'll just do it with the number. And if we want to have something as a subscript, we use an underscore. So run that again. You see it's now subscripted, though that may be a bit small on your screen. If you wanted a superscript instead, use the caret symbol, and that will give you a superscript. Here we want it subscripted. So that's our figure. Now we've played around with it, I'm fairly happy with that, so we can save the figure. To actually save it, the call is fairly simple: it's just fig.savefig, and then you give it a filename. So in this case, we'll put it in the images folder and we'll call it landscape_CO2.png. Note that whatever file type you give it, it will try and save it as that format. So you can do PDF files, you can do SVG files, all sorts of file types. Matplotlib is quite clever about saving to whatever file type you ask for. If you run that, it doesn't tell you that it has saved anything, but trust me, it has. Yes, it will do JPEG. I would encourage you not to use JPEG. We won't go into the details of this, but for graphs, because you have sharp edges and straight lines, you want to use something like a PNG or TIFF format. JPEG is better for photos, because it doesn't matter if things are a bit blurred. So I'd always stick with PNG. So that's the landscape plot. Let's do another one now. I'm not going to write it again, so I will copy that and paste it there. So now instead of having a figure size of eight by four, let's go the other way round. And rather than calling it landscape, we call it portrait. This one is now very narrow and very tall, but the labels are still the same, and you see that it's adjusted everything to suit. So now we've got the two graphs, we can go back to what we were talking about originally: why would you choose one over the other? So I've just got here one I prepared earlier. And you can see that actually the landscape one is very good for emphasising the monthly variations.
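As a rough sketch of the saving step: `savefig` picks the file format from the extension, and it overwrites an existing file of the same name without asking. The data and file names below are invented, and a temporary directory stands in for the images folder.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
y = x + np.sin(2 * np.pi * x)          # synthetic: oscillation on a rising trend

out_dir = Path(tempfile.mkdtemp())     # stand-in for an images folder
saved = []
for name, size in [("landscape", (8, 4)), ("portrait", (4, 8))]:
    fig = plt.figure(figsize=size)     # same plot, two different shapes
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    for ext in ("png", "pdf", "svg"):
        path = out_dir / f"{name}_demo.{ext}"   # hypothetical file names
        fig.savefig(path)              # format chosen from the extension
        saved.append(path)
    plt.close(fig)
```

Running the loop a second time would silently replace all six files, which is the overwrite behaviour to be careful about.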
You can see it ticking along as you go from summer to winter, and that's fairly consistent. The thing you don't get from that is the overall trend. You can see it's going up, but you don't really appreciate by how much. Whereas I would argue that in the portrait version you can really see that it's ramping up quickly. So you can see that CO2 concentrations are shooting up. So depending on what you're trying to show, you might choose one or the other. So if you were going to a politician to say CO2 is increasing, this is a big deal, look at the rise between 2006 and now, I would use something like that, because it really demonstrates that it's rising quickly. Whereas if you want to go to a climate scientist and say, look at this information I found about the annual variation in CO2, you might use something like the top one. I'm just seeing a question in the chat: if I edit and rerun the cell, will the image that's been saved be replaced? If you save it with the same name, it will replace it. Matplotlib doesn't ask, do you want to replace this file, it will just overwrite it. And that's true of most things in Python: if you say write this file, it will just write it. So if you've got something you want to keep, be careful with your file names. Okay, so that's just a brief touch on why you would choose different plot shapes. So just bear that in mind when you make them. So now we'll talk about how to plot, which seems like a very silly thing to say. But there are different ways of doing it. The one that we showed you just now, and in session two, was to use the pandas method. So df.plot, DataFrame.plot, with your type of plot and your x and y columns. And that's great for really quickly exploring data. So if you have something in a DataFrame and you just want to see, for example, what is the correlation between this column and this column, that's great. The problem is that it has a lot of default formatting you might not like.
So by default, the axes labels are set to the column names, which, if the column names are rubbish, means you end up with rubbish labels. So the other way is to plot directly. So rather than doing DataFrame.plot, the axes objects themselves have a dot plot method, and we're going to do something like this. Okay, so we're going to load in some data from a file and then we're going to do some plotting with it. So what we've got, in the data sources folder, is three data files for ethane: the dihedral, the energy and the C-C distance. What these files contain is effectively a two-dimensional array where you've got, in one dimension, the rotation around the central bond in ethane. So this is, as you spin one methyl group relative to the other, what is the angle, the dihedral angle. So that's what's stored in here. The other dimension is the distance between the two carbons in ethane. And then the third file is just what is the energy of the molecule at each point. So hopefully you know about this rotation and how the energy goes up and down as the hydrogens start to interfere with each other. All this information is contained within these files. So I'm just going to load them in. We use NumPy, and you'll see this much more in the next sessions. NumPy has loadtxt, and I'll copy that for the energy and the C-C distance. So that's loaded the data. We can just have a look at, say, the dihedral. It's got 12 rows and 72 columns. So that means 12 different carbon-carbon distances and 72 different angles. So that's that one. Now we want to try and plot it. So we're going to do the same thing we did before. Create our figure, and add our axes. I'll make my figure quite big. And then now, because we don't have a DataFrame, we can't do DataFrame.plot, but we can call plot on the axes itself.
Here we need to give it x, y and some sort of format string. So for the x data here we'll use the angles, so the dihedral array, and we take a single row. Let's take row five. So this is for a given carbon-carbon distance, and we're looking at the different angles. Our y data is going to be the energy, and obviously the same row in the energy. And then the last thing is this formatting string. So let's do it without that to begin with. So if we just run that, you see that it generates a plot. So by default, ax.plot will produce a line plot with some coloured line, with the angle on the x-axis and the energies on the y-axis. If you want a little bit more control over that, you can start to format it. So we can use what's called a marker. The keyword for that is marker, and if we want to give it circles it's just marker equals a lowercase o. You now see that each of these points has been given a circle, so you can control things like that. If we want to change the colour, there are also, in your notebook, a lot of different links to documentation about how you can choose colours. You can use something simple like just the letter r, which will give you a red curve, or g for a green curve. Remember that k will give you a black one, K as in CMYK printing colours. And then the last thing you can do is change the line style. So the default one is just a solid line. But say you wanted to make it dashed, you use two hyphens. Or you can change it to a colon, which gives you fine dots. There's almost infinite variation, and you can even make your own line style. I will just make this a lot more brief. So rather than having to define marker, colour and line style separately, matplotlib will also take a single expression which gives the colour, so in this case we'll say m for magenta, then the marker, so in this case triangles pointing upwards, and then finally the line style, a dashed one.
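A small sketch of the two ways of styling a line described here, first with separate keywords and then with a single format string. The arrays are synthetic stand-ins with the 12 by 72 shape mentioned earlier, not the real ethane files.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins: 12 C-C distances (rows) x 72 dihedral angles (columns).
angles = np.tile(np.arange(-180, 180, 5), (12, 1))
energy = np.cos(np.radians(3 * angles)) + np.arange(12)[:, None]

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)

# One row = one C-C distance; x and y must come from the same row.
(long_form,) = ax.plot(angles[5], energy[5],
                       marker="o", color="r", linestyle="--")

# The same kind of styling as one format string:
# m = magenta, ^ = upward triangles, -- = dashed line.
(short_form,) = ax.plot(angles[5], energy[5] + 0.5, "m^--")
```

The keyword form is easier to extend piece by piece; the format string is handy for a quick plot.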
So if we run that, you see it's done it all in one. Okay, a question in the chat: plt is not defined. Make sure you have run the cell that imports matplotlib's pyplot first. So you see that matplotlib has interpreted this straight away: it's magenta, we want triangles pointing upwards, and we want the dashed line, and it has done that all on its own in quite a brief way. That's quite powerful. If you just want a simple plot, you can use this expression. If you want something more complicated, you can specify the styles piece by piece. And then obviously we can save the figure if we wanted to. So sometimes what you really want to do is actually zoom in on your plot. So in the previous one we were plotting from roughly minus 170 to plus 170 degrees, so that's almost a full rotation around that carbon-carbon bond. Perhaps we just want to look at the bit between 0 and 120 degrees, just one oscillation. So then we can control that using the axes again. So we've got similar things: we have the figure, add_subplot, plot, and a curve in black with very small dots and a line. If we run that, it will give you the normal figure that we saw before. If we want to change the x range, we do ax.set_xlim, and that goes from the minimum to the maximum x. So in this case we're going to go from 0 to 120. So if we run that now, you see it updates itself, so now we're just starting at 0. You can change the y limits as well. In this case, it doesn't really make much sense. You could choose a much wider range, and that will have the effect of compressing everything. But you can control these things. The thing that matplotlib doesn't have, at least within a Jupyter notebook, is an interactive view where you can just click and drag, the way you can in some other programs. But there is a way of doing it, and I've given you a link in the notes. Okay, so, we've talked through this already.
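A short sketch of zooming in by setting the axis limits, using a synthetic threefold curve in place of the ethane energies. The y limits chosen here are arbitrary, purely to show the call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

angle = np.arange(-180, 180, 5)                # degrees (synthetic grid)
energy = 0.5 * (1 + np.cos(np.radians(3 * angle)))   # one period every 120 degrees

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(angle, energy, "k.-")   # black, small dots, solid line

ax.set_xlim(0, 120)             # show just one oscillation
ax.set_ylim(-0.5, 1.5)          # wider than the data, so the curve looks compressed
```

The data itself is untouched; only the visible window changes.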
I'm jumping around a bit, but you can change things around as you want. So the important thing to note is that ax.plot doesn't fit anything. It actually just joins your data points with straight sections. But as long as you have enough points, it looks like a smooth curve. And as we saw earlier, if I copy and paste that and set marker equals o again, you see that it does what I said before. So that's adjusting your style, and there are a huge range of markers. So you can have squares, you can have downward-pointing triangles, you can even have hexagons. You can play around with these and see what you want. If you're doing multiple plots on the same axes, it's really helpful to change the marker and line style. For line styles again, I've got links so that you can see the default line styles. You can define them either with this sort of shorthand, which is a string, or I think they are named as well. And colours: this is a huge list. There is an almost infinite range of colours that you can pick. The thing I did want to just briefly touch on: I said about the colour being something like r for red. I just wanted to say that matplotlib also defines its default colours. It defines all of its defaults as the letter capital C followed by a number, so C0 to C9. C0 is this sort of bluey colour that you get by default, C1 is an orangey colour, and so on, and C7 is grey. These have actually been chosen so that they are quite distinct from one another, even if you plot them on top of each other. So if we do something like ax.plot of the fifth row and then also the seventh row, by default, if you don't tell it a colour, matplotlib will move on to the next colour.
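A quick sketch of the default colour cycle: lines plotted without a colour take C0, then C1, and so on, and the same names can be given explicitly. The data is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np

angle = np.arange(-180, 180, 5)
energy = np.cos(np.radians(3 * angle)) + np.arange(12)[:, None]  # 12 synthetic rows

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

(first,) = ax.plot(angle, energy[5])     # no colour given: takes C0 (blue)
(second,) = ax.plot(angle, energy[7])    # next in the cycle: C1 (orange)
(third,) = ax.plot(angle, energy[9], color="C7", marker="s")  # asked for by name

cycle_colours = [to_hex(line.get_color()) for line in (first, second, third)]
```

`to_hex` is only used here to compare colours in a common form; it is not needed for plotting.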
So the first one was C0, the second one was C1. If you did it again, it would go to C2. So it will change the colours as you go, or you can control that yourself. It's entirely up to you. Like I said, matplotlib is almost infinitely flexible. So we've talked about some labelling already. We said about setting the x label and setting the y label. You can also do things like set a title. I'm not sure you ever really need to, because for most graphs you would have a caption. But you can do it. You can put other text on your plot as well. And for each of these things, you can customise the font however you wish. A really nice one that's quite useful is changing the size, so you could say, I want my label bigger. Or you can change the font, or change the font colour. Again, just look at the matplotlib documentation, which is actually really good, and it will tell you all the different things that you can do with text. And just LaTeX again. So we talked about this briefly before: putting things between dollar signs so they are interpreted as maths. But there's a lot of other things it can do. So you can have full mathematical equations if you really want to. But the thing that it is really useful for is actually getting Greek characters. So if you want to have, say, an alpha, then you just use a backslash and then the name of the character, so alpha, between these dollar signs, and that will put your Greek letter into your label. You can also distinguish between uppercase and lowercase by changing the first letter of the name to a capital. One thing just to be aware of in LaTeX formatting is that by default the text is in italics, so a pretty nice mathematical style. If you want to change that to an upright style, you can wrap it in mathrm, for roman. Let's go back a bit. So say I wanted ax.set_xlabel, and we'll call it angle in degrees.
So if you want a degree symbol within LaTeX, we use a small circle and superscript it to make it appear high in the text. So if we wrap all of that in dollar signs, you'll see that you get the angle and a degree sign. Let's make that a bit bigger, so a font size of 20, which is huge. But you can see that your Angle is slanted, so it's in italics. If you want to avoid that, you have to use a different type of font, which is called mathrm, where rm stands for roman. And if you wrap the text in curly brackets, anything within those curly brackets should appear in this roman font. So if I run that again, you see now the Angle is upright, and your circle is as well. You also see that the spacing has been a bit strange. So we've lost some spaces around the division sign. This is because by default LaTeX doesn't accept a space as a valid thing within maths mode. If you want to enforce a space, you have to use a backslash before it. So here we've got a backslash space, and that gives a single space character, and backslash space here gives another single space character. So if I run that again, you now see a bit more spacing around the division sign. A question: does backslash degree work for the circle? That's a good question. I think it won't by default. Degree, I think, is defined by a separate package in LaTeX. LaTeX is basically a programming language itself. We can try it. Oh, okay. Yes, it does. Sometimes these are built in, sometimes they aren't. Okay. So that was LaTeX formatting. The last thing I wanted to talk about is adding a legend. So if you want to add a legend, say you've plotted more than one curve on your graph and you want to label them somehow.
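A sketch of the label formatting just described, using matplotlib's built-in maths rendering: `\mathrm{...}` for upright text, backslash-space for forced spaces, and a superscripted `\circ` for the degree sign. The second label is only there to show Greek letters; its wording is invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Upright "Angle", explicit spaces around the division sign, raised circle.
angle_label = r"$\mathrm{Angle}\ /\ ^{\circ}$"
ax.set_xlabel(angle_label, fontsize=20)

# Greek letters: backslash + name; a capital first letter gives the uppercase form.
greek_label = r"$\alpha$, $\delta$ and $\Delta$"
ax.set_ylabel(greek_label)
```

The raw-string prefix `r` stops Python from treating the backslashes as escape characters.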
The best thing to do is to add a legend. And the location of this can be defined in various ways, either numerically or using words. So this is just showing where the positions are on the plot. And just to give you an example of how this will work: if we go back to the ethane data again, we're going to create our figure and add our axes. And then we're going to plot the dihedral. Actually, as in the notes, let's do it the other way around. So rather than plotting against the dihedral, we'll plot against the carbon-carbon distance. And now we want to take the other dimension of our data. So rather than looking at a row, we're going to look at a column in our data. Don't worry too much about this. We'll talk through this in great detail next session, where you'll learn how to use NumPy arrays to do clever things. But for now we will just plot the energy as a function of carbon-carbon distance for a given angle. And we're going to add some formatting here. So let's do that first. Let's add a different one, so a different angle. And so if we just run that, you have a plot here: we've got carbon-carbon distance along the x-axis, energy along the y-axis, and these two curves for two different dihedral angles. If we want to add a bit more information, we can start to add this into the plot call. So we could just keep adding this all on one line, but I'm going to add quite a lot of things. And this is just a useful point: within a call to a function, you can put the arguments on separate lines, as long as each one ends with a comma and your indenting is roughly aligned. It will generally work. It can run into issues, but normally if you just line things up, it works. So we'll set a marker. We will give it a line style, dashed, and make it blue, each followed by a comma. So it's important that when you put them on separate lines, you have commas between them.
And then finally, because we want to create a legend, we need to have some text. So for the legend, you use the label keyword when you do the plot. So in this case the label is the dihedral angle at which these distances are varying. Here, this is 122 degrees. It's an angle between hydrogen, carbon, carbon, hydrogen. And obviously 122 is not a particularly good label on its own. So maybe we want to call that phi, which is the convention for a dihedral angle. And we can give it some text saying it's the dihedral between the H and the H, so again I use the subscript. And now, because I want to subscript lots of characters at once, I wrap them all in curly brackets, and that keeps everything together as a group. And we'll give it an equals sign, then the value, and then the degree symbol, as suggested. Thank you. Then close the dollar sign. So that's the label. I put a comma after that one, although you don't have to for the last argument. And then we'll do the same for the second one. So for this one, let's do some copying and pasting, because that works well. This time we're going to go for something distinct: for the marker, a triangle. Let's go for a solid line this time. We'll make this one orange. We're going to use our phi again, and this time it is 62 degrees. And then the last thing with this is just to add the legend. So if we want to add the legend to an axes, we do ax.legend. And it is as simple as that: that will generate the legend in whatever position we want. By default it tries to find the best position based on the data. If you want to give it a specific position, you can say, look, we want the legend in the upper left. So if we now run that, hopefully you can just about see it. So now we've plotted these two curves using dashed lines and solid lines, with the colours we've set, and we've got a legend automatically generated. It's given the right line style, the right symbols, and it's using the formatting of the text.
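A sketch of the two labelled curves and the legend, with the call arguments spread over several lines. The energies are a synthetic parabola, and the subscript text `HCCH` is an assumed spelling of the label described above, not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

distance = np.linspace(1.3, 1.8, 12)          # 12 synthetic C-C distances
energy_a = (distance - 1.53) ** 2 + 0.05      # invented energies, angle 1
energy_b = (distance - 1.53) ** 2             # invented energies, angle 2

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Arguments on separate lines: each line ends with a comma.
ax.plot(
    distance,
    energy_a,
    marker="o",
    linestyle="--",
    color="b",
    label=r"$\phi_{HCCH} = 122^{\circ}$",   # braces group the whole subscript
)
ax.plot(
    distance,
    energy_b,
    marker="^",
    linestyle="-",
    color="C1",                              # the default orange
    label=r"$\phi_{HCCH} = 62^{\circ}$",
)

legend = ax.legend(loc="upper left")         # omit loc to let matplotlib choose
```

The legend entries pick up the marker, line style and colour of each curve automatically.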
So it's actually a reasonably complicated plot for only a few lines of code. It took a while because I was typing it out line by line, but you soon get quicker. I should probably label my axes. Always label your axes. So ax.set_xlabel, and this is the C-C distance. And here we want angstroms, which is something that comes up a lot in chemistry. If you want to have angstroms within LaTeX, you can just do a backslash and then capital A, capital A. And that's a very quick way to give you an angstrom. Those of you who are very eagle-eyed will notice that it's actually an italic angstrom. If you want it to be upright, again we use our mathrm trick, which will force it to be upright. So hopefully you're seeing this is quite an iterative process. You produce a simple plot that looks okay, you want to make it better, and then you slowly iterate through it and adjust things until you have it just how you want it. And then once you've got it how you want it, you save it. And it may feel a little bit time-consuming compared to something like Origin or Excel to achieve the same result. But the real power of this is that once you've done it once, you can do it again and again and again. So regardless of what data you want to plot, you could just run the same code again, substituting some new DataFrame, and you get plots that are visually similar. Whereas in Excel or Origin, that's much harder to achieve. I'll also label my y-axis. And this is the energy, in whatever units the file uses. It doesn't really matter for now. Run that again. And then obviously, if I wanted to, I would save that figure and do something useful with it. Okay. I realise that was a lot of stuff. We're going to have a break very soon, and we'll put you into breakout rooms shortly. But first, I just wanted to make a brief comment about making plots accessible.
So this is something that many of you may not have thought about, but actually it's quite an important thing. Chemistry is very visual. We like using lots of graphs and we like using colour, and even in the lab lots of things are colourful. This completely neglects people with poor or no colour vision. Approximately one in 20 people have colour blindness. There are 66 of us in this class, so I would expect about three of you to have colour blindness. I'm not asking you to say if you do. It's just interesting to think about the fact that if you're plotting graphs that many people can't interpret, that's a bit of a problem. So try and think about this a bit when you're plotting any data, or when you're producing anything at all: is this really accessible to people who can't see as well as I can, or the same as I can? So as a general rule in graphs, avoid using colour as the only source of information. In the previous plot I did, I didn't just use colour. I used a combination of colour, line style and marker style. And so even if you printed that in black and white, you would still be able to distinguish the curves. So it's quite important to avoid using colour as the only thing. If you have to, you can also play around with having markers that are just an outline, hollow in the middle. And there's an example in the notebook of how you do that. And then adding things like legends is very important, and decent labels, just trying to make your plot as accessible as possible. The thing to really avoid, for people with colour blindness, is using red and green at the same time. That's the most common colour blindness: people can't distinguish red from green shades. But it's not the only one. So I suggest having a look at some of the links in the notebook. There's even a good colour-blindness simulator where, if you make an image, you can upload it and it will show you what it looks like to different people.
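A sketch of the idea of not relying on colour alone: each series gets its own line style and marker as well as its own colour, and some markers are drawn as hollow outlines. The data and series names are invented, and the particular style choices are just one possibility.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 15)

# Colour, line style AND marker all differ, so the curves survive greyscale printing.
# Red and green are deliberately not used together.
styles = [
    dict(color="C0", linestyle="-", marker="o", markerfacecolor="none"),  # hollow circles
    dict(color="C1", linestyle="--", marker="s"),                         # filled squares
    dict(color="k", linestyle=":", marker="^", markerfacecolor="none"),   # hollow triangles
]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for i, style in enumerate(styles):
    ax.plot(x, (i + 1) * x, label=f"Series {i + 1}", **style)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

`markerfacecolor="none"` leaves just the marker outline, which also helps when points overlap.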
And then just the other thing about accessibility was including decent alternative text. So it's all well and good having a figure in your report. But if somebody can't see it, say somebody is blind and they can't see the figure, they have no way of interpreting it. So at the bare minimum, figures should have what's called alternative text. So when you include an image in Markdown, within the square brackets you have a description of what the plot shows, and then the image itself. So depending on how somebody is viewing your document, they can still get something meaningful from it. And you could do the same thing in PowerPoint or Word: if you right-click the figure, you'll find a box for alternative text. And that's a very good habit. Does that make sense? Okay. So I realise that's been an hour and a half of me talking and showing you things. So it's definitely time for you to do something else. We'll take a little while for the task and then also have a bit of a break as well, and come back in a little while. So the task I want you to do is to take this acid-base titration data. First of all, read it in using pandas. It's a CSV file, so read_csv. It gives you the variation of pH with the amount of titrant added for different acid-base combinations. It doesn't matter what they are. It's just three different sets of data that I want you to read in and plot on a single axes, so you just see a single representation with all of them on it. I do want you to think about using different colours and different line styles, and remember to label your axes and add a legend. It's a chance to play around and see how quickly you can make a plot. So we're almost at half past ten. It's probably not going to take that long, to be honest. Maybe we give you 10 minutes to actually do the plot.
So that'll take you to almost twenty to eleven, and then you will have 10 minutes for a break as well. So meet back at, let's say, 10 to 11. That gives you quite a long time, but I really think this has been quite an intense hour and a half. So have 20 minutes to just play around with things. Demonstrators will be around for the first ten minutes to help. Okay. Does anyone have any questions? No. In that case, Lucy, if you can open the breakout rooms, please. Could somebody pause the recording, please? Right. Welcome back everyone. Hopefully you all had a nice break there and had a chance to go through the task. As I say, if you've never used Python for plotting before, you can spend a while just trying to get a basic plot. Hopefully you've managed to work through that. And if you got through it quickly, hopefully looking at the advanced tasks gave you something to think about. So I'm just going to go through the answers briefly. But like last week, I'm going to put up the answers after this session, so don't feel that you have to write all this down. So the first thing you need to do is to read in the data file. So you've got titration equals pd.read_csv of something. Some people couldn't find the data file. If you don't realise it's in the data sources folder, then it will come up with an error. So I recommend you just have a browse at the folder level. The data files are not in the same folder as the notebook, so you have to navigate to the data sources folder, or to the images folder for images. So just have a look around and see what files there are. Hopefully you can work it out. And I also recommend that you look at the file itself. You go to the data sources folder and, say, you want to open the acid-base titration file.
Within Jupyter, you can just open a CSV file, and that's really useful for just seeing what the format is, seeing how it's divided up and seeing the headings. You can see that all the values are separated by commas. So that helps you when you try and read it in. Just in terms of the reading, you can see I've also set the index column to be the first column in the data file. You don't have to do that, but it's just a way of tidying things up a bit. So then in terms of the actual plotting, it's what we did before: create a figure, create the axes. And then on that axes you call dot plot, and we need x values, so in this case the titrant volume, and the y values, which in this case is the pH. You can give it some sort of line style and some sort of colour. Give it a label so that you can then add a legend later. And you just do that three times, once for each of the three columns. Set the x label, set the y label, add the legend. So if I run that, you end up with a plot that looks a bit like this. It doesn't really matter which colours you use or which line styles. But hopefully you get this idea that you can plot multiple things on top of each other, you can generate a legend and you can label things. The important thing that I didn't say earlier is that your x data and your y data have to be the same size. So you need the same number of x coordinates as y coordinates. Otherwise it will throw an error. If you get an error that says something like a size mismatch, that probably means one of your arrays or lists is the wrong size. Okay, so moving on. I won't go through the advanced tasks now, but I'm happy to chat about them at another point. So in the next bit I just wanted to talk through different types of plots. I've shown you some basic plotting ideas, and there's a few types of plot that come up again and again as being very useful. So I just wanted to summarise how you use the most important ones.
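Going back to the titration answer walked through above, here is a minimal sketch of the same steps. The CSV text, column names and pH values are invented so the example is self-contained; the real file lives in the data sources folder and will differ.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in for the titration CSV (hypothetical column names and values).
csv_text = """volume,acid_A,acid_B,acid_C
0,1.0,2.9,3.4
10,1.4,4.3,4.9
20,2.1,5.0,5.6
25,7.0,8.7,9.1
30,11.9,12.0,12.1
"""
titration = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column as index

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
styles = [("C0", "-", "o"), ("C1", "--", "s"), ("k", ":", "^")]
for column, (colour, line, marker) in zip(titration.columns, styles):
    ax.plot(titration.index, titration[column],
            color=colour, linestyle=line, marker=marker, label=column)

ax.set_xlabel("Volume of titrant added")   # units depend on the real file
ax.set_ylabel("pH")
ax.legend()

# x and y must be the same length, otherwise matplotlib raises a ValueError.
try:
    ax.plot([1, 2, 3], [1, 2])
    size_error = None
except ValueError as err:
    size_error = str(err)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the path to the CSV.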
But I want to emphasise that this is not all of them. So please do look at the matplotlib documentation. They've got some really helpful cheat sheets. There is a huge number of things you can do, and I'm just going to show you the basic ones. So we've seen before that you can use ax.plot to do a line plot or a scatter plot. If you turn the line off, so you give it an empty line style, then you just get markers. If you give it no marker, you just get the line. So that's really useful for making a simple scatter plot. You may also have noticed that matplotlib has an ax.scatter function. It's similar to dot plot, and I don't use it all the time. But it's really useful if you want to make plots where you change the size or colour of each point depending on a third dimension of the data. So as an example, we can read in some melting point data from a CSV file, melting points. If I run that, we can just call head, which will print out the top five lines. So you see the name of the compound, its chemical formula, and also this formula dictionary, which is a dictionary where the keys are elements and the values are the number of that element. And then from that, I've gone through and calculated the ratio of the number of hydrogens to the number of carbons, and the number of oxygens to the number of carbons. And then finally, you've got your melting point in degrees Celsius. So that's the data we've read in. We want to do something interesting with it. So let's make a figure first, and add the axes as before. And then what we can do is a scatter plot, ax.scatter. The x data is going to be the H to C ratio. The y data is the O to C ratio. And now we're also going to have this colour argument, c equals the melting point in degrees. So the three datasets are your x values, your y values, and the colours. And so we could just run that and it would produce us a plot, but this basic one is not very pretty.
It doesn't really tell us the information, so I'm going to go through and add some labels to this. Set our x label to the H to C ratio, and set our y label to the O to C ratio. And then the other thing that is really important to do is to actually try and show what the colours mean. Just having colours on their own is not very informative. So to do that, matplotlib can generate colour bars. Some plotting methods add one by default, but if you want to do it manually, you need to save the output from the scatter call and use that as an input. So what we'll do is save that. You will find that a lot of things are spelled the American way, so color. That saved object tells us what colours you've used for the individual points. And then when we want to generate a colour bar, we add that to the figure, so fig.colorbar. We pass it the saved scatter output and tell it which axes it relates to, so it relates to the axes that we just plotted on. And then we can also give it a label, say melting point, and give it the degree symbol and Celsius. So if we now run that, it should put everything together nicely. We've got our axes labels as we had before, but it's now added this really nice colour bar. So it shows you that the blue colours are very low melting points, going through green, and yellow covers the high melting points. So that's a nice way of doing it. If you don't want to use colour, you can use s for the size of the markers. Here we've got some negative numbers, which cause problems, but you can see that the marker size is going to scale with these values. So the bigger points have the higher melting points, and the smaller points have the low melting points. It's a bit clunky, to be honest, and I prefer colour. But there are some cases where size is useful.
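A sketch of the scatter plot with a colour bar, plus the marker-size alternative. The table values and column names are made up for illustration and are not real melting points; the shift applied to the sizes is one simple way of avoiding negative marker areas, not necessarily what the notebook does.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented numbers standing in for the melting point table (hypothetical columns).
melting = pd.DataFrame({
    "H_to_C": [4.0, 3.0, 2.0, 1.0, 2.5, 1.5],
    "O_to_C": [0.0, 0.5, 1.0, 0.0, 0.3, 0.8],
    "mp_C": [-180.0, -110.0, 15.0, 5.0, -60.0, 120.0],
})

fig = plt.figure(figsize=(10, 4))

# Left: colour each point by a third quantity and explain it with a colour bar.
ax = fig.add_subplot(1, 2, 1)
points = ax.scatter(melting["H_to_C"], melting["O_to_C"], c=melting["mp_C"])
cbar = fig.colorbar(points, ax=ax, label=r"Melting point / $^{\circ}$C")
ax.set_xlabel("H:C ratio")
ax.set_ylabel("O:C ratio")

# Right: scale marker area instead. Sizes must not be negative, so shift them first.
ax_size = fig.add_subplot(1, 2, 2)
sizes = melting["mp_C"] - melting["mp_C"].min() + 10
ax_size.scatter(melting["H_to_C"], melting["O_to_C"], s=sizes)
ax_size.set_xlabel("H:C ratio")
ax_size.set_ylabel("O:C ratio")
```

Keeping the return value of `scatter` is what lets `fig.colorbar` know which colours map to which values.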
And, if you're interested, this sort of plot is essentially called a van Krevelen diagram. If you zoom in really closely, you can start to see these weird patterns in the diagram, where you've got a series of compounds with related hydrogen, oxygen and carbon ratios. A couple of people in the department use it for looking at NMR data. So that's how you can colour things very easily. The other thing I want to talk about is bar plots. We've talked before about comparing categories, so comparing different things that aren't numerical. So let's think about these melting point data. Perhaps you want to think in terms of elements. We have these compounds, each with a formula dictionary, so it's got however many carbons, how many nitrogens, how many phosphorus. Perhaps we want to look at how common each of the elements is in our data. So I've written a function that will take the chemical formula dictionaries and convert them into a frequency for each element across the entire dataset. We'll just define that and then run through generating it. So let's call the result element totals. We're just going to run the function I just defined on the whole formula dictionary column of the dataset. If we run that and print out the totals, you can see that we've come up with a dictionary where, for each element, you have the total number of atoms of it within the dataset. So we've got 320,000 carbons, 28,000 nitrogens and so on, down to one lead, so one compound has a lead atom. Then we might want to plot it as a bar chart. So again, make a figure and add the axes. And now, rather than using plot, we're going to call ax.bar. This has the same sort of format: it takes the x positions at which you want to put your bars, and the y values for each of them. It's slightly clever: if you don't give it a numerical x position, if you just give it a category, it will work out some positions for you. So here we're going to use the element totals keys.
So that's going to be that carbon, nitrogen, oxygen, et cetera. And then the y values are going to be the element totals .values, so those are the numbers. And then for completeness, give it an x label and a y label; in this case, just how many of each of them. If we run that, you see we get this nice bar chart, where it's gone through, pulled out all the elements, plotted each as a separate bar, and given you the count of each of them as we go along. So that's a nice way that we can use bar charts, just with simple categorical values. But you can do a lot more with them. You can actually stack bars on top of each other. As an example, and it's quite a complicated example, so bear with me and hopefully it will start to make sense. We've looked at how many of each type of element there are. But perhaps we want to know how the elements are distributed across the groups or periods of the periodic table. So say we know that carbon and silicon are in some way related, and we want to combine them together. We want to somehow rearrange the data that we just generated into a format that tells us which element is at which periodic table position. Just to talk you through this code, we've got a dictionary that I'm calling periods, and that's going to contain all the periods within the periodic table. So we've got those periods one to seven, and each of those rows is going to be a list 18 elements long. When you get to the transition metals, you're looking at 18 groups across. Obviously when you're looking at period one, you've just got hydrogen and helium, so the other positions don't make sense, but it doesn't matter if we have lots of zeros. So this is a rectangular array. You'll see next session you could do this nicely using NumPy, but here we're doing it with things you've already seen. And then we take each of those element totals we had, so carbon, nitrogen, phosphorus.
We're going through, using this mendeleev package to find some information about them. So mendeleev can tell you which period it is and which group it is. You have to subtract one because of Python starting at 0. And then within this dictionary, we look at the row that we're interested in, we look at the group index, and we assign to that the count of that element. If we run that, it will display the seven periods, one through seven. And then within that, we've got 373 thousand hydrogens, then going through we've got 320,000 carbons, and the nitrogens and oxygens. Hopefully that will make sense. And then as you get to heavier elements, there are fewer of them. What we've now got is a two-dimensional dataset, so we can use that to do the plotting. Again, rather than type this all out manually, I'll paste it in. We create the figure and the axes as usual. In order to stack the bars using matplotlib, you have to define where the bottom of each bar is. By default, it just takes zero and plots everything going up from there, but you can give it a starting height. So what we're going to do is set our starting heights to 0 to begin with. And then for each of these periods in our dataset, we're going to go through and plot a bar chart using a loop. The x positions are 1 to 18, so that's corresponding to the group. And then this is going to be the height, so how big should our bars be; this is how many hydrogens, how many carbons there are in the dataset. We're going to give it a label just to make the legend a bit neater. And then we're going to pass this bottom argument to the bar plot. So here we're using the starting heights, which the first time through this loop are going to be 0. And then we're going to update the starting heights to add on the values that we just plotted. So you plot your first bars, which start at zero, and add their heights to your starting heights list.
And the next time you go through the loop, it's going to start from the top of the previous bars. So we'll run that and see what happens. Hopefully you can see it's gone through the first period and plotted it. There's lots of hydrogen and obviously no helium, so there's a large blue bar and nothing else. For the second period, it goes through all of the carbon, nitrogen, oxygen, et cetera. Then it goes to the third period and plots that; I guess there's some phosphorus, and so on and so on. As you go down the periods, you get fewer and fewer. So it's predominantly carbon, nitrogen and oxygen, as expected. But you can see that they're actually sitting on top of each other: because we've updated the starting heights each time through the loop, it's just building up one on top of the other. Because you've got so much hydrogen and carbon, they sort of dominate the plot. So one thing you can do, and the eagle-eyed among you will have noticed the line I deleted, is set a logarithmic scale. This is a really nice feature of matplotlib: you can change the scale from being linear to being logarithmic with just a single line. If we now plot this again, you'll see what happens. It's changed the y scale, so rather than having linearly spaced ticks, they're on a log scale, and it makes the tick marks nice. And now the data that was previously dominant, because there were so many hydrogens and carbons, is all compressed down, and the others start to be a little bit more visible. And you can obviously play around with the order of these, so if the smaller bars were at the bottom, they would appear visually larger even though they're very small numbers. So there are a lot of things you can do with a logarithmic scale to emphasise the data you're interested in, or to play down the things that are not particularly interesting. The last thing I just want to mention, because I glossed over it, is ax.set_xticks.
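The stacked bar chart with a log scale and custom ticks might look something like the sketch below. The counts are invented and only three periods are filled in; the session builds the full dictionary with the mendeleev package, which is left out here so the example runs on its own.

```python
import matplotlib.pyplot as plt

# Hypothetical counts: one list of 18 group slots per period (values made up).
periods = {
    1: [373000] + [0] * 17,                           # H in group 1, no He
    2: [0] * 13 + [320000, 28000, 60000, 5000, 0],    # C, N, O, F in groups 14-17
    3: [0] * 14 + [900, 7000, 12000, 0],              # P, S, Cl in groups 15-17
}

groups = list(range(1, 19))   # x positions: groups 1 to 18
bottoms = [0] * 18            # starting height of each bar, zero to begin with

fig = plt.figure()
ax = fig.add_subplot()

for period, counts in periods.items():
    ax.bar(groups, counts, bottom=bottoms, label=f"Period {period}")
    # Raise the starting heights so the next period sits on top of this one.
    bottoms = [b + c for b, c in zip(bottoms, counts)]

ax.set_yscale("log")              # one line switches from linear to logarithmic
ax.set_xticks(range(1, 19, 2))    # tick marks at 1, 3, 5, ... only
ax.set_xlabel("Group")
ax.set_ylabel("Number of atoms")
ax.legend()
```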
This is telling it where you want to position the tick marks on the x axis. So here we've given it a range from 1 to 19, so it's from 1 up to 18 inclusive, and that defines the tick positions. If you wanted to do it with a step of two, so that's a range where you're just getting 1, 3, 5, et cetera, you see it's now updated the tick marks to only be in those places. So if you make a plot and you find the tick marks are overlapping on top of each other, you can use that command to spread them out. Okay, so that was bar plots for comparing categories. The other thing that's related is histograms. Histograms show, as I said earlier, how many values fall within a given range. And these can be done with matplotlib as well. So looking at our melting point data again, we've got the melting points as a sequence of values, and say we just want to plot a histogram of the range of melting points. Then we'll do the same thing again, figure and axes. And now we're going to call ax.hist. hist just takes the data that you want to turn into a histogram, so we pass it the melting point in degrees. If we just plotted that, that would give us this sort of thing. It has some defaults for how many bins to use, and it just plots it as a series of bars, and that's your histogram. Probably the thing you want to do is use more bins. This is quite coarse and doesn't really tell you much. You just give it the bins argument. Then you see a nice, much smoother plot. You've got more bins, but the data is better summarised. You'll also have seen that it's spitting out these numbers. That's because ax.hist not only makes the plot, it also returns the values that were used to make the plot. So in this case, the first thing is the counts, so the first bin has four, the second bin has 28, et cetera, et cetera. And the second is the position of each of these bins, so the left-hand edge.
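To illustrate what ax.hist hands back, here is a small sketch with invented melting points (the variable names are assumptions, not the notebook's). The second returned array holds the bin edges, which means it has one more entry than there are bins because it also includes the right-hand edge of the last bin.

```python
import matplotlib.pyplot as plt

# Hypothetical melting points in degrees Celsius.
melting_points = [-95, -20, -5, 12, 30, 45, 47, 60, 80, 120, 150, 210]

fig = plt.figure()
ax = fig.add_subplot()

# hist draws the bars and also returns the numbers it used to draw them.
counts, bin_edges, patches = ax.hist(melting_points, bins=5)

ax.set_xlabel("Melting point / °C")
ax.set_ylabel("Count")
```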
So you can save these and use them for something else if you want. If we just make it tidy, we'll add the labels, so melting point on the x axis, and set the y label. Here we are using the count again, so this is the count within each bin. You can also do a normalised histogram using the same function, which will then give you the frequency within each bin, a value between 0 and 1. Okay, so histograms are quite powerful and quite complicated, but you can do all sorts of things with them. We'll come back to them later in a bit more detail. So this is just saying what I said earlier, that it returns the values so you can actually save them. You can save your counts, your bins, and the bars that were plotted, and then edit them afterwards. So those were the most used one-dimensional plots. I think the other thing that you might find you need to use at some point is a two-dimensional plot. This is where you're trying to show some relationship between different datasets, but in effect you're looking at three-dimensional data. An example is if you take the colouring approach, but all the points of your scatter plot overlap with each other, and you might want to go to something else. And this is where imshow and contour come in. These are a bit complicated. I'm not expecting you to be able to necessarily use them straight off; I just wanted to provide you with the information so you have it for the future. So imshow takes a two-dimensional grid and then it plots a third value. So in effect you have x, y and z data. contour is designed to actually process this x, y, z data a bit more and produce a plot, like the contour plot we saw earlier in the Mentimeter, so it actually shows the lines of the same value within the dataset. So I'll show you an example of how we can use these two. Going back to the data that you saw earlier, you've got a two-dimensional dataset.
One direction is your angle, and the other direction is your C-C bond length. And these are arranged in a grid, so we have the value of your C-C bond length at every point within these two dimensions, the value of the angle at every point in two dimensions, and the value of your energy too. It's quite a complicated way to arrange the data, but because it's in this format, it means we can use it with imshow. So we just set some default values, and I define the minimum and maximum of each axis. And then we're going to call ax.imshow. Now we want to plot just the values. We're not telling it about the x or the y values; we just have a two-dimensional array of values. So if we pass it the energy, that will tell it the value of the energy at each point in that grid. By default, if we just run that, you see it gives a plot that looks like this. But that's not really what we want, because now we've lost any information about the carbon-carbon distance, and we don't know what the angle is. These are literally just pixel values. imshow is designed to display images, where you only have a value at each pixel. So we can customise this a bit using some of the options. The first one is to tell it what our minimum value on the x axis is and what our maximum is; this is the extent argument. And this is where these values up here come in. So we're going to go from phi min to phi max along the x axis, and from CC min to CC max along the y axis. And so this will change these numbers here to correspond to our data. Another really weird thing about imshow is that by default it puts the origin at the top left, so zero is up there somewhere. That doesn't make much sense here, so we're going to change it so the origin is at the lower left. And then the final thing is the aspect, because it's designed to work with images, and pixels are normally square.
So if you look very closely at this, you can see that each of these shaded blocks is a square. You can change that by using another argument, aspect. So in this case we would say aspect is 500. Then we can plot that, and it's now turned them into rectangles. You see that it's updated the extents to match our values, and so this is starting to look like a sensible plot. But we haven't actually explained what the colours mean. So again, we're going to add a colour bar by saving the output of that command, and then the colour bar takes that output and the axes that we're interested in. We'll give it a label again, of energy. And then for completeness, we should probably set the axes labels. I'm not going to type it all out completely, but, and I think I had it the wrong way round, x is the dihedral angle, and then y is the C-C distance. Let's try and compress that so you can actually see the output. So now we've got a figure with the dihedral angle on the x, the C-C distance on the y, and the energy as a function of that. So this is what you'd expect: at any particular distance, you've got your cyclical variation in energy as you're rotating. And what you see is that as you stretch the carbon-carbon bond further apart, the energy goes up, and as you compress it, pushing the carbons together, the energy goes up. So the minimum is somewhere around here, and that's where the molecule will exist in its equilibrium state, I guess. So that's how you can use imshow. You can do all sorts of things with it. But we can also use contour plots. Now, what we're doing is a similar sort of thing, but rather than passing just one set of data, the energies, and then doing all sorts of stretching and skewing to make it make sense, we just pass all three sets. So now we've got the dihedral angle, the carbon-carbon distance, and the energy as our x, y, z grid. We tell it we want to calculate ten contours of those. And lo and behold, it's produced the same data.
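A rough sketch of the two approaches side by side, using a made-up energy surface in place of the session's dataset. The grid sizes, the energy formula and the variable names are all assumptions, and `aspect="auto"` is swapped in for the numeric aspect value used in the session so that the example works for any axis ranges.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical grid standing in for the dihedral angle / C-C distance data.
phi = np.linspace(0.0, 360.0, 73)     # dihedral angle / degrees
r_cc = np.linspace(1.3, 1.8, 26)      # C-C distance / angstrom
PHI, R = np.meshgrid(phi, r_cc)       # both arrays have shape (26, 73)

# Invented energy surface: periodic in the angle, quadratic in the distance.
energy = (1 + np.cos(np.radians(3 * PHI))) + 50 * (R - 1.54) ** 2

fig = plt.figure(figsize=(6, 8))
ax1 = fig.add_subplot(2, 1, 1)
ax2 = fig.add_subplot(2, 1, 2)

# imshow only sees a 2D array of values, so we tell it the axis ranges
# (extent), move the origin to the lower left, and allow non-square pixels.
image = ax1.imshow(
    energy,
    extent=(phi.min(), phi.max(), r_cc.min(), r_cc.max()),
    origin="lower",
    aspect="auto",
)
fig.colorbar(image, ax=ax1, label="Energy")

# contour takes all three grids and draws lines of equal energy.
contours = ax2.contour(PHI, R, energy, levels=10)
fig.colorbar(contours, ax=ax2, label="Energy")

for ax in (ax1, ax2):
    ax.set_xlabel("Dihedral angle / degrees")
    ax.set_ylabel("C-C distance / Å")
```

Using `contourf` in place of `contour` would draw filled bands rather than lines.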
But now, rather than having sort of an image of the data, it's actually explicitly calculated the contour bands and plotted them all on the same plot. And you see that on the colour bar it actually just shows the lines of the contours. And you can, if you really want, combine both. So you can have imshow in the background and the contours plotted on top of that. These can get quite complicated quite quickly, but it's a nice way of representing 2D data. Okay, so it's now 20 past 11, so we've got some time for some more tasks. So you've got tasks to do now, and then some advanced tasks for if you race through quickly. I expect it will take quite a while to go through these. The last section is just going to be looking at how you'd make one complex plot. So I'll give you a little while for this. If you could please open the breakout rooms. And again, I encourage you to talk to each other and work together on this.

Right, welcome back everyone. Hopefully you've had a chance to go through the problems. I'll give you a brief run-through of the answers now, with a view to putting the full answers online later today, hopefully. Hopefully you got fairly far through the tasks. I see one cross; I realise there's a lot to get done in the time, and you can keep working on these after the session, that's fine. So the first task is adding a column to the DataFrame which contains the formula weight of each compound. You've got this formula dictionary, so you know how many of each element there are, and it's just a case of using that information to get the total mass. To help you, I've given you the element weights as a dictionary. So this has, for each element symbol, something like carbon, a number for its weight, and that avoids you having to type them in manually.
So if I run that, we've defined the dictionary we need to actually do this. Then write a function that takes the formula dictionary and goes through it, looking at each element in turn. So for each element in the formula dictionary, find the weight of that element from the dictionary we just defined, and multiply it by how many of that atom there are. And the way I've done it is to keep an overall formula weight value that starts off at 0, and each time you go through the loop, you're adding to that value. So by the end of that, you've gone through every element in the formula dictionary, you've added up their masses, and you return the final value. And then it's just a case of going to your melting point DataFrame and running through it, applying this function to every single row. So you can do this by iterating through the rows. You're locating the row and creating a new column, which I've called formula weight, and you're applying that function to the formula dictionary of that row. If you do that, it will go through and just update the DataFrame for you. It takes a little while; I find it's quite slow iterating through, because there are a lot of rows, so it does take a little while to work its way down the list. You see it's finally finished. An alternative way of doing this that I just wanted to introduce you to is for when you're applying a function to every value in a column. So if you're going to do something like iterating through the column, applying the same function each time, pandas gives you this .apply function. So here we have a function, and we're just going to take the column and apply that function to every value in the column. This will give the same result: it will return a pandas Series, which you can then save as a new column. And the benefit is that this is actually much faster.
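Both routes described above could be sketched like this. The weights dictionary is cut down to three elements, and the DataFrame rows and column names are invented for the example, so this is only an outline of the idea rather than the model answer.

```python
import pandas as pd

# Hypothetical subset of atomic weights; the session supplies a full dictionary.
element_weights = {"H": 1.008, "C": 12.011, "O": 15.999}


def formula_weight(formula_dict):
    """Add up (atomic weight x number of atoms) for every element."""
    total = 0.0
    for element, count in formula_dict.items():
        total += element_weights[element] * count
    return total


# Made-up rows; the column names are assumptions for this sketch.
df = pd.DataFrame({
    "name": ["water", "methanol"],
    "formula_dict": [{"H": 2, "O": 1}, {"C": 1, "H": 4, "O": 1}],
})

# Option 1: loop over the rows and fill in the new column one value at a time.
for index, row in df.iterrows():
    df.loc[index, "formula_weight_loop"] = formula_weight(row["formula_dict"])

# Option 2: apply the function to every value in the column in one call.
df["formula_weight"] = df["formula_dict"].apply(formula_weight)
```

Both columns hold the same numbers; the second route avoids the explicit Python loop over rows.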
So you see that ran in a fraction of a second, whereas the previous one took a second or two. So that's just two ways of doing the same thing. Now Task 2: because you've now got that formula weight, I want you to plot a scatter graph of how the melting point depends on the formula weight. It's an interesting chemical question: should the melting point depend on how heavy a molecule is? Let's find out. So same thing as before, create your figure and create your axes. And then we're just going to use ax.plot with the raw data. So we've got our formula weights as the x values and the melting points as the y values. Here we're using a small marker, just a dot, with no line style, so there is no line. And the other thing I've introduced is alpha, so you can have transparency. This is a number between 0 and 1. It's a bit confusing, but I think 1 is completely opaque and 0 is completely transparent. So if I run this, you can see you get a plot that looks a bit like this. Generally speaking, as your formula weight goes up, your melting point goes up. But there's quite a big spread, and there's no clear correlation. There are also interesting limits in there that I'm not sure I can explain; they're probably just the limits of the data. Task 3 then is plotting a histogram of these formula weights. So, similar to what we did with the melting point, you want to see what the distribution of them is, and then adjust the x axis limits. So similar to what we did before, using ax.hist with our dataset, and give it a sensible number of bins. Here I've chosen 100, but you could give 50 or something. If we run it without setting the x limits, you see that you get a distribution that looks like this. It's very skewed: you have a lot of compounds that are relatively light, but you do have a tail of compounds that get heavier and heavier. If you just want to focus on this middle region, you can set the x limits, say between 500 and 700 or so, and then zoom in on the bit you're interested in.
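Tasks 2 and 3 might be written along these lines. The numbers are invented stand-ins for the DataFrame columns, and the x limits are chosen to suit that invented data rather than taken from the session.

```python
import matplotlib.pyplot as plt

# Hypothetical values in place of the DataFrame columns used in the session.
formula_weights = [58.1, 122.1, 180.2, 342.3, 94.1, 46.1, 256.4, 151.2]
melting_points = [-95.0, 122.0, 146.0, 186.0, 41.0, -114.0, 63.0, 169.0]

# Task 2: markers only (no line), partly transparent so overlaps stay visible.
fig1 = plt.figure()
ax1 = fig1.add_subplot()
(line,) = ax1.plot(
    formula_weights, melting_points, marker=".", linestyle="none", alpha=0.3
)
ax1.set_xlabel("Formula weight (g/mol)")
ax1.set_ylabel("Melting point / °C")

# Task 3: histogram of the formula weights, then zoom in along the x axis.
fig2 = plt.figure()
ax2 = fig2.add_subplot()
counts, bin_edges, patches = ax2.hist(formula_weights, bins=100)
ax2.set_xlim(0, 400)
ax2.set_xlabel("Formula weight (g/mol)")
ax2.set_ylabel("Count")
```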
So those are hopefully things that, even if you didn't quite finish them in this session, you have an idea of how to do. Task 4 gets a bit trickier. Here you're trying to plot a histogram of molecules that are less than 200 grams per mole, and one of molecules that are more than 200 grams per mole, and these are going to sit on top of each other. The reason being that the sum of these bars is actually quite important: if you ignored the molecular mass entirely, you'd have the same shaped histogram. It's just that now we're subdividing it based on mass. There are a few ways of doing this, some more advanced than others, some using external packages; it doesn't really matter how you do it. The way I would suggest is to actually compute the histograms manually and then just plot them as a bar chart, so using that same stacked bar chart we saw before. So the model answer is this. I'm not going to run through all the details, but essentially you're computing your histogram using np.histogram, which takes a dataset and some bins that you want to use, and I've defined those at the top. And then once you've calculated those, you can do a bar plot using the bin positions and the counts in each bin, and define your width so the bars don't overlap. And then for the second one, you obviously need to define what the bottom position is. If you run that, you should get something that looks a bit like this. You can see that overall you've got the shape of your histogram as before, but now you can see that the lighter formula weights tend to have lower melting points, and for the heavier formula weights the centre is shifted slightly to the right. So this is one of those games you can play with this sort of thing. That's quite a tricky problem. So just in the last couple of minutes, I wanted to talk about making multiple plots. This is more for your own information than anything, and it's something that you can use in lab reports.
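For reference, the stacked histogram from Task 4 could be sketched as below, following the compute-then-bar-plot approach described above. The data are randomly generated placeholders and the bin range is an assumption, so this is one possible version rather than the model answer itself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Randomly generated placeholder data (seeded so the result is repeatable).
rng = np.random.default_rng(0)
formula_weights = rng.uniform(50, 400, size=200)
melting_points = rng.normal(50, 80, size=200) + 0.2 * formula_weights

# Shared bin edges so the two histograms line up.
bins = np.linspace(-300, 500, 41)

light = melting_points[formula_weights < 200]
heavy = melting_points[formula_weights >= 200]
light_counts, _ = np.histogram(light, bins=bins)
heavy_counts, _ = np.histogram(heavy, bins=bins)

widths = np.diff(bins)   # bar widths equal to bin widths, so bars do not overlap

fig = plt.figure()
ax = fig.add_subplot()
ax.bar(bins[:-1], light_counts, width=widths, align="edge", label="< 200 g/mol")
# The second set of bars starts where the first set finishes.
ax.bar(bins[:-1], heavy_counts, width=widths, align="edge",
       bottom=light_counts, label=">= 200 g/mol")
ax.set_xlabel("Melting point / °C")
ax.set_ylabel("Count")
ax.legend()
```

Because the heavy bars sit on top of the light ones, the outline of the stack is the same as the histogram of the whole dataset.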
So often what you'll find is that you want several plots on the same figure, which are in some way related to each other. And matplotlib makes this quite easy. When you do the add_subplot, if you just give it round brackets with nothing in them, it gives you one axes. But if you give it some numbers, it will give you multiple axes positions. So this is defined on a grid of number of rows and number of columns, and the index is just the position within the grid. So it starts at the top left. The figure seems to have disappeared from the slide; hopefully you've got a figure in your notebook that shows you how these are arranged. So you can have a grid of something like three by three. You often have to adjust your figure size to compensate. So in this case, we're going to go back to our methane data that we had earlier. We're going to make our figure, and this time we're going to make it a bit bigger, so give it six inches. And we're going to generate two axes within it. The first one is going to be fig.add_subplot, and we're going to want two rows, so one above the other, and one column. So our grid is literally just two axes. And this is going to be the first plot within the grid, so the top one. And then we can do the same thing again to create a second axes. And now, rather than having the first position, we're going to go for the second position. And you can also make it share the x axis of another. So in this case, if we say sharex equals ax1, that constrains these two plots to have the same x range. So now we're going to do some plotting. So ax1.plot: I'm going to plot the date and the CH4 average, so this is what we plotted before. And we can add some labels to this, so set the y label as average on ax1. We're then going to plot against the date again, because these two are constrained to have the same x axis.
And they don't have to be the same dataset. So we're going to plot the average uncertainty, which was in our original DataFrame, and set the y label. If we now run that, you see that it generates a figure; it generates two axes, one above the other, which have the same x range, and then plots the data on them respectively. On the right, because these are recent data, the uncertainty isn't sensible, and so what we can do is truncate the plot. So we'll set the x limit of one of the axes. We have to use datetime because these are dates, which is a bit tricky. We'll start our x axis at the first of January 1984 and end it at the first of December 2020. And then we'll also set the y limit on ax1, just to trim these off, and run that. Ah, I forgot to import datetime, so import datetime. So now we've changed the x axis on one of the plots and it's updated both of them, because they're linked. And hopefully from this slide you can see you can generate all sorts of different plots with different combinations of grids. I'm aware that time is getting away from us. You can get much more complicated, so you can start to use something called GridSpec. I'll leave you to read through this one in your own time, really. But you can use that to change the size of this grid so it doesn't have to be equally spaced. You can start to make one row shorter than the other, or one column wider than the other, and you can end up with some quite complicated plots. Let's just copy and paste this into the notebook. Again, these examples will go up in the demonstrator version, so don't worry that you haven't seen it. You can see that now we've got a similar sort of subplot to the one we had before, but by using GridSpec we can push things down, so the lower one is much smaller. And this can go to an extreme, where you end up with something like this, which I've made just for this presentation.
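The linked subplots and the GridSpec variant described above might be sketched as follows. The methane numbers are placeholders generated in the code, and the figure size and height ratios are assumptions chosen for illustration.

```python
import datetime

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Placeholder yearly values; the session reads the real ones from a DataFrame.
dates = [datetime.date(year, 1, 1) for year in range(1980, 2023)]
average = [1600.0 + 7.0 * i for i in range(len(dates))]
uncertainty = [2.0 - 0.03 * i for i in range(len(dates))]

fig = plt.figure(figsize=(6, 6))
ax1 = fig.add_subplot(2, 1, 1)               # 2 rows, 1 column, position 1 (top)
ax2 = fig.add_subplot(2, 1, 2, sharex=ax1)   # position 2, x axis linked to ax1

ax1.plot(dates, average)
ax1.set_ylabel("Average")
ax2.plot(dates, uncertainty)
ax2.set_ylabel("Average uncertainty")

# Changing the x limits on one axes updates the other because they are linked.
ax1.set_xlim(datetime.date(1984, 1, 1), datetime.date(2020, 12, 1))

# GridSpec variant: same two-row layout, but the lower axes is much shorter.
fig2 = plt.figure(figsize=(6, 6))
grid = GridSpec(2, 1, height_ratios=[3, 1], figure=fig2)
top = fig2.add_subplot(grid[0])
bottom = fig2.add_subplot(grid[1], sharex=top)
top.plot(dates, average)
bottom.plot(dates, uncertainty)
```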
So I don't expect you to do this sort of thing, but you can use all sorts of axes and all sorts of different arrangements to generate really very complicated plots. I would challenge anyone to try and do this with something like Excel. You can have multiple plots with multiple datasets, histograms and scatter plots, and even down to the extent of a box plot that shows how the uncertainty in our measurement varies with the day of the week that it was measured. So I realise we've got to 12 o'clock. That was just the last bit for your own information; I'm not expecting you to be able to do that much in this course. And just as a summary then, we've covered all sorts of different types of plot and some of the philosophy of plotting: why should we do different things? And I just wanted to direct you to the further resources. matplotlib has some very, very helpful cheat sheets, and I go to these all the time. If you're ever stuck, you can normally find there how to do certain things. And if I can just encourage you to fill in the feedback for this session, just to say things like one thing you liked and one thing you didn't like, that would be really helpful for improving the rest of the course. I'm just seeing a question in the chat about adding a line of best fit to a scatter graph. So we cover fitting in a later session. That is going to cover, I think, how you can, first of all, calculate your line of best fit, and then plot it on top of the data. Okay, so I'm going to stop sharing the screen. Thank you for coming. If you have any urgent questions, I can answer them now. Otherwise, I guess we'll see you next week.